{{Short description|Application programming interface for multimedia playback}}
{{more citations needed|date = November 2012}}
{{other}}
'''Media Foundation''' ('''MF''') is a [[Component Object Model|COM-based]] [[multimedia framework]] pipeline and infrastructure platform for digital media in [[Windows Vista]], [[Windows 7]], [[Windows 8]], [[Windows 8.1]], [[Windows 10]], and [[Windows 11]]. It is the intended replacement for Microsoft [[DirectShow]], the [[Windows Media|Windows Media SDK]], [[DirectX Media Objects|DirectX Media Objects (DMOs)]] and other so-called "legacy" multimedia APIs such as [[Audio Compression Manager|Audio Compression Manager (ACM)]] and [[Video for Windows|Video for Windows (VfW)]]. DirectShow is being superseded by Media Foundation incrementally, beginning with a few features, and the two technologies will coexist for some time. Media Foundation is not available for earlier Windows versions, including [[Windows XP]].

The first release, present in [[Windows Vista]], focuses on audio and video playback quality, [[High-definition video|high-definition]] content (i.e. [[HDTV]]), content protection, and a more unified approach to digital data access control for [[digital rights management]] (DRM) and its interoperability. It integrates [[DirectX Video Acceleration|DXVA 2.0]] to offload more of the video processing pipeline to hardware for better performance. Videos are processed in the colorspace they were encoded in and are handed off to the hardware, which composes the image in its native colorspace; avoiding intermediate colorspace conversions improves performance. MF includes a new video renderer, called ''Enhanced Video Renderer'' (EVR), which is the next iteration of [[Video Mixing Renderer|VMR 7 and 9]]. EVR has better support for playback timing and synchronization. It uses the [[Multimedia Class Scheduler Service]] (MMCSS), a new [[Windows Service|service]] that prioritizes real-time multimedia processing, to reserve the resources required for playback without tearing or glitches.

The second release included in [[Windows 7]] introduces expanded media format support and [[DirectX Video Acceleration|DXVA HD]] for acceleration of HD content if [[Windows Display Driver Model|WDDM]] 1.1 drivers are used.<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/ee663586(VS.85).aspx |title=DXVA-HD |access-date=2010-04-18 |archive-date=2012-04-20 |archive-url=https://web.archive.org/web/20120420203318/http://msdn.microsoft.com/en-us/library/ee663586(VS.85).aspx |url-status=live }}</ref>

==Architecture==
[[Image:MFoundation.svg|thumb|250px|right|Media Foundation Architecture]]
The MF architecture is divided into the ''Control layer'', the ''Core layer'' and the ''Platform layer''. The core layer encapsulates most of the functionality of Media Foundation. It consists of the Media Foundation pipeline, which has three components: ''Media Source'', ''Media Sink'' and ''Media Foundation Transforms'' (MFT). A media source is an object that acts as the source of multimedia data, either compressed or uncompressed. It can encapsulate various data sources, such as a file, a network server or even a camcorder, with source-specific functionality [[abstraction (computer science)|abstracted]] by a common [[interface (object-oriented programming)|interface]]. A source object can use a ''source resolver'' object, which creates a media source from a [[URI]], file or bytestream; support for non-standard protocols can be added by writing a source resolver for them. A source object can also use a ''sequencer'' object to play a sequence of sources (a [[playlist]]) or to coalesce multiple sources into a single logical source. A media sink is the recipient of processed multimedia data. A media sink can either be a ''renderer sink'', which renders the content on an output device, or an ''archive sink'', which saves the content onto a persistent storage system such as a file. A renderer sink takes uncompressed data as input, whereas an archive sink can take either compressed or uncompressed data, depending on the output type. The data flowing from media sources to sinks is acted upon by MFTs, components that transform the data into another form. MFTs can include multiplexers and demultiplexers, codecs, or [[digital signal processing|DSP]] effects such as [[reverberation|reverb]]. The core layer uses services such as file access, networking and clock synchronization to time the multimedia rendering. These are part of the ''Platform layer'', which provides services necessary for accessing the source and sink byte streams, presentation clocks and an object model that lets the core layer components function asynchronously; it is generally implemented as OS services. Pausing, stopping, fast forward, reverse or [[Time-compressed speech|time-compression]] can be achieved by controlling the presentation clock.

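The following C++ sketch illustrates the source-resolution step using the documented platform calls; error handling is omitted and the URL is a placeholder:

<syntaxhighlight lang="c++">
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfuuid.lib")

// Resolve a URL (file path, network URL, ...) into a media source.
IMFMediaSource* CreateMediaSource(const wchar_t* url)
{
    IMFSourceResolver* resolver = nullptr;
    IUnknown* unknown = nullptr;
    IMFMediaSource* source = nullptr;
    MF_OBJECT_TYPE objectType = MF_OBJECT_INVALID;

    MFStartup(MF_VERSION);                 // initialize the platform layer
    MFCreateSourceResolver(&resolver);
    resolver->CreateObjectFromURL(url,     // e.g. L"C:\\clip.wmv" (illustrative)
        MF_RESOLUTION_MEDIASOURCE,         // ask the resolver for a media source
        nullptr, &objectType, &unknown);
    unknown->QueryInterface(IID_PPV_ARGS(&source));

    unknown->Release();
    resolver->Release();
    return source;   // caller releases it; call MFShutdown() at application exit
}
</syntaxhighlight>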

However, the media pipeline components are not automatically connected; they are just presented as discrete components. An application running in the ''Control layer'' has to choose which source types, transforms and sinks are needed for the particular video processing task at hand, and set up the "connections" between the components (a ''topology'') to complete the data flow pipeline. For example, to play back a compressed audio/video file, the pipeline will consist of a file source object, a demultiplexer for the specific file container format to split the audio and video streams, codecs to decompress the audio and video streams, DSP processors for audio and video effects and, finally, the EVR renderer, in sequence. For a video capture application, the camcorder acts as the video and audio source, codec MFTs compress the data and feed a multiplexer that coalesces the streams into a container, and finally a file sink or a network sink writes it to a file or [[Media streaming|streams]] it over a network. The application also has to co-ordinate the flow of data between the pipeline components: the control layer has to "pull" (request) samples from one pipeline component and pass them on to the next component in order to achieve data flow within the pipeline. This is in contrast to [[DirectShow|DirectShow's]] "push" model, where a pipeline component pushes data to the next component. Media Foundation allows content protection by hosting the pipeline within a protected execution environment called the [[Protected Media Path]]. The control layer components are required to propagate the data through the pipeline at a rate such that the rendering stays synchronized with the presentation clock. The rate (or time) of rendering is embedded as part of the multimedia stream as metadata. The source objects extract the metadata and pass it along. Metadata is of two types: ''coded metadata'', which is information about bit rate and presentation timings, and ''descriptive metadata'', such as title and author names. Coded metadata is handed to the object that controls the pipeline session, and descriptive metadata is exposed for the application to use if it chooses to.

Media Foundation provides a ''Media Session'' object that can be used to set up the topologies and facilitate data flow without the application doing it explicitly. It exists in the control layer and exposes a ''topology loader'' object. The application specifies the required pipeline topology to the loader, which then creates the necessary connections between the components. The Media Session manages the job of synchronizing with the presentation clock: it creates the presentation clock object and passes a reference to it to the sink, then uses the timer events from the clock to propagate data along the pipeline. It also changes the state of the clock to handle pause, stop or resume requests from the application.

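The pattern can be sketched in C++ as follows. The example plays the first selected stream of an existing media source through the Streaming Audio Renderer; event handling, stream enumeration, error checking and interface releases are omitted for brevity:

<syntaxhighlight lang="c++">
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfuuid.lib")

// Build a minimal topology (source stream -> renderer sink) and hand it to
// the Media Session, whose topology loader inserts any decoder MFTs needed.
void PlayFirstStream(IMFMediaSource* source)
{
    IMFMediaSession* session = nullptr;
    IMFTopology* topology = nullptr;
    IMFPresentationDescriptor* pd = nullptr;
    IMFStreamDescriptor* sd = nullptr;
    IMFTopologyNode* srcNode = nullptr;
    IMFTopologyNode* outNode = nullptr;
    IMFActivate* renderer = nullptr;
    BOOL selected = FALSE;

    MFCreateMediaSession(nullptr, &session);
    MFCreateTopology(&topology);

    source->CreatePresentationDescriptor(&pd);
    pd->GetStreamDescriptorByIndex(0, &selected, &sd);

    // Source node: ties the media source's first stream into the topology.
    MFCreateTopologyNode(MF_TOPOLOGY_SOURCESTREAM_NODE, &srcNode);
    srcNode->SetUnknown(MF_TOPONODE_SOURCE, source);
    srcNode->SetUnknown(MF_TOPONODE_PRESENTATION_DESCRIPTOR, pd);
    srcNode->SetUnknown(MF_TOPONODE_STREAM_DESCRIPTOR, sd);
    topology->AddNode(srcNode);

    // Output node: here the Streaming Audio Renderer sink.
    MFCreateAudioRendererActivate(&renderer);
    MFCreateTopologyNode(MF_TOPOLOGY_OUTPUT_NODE, &outNode);
    outNode->SetObject(renderer);
    topology->AddNode(outNode);

    srcNode->ConnectOutput(0, outNode, 0);   // loader resolves this connection

    session->SetTopology(0, topology);
    PROPVARIANT start;
    PropVariantInit(&start);                 // VT_EMPTY: start from the beginning
    session->Start(&GUID_NULL, &start);
}
</syntaxhighlight>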
===Practical MF Architectures===
Theoretically there is only one Media Foundation architecture: the Media Session, Pipeline, Media Source, Transform and Media Sink model. However, this architecture can be complex to set up, and there is considerable scope for lightweight, relatively easy-to-configure MF components designed to handle the processing of media data for simple point solutions. Practical considerations therefore led to variations on the fundamental Pipeline design, and components such as the Source Reader and Sink Writer, which operate outside the Pipeline model, were developed. Some sources<ref>{{Cite web |url=https://github.com/OfItselfSo/Tanta |title=Example Source |website=[[GitHub]] |access-date=2019-01-19 |archive-date=2020-11-23 |archive-url=https://web.archive.org/web/20201123060823/https://github.com/OfItselfSo/Tanta |url-status=live }}</ref> split the Media Foundation architecture into three general classes:

* The Pipeline Architecture
* The Reader-Writer Architecture
* Hybrids between the Pipeline and Reader-Writer Architectures

The Pipeline Architecture is distinguished by the use of a distinct Media Session object and Pipeline. The media data flows from one or more Media Sources to one or more Media Sinks, optionally passing through zero or more Media Transforms. The Media Session manages the flow of the media data through the Pipeline, which can have multiple forks and branches. An MF application can get access to the media data as it traverses from a Media Source to a Media Sink by implementing a custom Media Transform component and inserting it at an appropriate location in the Pipeline.

The Reader-Writer Architecture uses a component called a Source Reader to provide the media data and a Sink Writer component to consume it. The Source Reader does contain a type of internal pipeline, but this is not accessible to the application. A Source Reader is not a Media Source and a Sink Writer is not a Media Sink; neither can be directly included in a Pipeline or managed by a Media Session. In general, the media data flows from the Source Reader to the Sink Writer through the actions of the application. The application either takes the packets of media data (called Media Samples) from the Source Reader and gives them directly to the Sink Writer, or it sets up a callback function on the Source Reader which performs the same operation. In effect, as it manages the data transport, the application itself performs a role similar to that of the Media Session in a Pipeline Architecture application. Since the MF application manages the transmission of the Media Samples between the Source Reader and Sink Writer, it always has access to the raw media data. The Source Reader and Sink Writer components do have a limited ability to automatically load Media Transforms to assist with converting the format of the media data; however, this is done internally, and the application has little control over it.

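The shape of this transport loop is sketched below in C++; writer-stream configuration and error handling are omitted, and the file names are placeholders:

<syntaxhighlight lang="c++">
#include <windows.h>
#include <mfapi.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")

// Pump Media Samples from a Source Reader into a Sink Writer (first video
// stream only; assumes MFStartup has already been called).
void Transfer(const wchar_t* inUrl, const wchar_t* outUrl)
{
    IMFSourceReader* reader = nullptr;
    IMFSinkWriter* writer = nullptr;
    MFCreateSourceReaderFromURL(inUrl, nullptr, &reader);
    MFCreateSinkWriterFromURL(outUrl, nullptr, nullptr, &writer);

    // ... AddStream()/SetInputMediaType() on the writer to match the
    // reader's media type, then:
    writer->BeginWriting();
    for (;;)
    {
        DWORD actualStream = 0, flags = 0;
        LONGLONG timestamp = 0;
        IMFSample* sample = nullptr;
        reader->ReadSample(MF_SOURCE_READER_FIRST_VIDEO_STREAM, 0,
                           &actualStream, &flags, &timestamp, &sample);
        if (flags & MF_SOURCE_READERF_ENDOFSTREAM)
            break;                            // stream fully drained
        if (sample)
        {
            writer->WriteSample(0, sample);   // hand the sample to the writer
            sample->Release();
        }
    }
    writer->Finalize();                       // complete the output file
    writer->Release();
    reader->Release();
}
</syntaxhighlight>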
The Source Reader and Sink Writer provide ease of use, while the Pipeline Architecture offers much more sophisticated control over the flow of the media data. However, many of the components available to a Pipeline (such as the Enhanced Video Renderer) are simply not readily usable in a Reader-Writer architecture application. Since the structure of a Media Sample produced by a Source Reader is identical to that output by a Media Source, it is possible to set up a Pipeline Architecture in which the Media Samples are intercepted as they pass through the Pipeline and a copy is given to a Media Sink. This is known as a Hybrid Architecture, and it makes it possible for an application to take advantage of the sophisticated processing abilities of the Media Session and Pipeline while utilizing the ease of use of a Sink Writer. The Sink Writer is not part of the Pipeline and does not interact with the Media Session. In effect, the media data is processed by a special Media Sink called a Sample Grabber Sink, which consumes the media data and hands a copy off to the Sink Writer as it does so. It is also possible to implement a Hybrid Architecture with a custom Media Transform which copies the Media Samples and passes them to a Sink Writer as they pass through the Pipeline. In both cases a special component in the Pipeline effectively acts like a simple Reader-Writer application and feeds a Sink Writer. In general, Hybrid Architectures use a Pipeline and a Sink Writer. Theoretically, it is possible to implement a mechanism in which a Source Reader injects Media Samples into a Pipeline but, unlike the Sample Grabber Sink, no such standard component exists.

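A C++ sketch of creating such a sink follows; <code>callback</code> stands for a hypothetical application-implemented <code>IMFSampleGrabberSinkCallback</code> whose <code>OnProcessSample</code> method would hand each sample's data to a Sink Writer, and the PCM audio format is merely illustrative:

<syntaxhighlight lang="c++">
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfuuid.lib")

// Create a Sample Grabber Sink activation object for a Pipeline topology.
// Every Media Sample reaching the sink is delivered to the callback's
// OnProcessSample() method, which can copy it to a Sink Writer.
IMFActivate* CreateGrabberSink(IMFSampleGrabberSinkCallback* callback)
{
    IMFMediaType* grabType = nullptr;
    MFCreateMediaType(&grabType);
    grabType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Audio);  // format the grabber accepts
    grabType->SetGUID(MF_MT_SUBTYPE, MFAudioFormat_PCM);

    IMFActivate* activate = nullptr;
    MFCreateSampleGrabberSinkActivate(grabType, callback, &activate);
    grabType->Release();
    return activate;   // used as an output node in the topology
}
</syntaxhighlight>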

===Media Foundation Transform===<!-- This section is linked from [[Media Foundation Transform]] -->
Media Foundation Transforms (MFTs) represent a generic model for processing media data. They are used in Media Foundation primarily to implement decoders, encoders, mixers and digital signal processors (DSPs) that sit between media sources and media sinks. Media Foundation Transforms are an evolution of the transform model first introduced with [[DirectX Media Objects]] (DMOs). Their behaviors are more clearly specified, and hybrid DMO/MFT objects can also be created. Applications can use MFTs inside the Media Foundation pipeline, or use them directly as stand-alone objects. MFTs can be any of the following types:

* Audio and video codecs
* Audio and video effects
* Multiplexers and demultiplexers
* Tees
* Color-space converters
* Sample-rate converters
* Video scalers

For [[Windows Vista]], [[Windows 7]] and [[Windows 8]], Microsoft recommends that developers write a ''Media Foundation Transform'' instead of a DirectShow filter.<ref>{{Cite web |url=http://msdn2.microsoft.com/en-us/library/aa468614.aspx |title=Migrating from DirectShow to Media Foundation and comparison of the two |access-date=2007-02-22 |archive-url=https://web.archive.org/web/20080409193345/http://msdn2.microsoft.com/en-us/library/aa468614.aspx |archive-date=2008-04-09 |url-status=dead }}</ref> For video editing and video capture, Microsoft recommends using DirectShow, as these are not the primary focus of Media Foundation in Windows Vista. Starting with Windows 7, MFTs also support hardware-accelerated video processing, encoding and decoding for AVStream-based media devices.<ref>[http://msdn.microsoft.com/en-us/library/windows/hardware/gg299325(v=vs.85).aspx Getting Started with Hardware Codec Support in AVStream]</ref>

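For illustration, registered MFTs can be enumerated by category with the <code>MFTEnumEx</code> function (Windows 7 and later); the C++ sketch below lists the H.264 video decoders, including hardware MFTs, with error handling omitted:

<syntaxhighlight lang="c++">
#include <windows.h>
#include <mfapi.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfuuid.lib")

// Enumerate MFTs in the video-decoder category that accept H.264 input.
void ListH264Decoders()
{
    MFT_REGISTER_TYPE_INFO inputType = { MFMediaType_Video, MFVideoFormat_H264 };
    IMFActivate** activates = nullptr;
    UINT32 count = 0;

    MFTEnumEx(MFT_CATEGORY_VIDEO_DECODER,
              MFT_ENUM_FLAG_SYNCMFT | MFT_ENUM_FLAG_ASYNCMFT |
              MFT_ENUM_FLAG_HARDWARE | MFT_ENUM_FLAG_SORTANDFILTER,
              &inputType,   // required input: H.264 video
              nullptr,      // any output type
              &activates, &count);

    if (count > 0)
    {
        IMFTransform* decoder = nullptr;
        activates[0]->ActivateObject(IID_PPV_ARGS(&decoder));  // first (highest-merit) match
        // ... use the decoder, then decoder->Release();
    }
    for (UINT32 i = 0; i < count; ++i)
        activates[i]->Release();
    CoTaskMemFree(activates);
}
</syntaxhighlight>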

===Enhanced Video Renderer===
Media Foundation uses the Enhanced Video Renderer (EVR) for rendering video content; it acts as a mixer as well. It can mix up to 16 simultaneous streams, with the first stream being a reference stream. All but the reference stream can have per-pixel transparency information, as well as any specified z-order. The reference stream cannot have transparent pixels and has a fixed z-order position, at the back of all streams. The final image is composited onto a single surface by coloring each pixel according to the color and transparency of the corresponding pixel in all streams.

Internally, the EVR uses a mixer object for mixing the streams. It can also deinterlace the output and apply color correction, if required. The composited frame is handed off to a presenter object, which schedules it for rendering onto a [[Direct3D]] device that it shares with the [[Desktop Window Manager|DWM]] and other applications using the device. The frame rate of the output video is synchronized with the frame rate of the reference stream. If any of the other streams (called substreams) have a different frame rate, the EVR discards extra frames (if the substream has a higher frame rate) or uses the same frame more than once (if it has a lower frame rate).

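A hypothetical C++ fragment of driving the mixer through its <code>IMFVideoMixerControl</code> service is shown below; <code>session</code> is assumed to be a running Media Session whose topology renders through the EVR, and the stream ID and rectangle are illustrative:

<syntaxhighlight lang="c++">
#include <windows.h>
#include <mfidl.h>
#include <evr.h>
#pragma comment(lib, "mf.lib")
#pragma comment(lib, "mfuuid.lib")

// Position substream 1 over the reference stream (stream 0) on the EVR mixer.
void PlaceSubstream(IMFMediaSession* session)
{
    IMFVideoMixerControl* mixer = nullptr;
    MFGetService(session, MR_VIDEO_MIXER_SERVICE, IID_PPV_ARGS(&mixer));

    MFVideoNormalizedRect topLeft = { 0.0f, 0.0f, 0.5f, 0.5f };  // quarter of the frame
    mixer->SetStreamZOrder(1, 1);              // composite above the reference stream
    mixer->SetStreamOutputRect(1, &topLeft);   // place in the top-left corner
    mixer->Release();
}
</syntaxhighlight>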

===Supported media formats===
[[Windows Media Audio]] and [[Windows Media Video]] are the only formats supported by default for encoding through Media Foundation in [[Windows Vista]]. For decoding, an [[MPEG-1 Audio Layer 3|MP3]] file source is available in Windows Vista to read MP3 streams, but an MP3 file sink to output MP3 is only available in Windows 7.<ref name="MFTCodecs">{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd757927(VS.85).aspx |title=Supported Media Formats in Media Foundation |access-date=2010-04-18 |archive-date=2010-04-29 |archive-url=https://web.archive.org/web/20100429001326/http://msdn.microsoft.com/en-us/library/dd757927(VS.85).aspx |url-status=live }}</ref> Format support is extensible, however; developers can add support for other formats by writing encoder/decoder MFTs and/or custom media sources and media sinks.

Windows 7 expands upon the codec support available in Windows Vista. It includes [[Audio Video Interleave|AVI]], [[WAV]], [[Advanced Audio Coding#Container formats|AAC/ADTS]] file sources to read the respective formats,<ref name="MFTCodecs"/> an MPEG-4 file source to read [[MPEG-4 Part 14|MP4]], M4A, M4V, MP4V, [[.mov#QuickTime file format|MOV]] and [[3GP]] [[Container format (digital)|container formats]]<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd757766(VS.85).aspx |title=MPEG-4 File Source |access-date=2010-04-18 |archive-date=2010-03-14 |archive-url=https://web.archive.org/web/20100314122622/http://msdn.microsoft.com/en-us/library/dd757766(VS.85).aspx |url-status=live }}</ref> and an MPEG-4 file sink to output to MP4 format.<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd757763(VS.85).aspx |title=MPEG-4 File Sink |access-date=2010-04-18 |archive-date=2010-08-04 |archive-url=https://web.archive.org/web/20100804063713/http://msdn.microsoft.com/en-us/library/dd757763(VS.85).aspx |url-status=live }}</ref>

Similar to Windows Vista, transcoding (encoding) support is not exposed through any built-in Windows application, but several codecs are included as Media Foundation Transforms (MFTs).<ref name="MFTCodecs"/> In addition to the [[Windows Media Audio]] and [[Windows Media Video]] encoders and decoders and the ASF file sink and file source introduced in Windows Vista,<ref name="MFTCodecs"/> Windows 7 includes an [[H.264]] encoder with Baseline profile level 3 and Main profile support<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd797816(VS.85).aspx |title=H.264 Video Encoder |access-date=2010-04-18 |archive-date=2010-03-04 |archive-url=https://web.archive.org/web/20100304095408/http://msdn.microsoft.com/en-us/library/dd797816(VS.85).aspx |url-status=live }}</ref> and an [[Advanced Audio Coding|AAC]] Low Complexity ([[AAC-LC]]) profile encoder.<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd742785(VS.85).aspx |title=AAC Encoder |access-date=2010-04-18 |archive-date=2009-10-13 |archive-url=https://web.archive.org/web/20091013084005/http://msdn.microsoft.com/en-us/library/dd742785(VS.85).aspx |url-status=live }}</ref>

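The built-in encoders are most easily reached through the Sink Writer. The C++ sketch below configures H.264 output for uncompressed RGB32 input; the dimensions, frame rate, bit rate and file name are illustrative, and error handling and frame submission are elided:

<syntaxhighlight lang="c++">
#include <windows.h>
#include <mfapi.h>
#include <mfidl.h>
#include <mfreadwrite.h>
#pragma comment(lib, "mfplat.lib")
#pragma comment(lib, "mfreadwrite.lib")
#pragma comment(lib, "mfuuid.lib")

// Configure a Sink Writer that encodes 640x480 RGB32 frames at 30 fps to
// H.264 in an MP4 file. Assumes MFStartup has already been called.
IMFSinkWriter* CreateH264Writer(const wchar_t* path, DWORD* streamIndex)
{
    IMFSinkWriter* writer = nullptr;
    MFCreateSinkWriterFromURL(path, nullptr, nullptr, &writer);

    IMFMediaType* outType = nullptr;                 // encoded (output) format
    MFCreateMediaType(&outType);
    outType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    outType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_H264);
    outType->SetUINT32(MF_MT_AVG_BITRATE, 2000000);  // 2 Mbit/s, illustrative
    MFSetAttributeSize(outType, MF_MT_FRAME_SIZE, 640, 480);
    MFSetAttributeRatio(outType, MF_MT_FRAME_RATE, 30, 1);
    outType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    writer->AddStream(outType, streamIndex);

    IMFMediaType* inType = nullptr;                  // uncompressed (input) format
    MFCreateMediaType(&inType);
    inType->SetGUID(MF_MT_MAJOR_TYPE, MFMediaType_Video);
    inType->SetGUID(MF_MT_SUBTYPE, MFVideoFormat_RGB32);
    MFSetAttributeSize(inType, MF_MT_FRAME_SIZE, 640, 480);
    MFSetAttributeRatio(inType, MF_MT_FRAME_RATE, 30, 1);
    inType->SetUINT32(MF_MT_INTERLACE_MODE, MFVideoInterlace_Progressive);
    writer->SetInputMediaType(*streamIndex, inType, nullptr);

    outType->Release();
    inType->Release();
    writer->BeginWriting();   // then WriteSample() per frame and Finalize()
    return writer;
}
</syntaxhighlight>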

For playback of various media formats, Windows 7 also introduces an H.264 decoder with Baseline, Main and High profile support, up to level 5.1,<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd797815(VS.85).aspx |title=H.264 Video Decoder |access-date=2010-04-18 |archive-date=2010-04-21 |archive-url=https://web.archive.org/web/20100421161950/http://msdn.microsoft.com/en-us/library/dd797815(VS.85).aspx |url-status=live }}</ref> [[AAC-LC]] and [[HE-AAC]] v1 ([[Spectral band replication|SBR]]) multichannel and HE-AAC v2 ([[Parametric Stereo|PS]]) stereo decoders,<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd742784(VS.85).aspx |title=AAC Decoder |access-date=2010-04-18 |archive-date=2010-03-18 |archive-url=https://web.archive.org/web/20100318062142/http://msdn.microsoft.com/en-us/library/dd742784(VS.85).aspx |url-status=live }}</ref> [[MPEG-4 Part 2]] [[MPEG-4 Part 2#Simple Profile (SP)|Simple Profile]] and [[MPEG-4 ASP|Advanced Simple Profile]] decoders,<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd756559(VS.85).aspx |title=MPEG4 Part 2 Video Decoder |access-date=2010-04-18 |archive-date=2010-02-11 |archive-url=https://web.archive.org/web/20100211070005/http://msdn.microsoft.com/en-us/library/dd756559(VS.85).aspx |url-status=live }}</ref> which can decode popular codec implementations such as [[DivX]], [[Xvid]] and [[Nero Digital]], as well as [[MJPEG]]<ref name="MFTCodecs"/> and [[DV (video format)|DV]]<ref>{{Cite web |url=http://msdn.microsoft.com/en-us/library/dd940322(VS.85).aspx |title=DV Video Decoder |access-date=2010-04-18 |archive-date=2010-03-29 |archive-url=https://web.archive.org/web/20100329062927/http://msdn.microsoft.com/en-us/library/dd940322(VS.85).aspx |url-status=live }}</ref> MFT decoders for AVI. [[Windows Media Player 12]] uses the built-in Media Foundation codecs to play these formats by default.

[[Musical Instrument Digital Interface|MIDI]] playback is also not yet supported using Media Foundation.

==Application support==
{{Unreferenced section|date = November 2013}}
Applications that support Media Foundation include:
* [[Windows Media Player]] in Windows Vista and later
* [[Windows Media Center]] in Windows Vista and later
* [[Firefox]] v24 and later on Windows 7 and later (only for [[H.264/MPEG-4 AVC|H.264]] playback)
* [[GoldWave]] 5.60 and later, which relies on Media Foundation for importing and exporting audio; for export, [[Advanced Audio Coding|AAC]] and [[Apple Lossless]] formats can be saved via Media Foundation

Any application that uses [[Protected Media Path]] in Windows also uses Media Foundation.

==References==
{{Reflist}}

==External links==
* [https://docs.microsoft.com/en-us/windows/win32/medfound/microsoft-media-foundation-sdk Microsoft Media Foundation]
* [https://web.archive.org/web/20110728131607/http://social.msdn.microsoft.com/Forums/en/mediafoundationdevelopment Media Foundation Development Forum]
* [https://docs.microsoft.com/en-us/archive/blogs/mf Media Foundation Team Blog (with samples)]
* [http://msdn2.microsoft.com/en-us/library/aa368930.aspx Media Source Metadata]
* [http://msdn2.microsoft.com/en-us/library/ms703912.aspx Media Foundation Pipeline]
* [http://msdn2.microsoft.com/en-us/library/ms696219.aspx Media Foundation Architecture]
* [http://msdn2.microsoft.com/en-us/library/ms694084.aspx About the Media Session]
* [https://web.archive.org/web/20080406115323/http://msdn2.microsoft.com/en-us/library/ms694916.aspx Enhanced Video Renderer]
* [http://www.ofitselfso.com/Tanta/Windows_Media_Foundation_Getting_Started_CSharp.pdf Windows Media Foundation: Getting Started in C#]

{{Microsoft APIs}}
