MP3: Difference between revisions
→Bit rate: Unit consistency |
|||
Line 40: | Line 40: | ||
A reference simulation software implementation, written in the C language and known as ISO 11172-5, was developed by the members of the ISO MPEG Audio committee in order to produce bit compliant MPEG Audio files (Layer 1, Layer 2, Layer 3). Working in non real time on a number of operating systems, it was able to demonstrate the first real time hardware decoding (DSP based) of compressed audio. Some other real time implementation of MPEG Audio encoders were available for the purpose of digital broadcasting (radio DAB, television DVB) towards consumer receivers and set top boxes. |
A reference simulation software implementation, written in the C language and known as ISO 11172-5, was developed by the members of the ISO MPEG Audio committee in order to produce bit compliant MPEG Audio files (Layer 1, Layer 2, Layer 3). Working in non real time on a number of operating systems, it was able to demonstrate the first real time hardware decoding (DSP based) of compressed audio. Some other real time implementation of MPEG Audio encoders were available for the purpose of digital broadcasting (radio DAB, television DVB) towards consumer receivers and set top boxes. |
||
Later, on [[July 7]] [[1994]] the [[Fraunhofer Society]] released the first software MP3 encoder called [[l3enc]]. The [[filename extension]] ''.mp3'' was chosen by the Fraunhofer team on [[July 14]], [[1995]] (previously, the files had been named ''.bit''). With the first real-time software MP3 player [[Winplay3]] (released [[September 9]], [[1995]]) many people were able to encode and playback MP3 files on their PCs. Because of the relatively small [[hard drive]]s back in that time ( |
Later, on [[July 7]] [[1994]] the [[Fraunhofer Society]] released the first software MP3 encoder called [[l3enc]]. The [[filename extension]] ''.mp3'' was chosen by the Fraunhofer team on [[July 14]], [[1995]] (previously, the files had been named ''.bit''). With the first real-time software MP3 player [[Winplay3]] (released [[September 9]], [[1995]]) many people were able to encode and playback MP3 files on their PCs. Because of the relatively small [[hard drive]]s back in that time (~ 500 [[MB]]) the technology was essential to store non-instrument based (see: [[tracker]] and [[midi]]) music for listening pleasure on a computer. |
||
=== MP2 === |
=== MP2 === |
Revision as of 18:31, 7 February 2007
Filename extension |
.mp3 |
---|---|
Internet media type |
audio/mpeg |
Type of format | Audio |
MPEG-1 Audio Layer 3, more commonly referred to as MP3, is a popular digital audio encoding and lossy compression format and algorithm, designed to greatly reduce the amount of data required to represent audio, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners. It was invented by a team of German engineers who worked in the framework of the EUREKA 147 DAB digital radio research program, and it became an ISO/IEC standard in 1991.
Overview
MP3 is an audio-specific compression format. It provides a representation of pulse-code modulation-encoded audio in much less space than straightforward methods, by using psychoacoustic models to discard components less audible to human hearing, and recording the remaining information in an efficient manner. Similar principles are used by JPEG, a lossy image compression format.
History
Development
MPEG-1 Audio Layer 2 encoding began as the Digital Audio Broadcast (DAB) project managed by Egon Meier-Engelen of the Deutsche Forschungs- und Versuchsanstalt für Luft- und Raumfahrt (later on called Deutsches Zentrum für Luft- und Raumfahrt, German Aerospace Center) in Germany. This project was financed by the European Union as a part of the EUREKA research program where it was commonly known as EU-147. EU-147 ran from 1987 to 1994.
In 1991, there were two proposals available: Musicam (known as Layer 2), and ASPEC (Adaptive Spectral Perceptual Entropy Coding). The Musicam technique, as proposed by Philips (The Netherlands), CCETT (France) and Institut für Rundfunktechnik (Germany) was chosen due to its simplicity and error robustness, as well as its low computational power associated to the encoding of high quality compressed audio. The Musicam format, based on sub-band encoding, was a key to settle the basis of the MPEG Audio compression format (sampling rates, structure of frames, headers, number of samples per frame). Its technology and ideas were fully incorporated into the definition of ISO MPEG Audio Layer I and Layer II and further on of the Layer III (MP3) format. Under the chairmanship of Professor Mussmann (University of Hannover) the editing of the standard was made under the responsibilities of Leon van de Kerkhof (Layer I) and Gerhard Stoll (Layer II).
A working group consisting of Leon Van de Kerkhof (The Netherlands), Gerhard Stoll (Germany), Leonardo Chiariglione (Italy), Yves-François Dehery (France), Karlheinz Brandenburg (Germany) took ideas from Musicam and ASPEC, added some of their own ideas and created MP3, which was designed to achieve the same quality at 128 kbit/s as MP2 at 192 kb/s.
All algorithms were approved in 1991, finalized in 1992 as part of MPEG-1, the first standard suite by MPEG, which resulted in the international standard ISO/IEC 11172-3, published in 1993. Further work on MPEG audio was finalized in 1994 as part of the second suite of MPEG standards, MPEG-2, more formally known as international standard ISO/IEC 13818-3, originally published in 1995.
Compression efficiency of encoders is typically defined by the bit rate because compression rate depends on the bit depth and sampling rate of the input signal. Nevertheless, there are often published compression rates that use the CD parameters as references (44.1 kHz, 2 channels at 16 bits per channel or 2x16 bit). Sometimes the Digital Audio Tape (DAT) SP parameters are used (48 kHz, 2x16 bit). Compression ratios with this reference are higher, which demonstrates the problem of the term compression ratio for lossy encoders.
Karlheinz Brandenburg used a CD recording of Suzanne Vega's song "Tom's Diner" to assess the MP3 compression algorithm. This song was chosen because of its softness and simplicity, making it easier to hear imperfections in the compression format during playbacks. Some have taken to jokingly refer to Suzanne Vega as "The mother of MP3". Some more serious and critical audio excerpts (glockenspiel, triangle, accordion, ...) were taken from the EBU V3/SQAM reference compact disc and have been used by professional sound engineers to assess the subjective quality of the MPEG Audio formats.
Going public
A reference simulation software implementation, written in the C language and known as ISO 11172-5, was developed by the members of the ISO MPEG Audio committee in order to produce bit compliant MPEG Audio files (Layer 1, Layer 2, Layer 3). Working in non real time on a number of operating systems, it was able to demonstrate the first real time hardware decoding (DSP based) of compressed audio. Some other real time implementation of MPEG Audio encoders were available for the purpose of digital broadcasting (radio DAB, television DVB) towards consumer receivers and set top boxes.
Later, on July 7 1994 the Fraunhofer Society released the first software MP3 encoder called l3enc. The filename extension .mp3 was chosen by the Fraunhofer team on July 14, 1995 (previously, the files had been named .bit). With the first real-time software MP3 player Winplay3 (released September 9, 1995) many people were able to encode and playback MP3 files on their PCs. Because of the relatively small hard drives back in that time (~ 500 MB) the technology was essential to store non-instrument based (see: tracker and midi) music for listening pleasure on a computer.
MP2
In October 1993, MP2 (MPEG-1 Audio Layer 2) files appeared on the Internet and were often played back using the Xing MPEG Audio Player, and later in a program for Unix by Tobias Bading called MAPlay, which was initially released on February 22, 1994 (MAPlay was also ported to Microsoft Windows).
Initially the only encoder available for MP2 production was the Xing Encoder, accompanied by the program CDDA2WAV, a CD processor that transforms CD audio tracks to Waveform Audio Files.
The Internet Underground Music Archive (IUMA) is generally recognized as the start of the on-line music revolution. IUMA was the Internet's first high-fidelity music web site, hosting thousands of authorized MP2 recordings before MP3 or the web was popularized.
Internet
In the first half of 1995 through the late 1990s, MP3 files began flourishing on the Internet. MP3 popularity was mostly due to, and interchangeable with, the successes of companies and software packages like Nullsoft's Winamp (released in 1997), mpg123, and Napster (released in 1999). Those programs made it very easy for the average user to playback, create, share, and collect MP3s.
Controversies regarding peer-to-peer file sharing of MP3 files have spread widely in recent years — largely because high compression enables sharing of files that would otherwise be too large and cumbersome to easily share. Some major record companies reacted by filing a lawsuit against Napster, due to the vastly increased spread of MP3s through the Internet, to protect their copyrights (see also intellectual property).
Commercial online music distribution services (like the iTunes Store) usually prefer other/proprietary music file formats that support Digital Rights Management (DRM) to control and restrict the use of digital music. The use of formats that support DRM is in an attempt to prevent copyright infringement of copyright protected materials, but methods exist to defeat most protection schemes, although such methods are considered illegal in many countries. Some services however, (like eMusic) do use the MP3 format, mostly because of MP3's universal compatibility with portable players.
Encoding audio
The MPEG-1 standard does not include a precise specification for an MP3 encoder. The decoding algorithm and file format, as a contrast, are well defined. Implementers of the standard were supposed to devise their own algorithms suitable for removing parts of the information in the raw audio (or rather its MDCT representation in the frequency domain). During encoding 576 time domain samples are taken and are transformed to 576 frequency domain samples. If there is a transient 192 samples are taken instead of 576. This is done to limit the temporal spread of quantization noise accompanying the transient.
This is the domain of psychoacoustics: the study of subjective human perception of sounds.
As a result, there are many different MP3 encoders available, each producing files of differing quality. Comparisons are widely available, so it is easy for a prospective user of an encoder to research the best choice. It must be kept in mind that an encoder that is proficient at encoding at higher bit rates (such as LAME, which is in widespread use for encoding at higher bit rates) is not necessarily as good at other, lower bit rates.
Decoding audio
Decoding, on the other hand, is carefully defined in the standard. Most decoders are "bitstream compliant", meaning that the decompressed output they produce from a given MP3 file will be the same (within a specified degree of rounding tolerance) as the output specified mathematically in the ISO/IEC standard document. The MP3 file has a standard format which is a frame consisting of 384, 576, or 1152 samples (depends on MPEG version and layer) and all the frames have associated header information (32 bits) and side information (9, 17, or 32 bytes, depending on MPEG version and stereo/mono). The header and side information help the decoder to decode the associated Huffman encoded data correctly.
Therefore, for the most part, comparison of decoders is almost exclusively based on how computationally efficient they are (i.e., how much memory or CPU time they use in the decoding process).
Bit rate
The bit rate is variable for MP3 files. The general rule is that more information is included from the original sound file when a higher bit rate is used, and thus the higher the quality during playback. In the early days of MP3 encoding, a fixed bit rate was used for the entire file.
Bit rates available in MPEG-1 Layer 3 are 32, 40, 48, 56, 64, 80, 96, 112, 128, 160, 192, 224, 256 and 320 kbit/s, and the available sampling frequencies are 32, 44.1 and 48 kHz. 44.1 kHz is almost always used (coincides with the sampling rate of compact discs), and 128 kbit/s has become the de facto "good enough" standard, although 192 kbit/s is becoming increasingly popular over peer-to-peer file sharing networks. MPEG-2 and the (unofficial) MPEG-2.5 include some additional bit rates: 8, 16, 24, 32, 40, 48, 56, 64, 80, 96, 112, 128, 144, 160 kb/s; while providing lower sampling frequencies (8, 11.025, 12, 16, 22.05 and 24 kHz).
Variable bit rates (VBR) are also possible. Audio in MP3 files is divided into frames, each of which has its own bit rate, so it is possible to change the bit rate dynamically as the file is encoded. This technique makes it possible to use more bits for parts of the sound with higher dynamics (more sound movement) and fewer bits for parts with lower dynamics, further increasing quality and decreasing storage space. For example, a portion composed of pure tones could be encoded at 48 kb/s, taking up less space without any noticeable difference, while a portion played by a full symphony orchestra is encoded at 224 kb/s to express it with greater fidelity. Although not originally implemented, many encoders now use this technique to greater or lesser extent.
Non-standard bit rates up to 640 kb/s can be achieved with the LAME encoder and the --freeformat option, but few MP3 players can play those files. Gabriel Bouvigne, a principal developer of the LAME project, offered the following information about freeformat streams: [1]
"freeformat IS COMPLIANT with the mp3 standard. Decoders are required to be able to decode it up to 320kbps, but decoding higher bit rate freeformat streams is not mandatory.
Practically, it means that higher than 320kbps, only a few decoders support it."
Audio quality
Because MP3 is a lossy format, it is able to provide a number of different options for its "bit rate" — that is, the number of bits of encoded data that are used to represent each second of audio. Typically, rates chosen are between 128 and 320 kilobits per second. By contrast, uncompressed audio as stored on a compact disc has a bit rate of 1411.2 kb/s (16 bits/sample × 44100 samples/second × 2 channels).
MP3 files encoded with a lower bit rate will generally play back at a lower quality. With too low a bit rate, "compression artifacts" (i.e., sounds that were not present in the original recording) may be audible in the reproduction. A good demonstration of compression artifacts is provided by the sound of applause: it is hard to compress because of its randomness and sharp attacks. Therefore compression artifacts can be heard as ringing or pre-echo.
As well as the bit rate of the encoded file, the quality of MP3 files depends on the quality of the encoder and the difficulty of the signal being encoded. As the MP3 standard allows quite a bit of freedom with encoding algorithms, different encoders may feature quite different quality, even when targeting similar bit rates. As an example, in a public collective test[2] (07/2003) featuring two different MP3 encoders at about 128kbps, one scored 3.66 on a 1-5 scale, while the other scored only 2.22.
Quality is heavily dependent on the choice of encoder and encoding parameters. While quality around 128kbps was somewhere between annoying and acceptable with older encoders, modern MP3 encoders can provide very good quality at those bitrates [3] (01/2006), not statistically different from quality provided by AAC, the technical successor of MP3. However, in 1998, MP3 at 128kbps was only providing quality equivalent to AAC-LC at 96kbps and MP2 at 192kbps [4].
The transparency threshold of MP3 can be estimated to be at about 128k with good encoders on typical music as evidenced by its strong performance in the above test, however some particularly difficult material can require 192k or higher. As with all lossy formats, some samples can not be encoded perfectly transparent to all users. Thus many users opt for 192k as a good trade off.
At lower bit rates, the quality of MP3 quickly degrades, and is far behind AAC quality at 32kbps, as demonstrated by a collective listening test (06/2004)[5].
It is also important to note that perceived quality can be influenced by listening environment (ambient noise), listener attention, and listener training.
File structure
An MP3 file is made up of multiple MP3 frames which consist of the MP3 header and the MP3 data. This sequence of frames is called an Elementary stream. Frames are independent items: one can cut the frames from a file and an MP3 player would be able to play it. The MP3 data is the actual audio payload. The diagram shows that the MP3 header consists of a sync word which is used to identify the beginning of a valid frame. This is followed by a bit indicating that this is the MPEG standard and two bits that indicate that layer 3 is being used, hence MPEG-1 Audio Layer 3 or MP3. After this, the values will differ depending on the MP3 file. The range of values for each section of the header along with the specification of the header is defined by ISO/IEC 11172-3. Most MP3 files today contain ID3 metadata which precedes or follows the MP3 frames; this is also shown in the diagram.
Design limitations
There are several limitations inherent to the MP3 format that cannot be overcome by using a better encoder.
Newer audio compression formats such as Vorbis and AAC no longer have these limitations.
In technical terms, MP3 is limited in the following ways:
- Bit rate is limited to a maximum of 320 kb/s (while some encoders can create higher bit rates, there is little-to-no support for these higher bit rate mp3s)
- Time resolution can be too low for highly transient signals, causing some smearing of percussive sounds
- Frequency resolution is limited by the small long block window size, decreasing coding efficiency
- No scale factor band for frequencies above 15.5/15.8 kHz
- Joint stereo is done on a frame-to-frame basis
- Encoder/decoder overall delay is not defined, which means lack of official provision for gapless playback. However, some encoders such as LAME can attach additional metadata that will allow players that are aware of it to deliver seamless playback.
Nevertheless, a well-tuned MP3 encoder can perform competitively even with these restrictions.
ID3 and other tags
A "tag" in a compressed audio file, is a section of the file that contains metadata such as the title, artist, album, track number or other information about the file's contents.
As of 2006, the most widespread standard tag formats are ID3v1 and ID3v2, and the more recently introduced APEv2.
APEv2 was originally developed for the MPC file format (see the APEv2 specification). APEv2 can coexist with ID3 tags in the same file, but it can also be used by itself.
Tag editing functionality is often built-in to MP3 players and editors, but there also exist tag editors dedicated to the purpose.
Volume normalization
As compact discs and other various sources are recorded and mastered at different volumes, it is useful to store volume information about a file in the tag so that at playback time, the volume can be dynamically adjusted.
A few standards for encoding the gain of an MP3 file have been proposed. The idea is to normalize the average volume (not the volume peaks) of audio files, so that the volume does not change between consecutive tracks. This should not be confused with dynamic range compression (DRC) which is a form of normalization used in audio mastering.
The most popular and widely used solution for storing replay gain is known simply as "Replay Gain". Typically, the average volume and clipping information about audio track is stored in the metadata tag.
One can download audio converting software to change the formats.
Licensing and patent issues
Thomson Consumer Electronics controls licensing of the MPEG-1/2 Layer 3 patents in many countries, including the United States, Japan, Canada and EU countries[6]. Thomson has been actively enforcing these patents. Due to different practices in different European countries when granting software patents under the European Patent Convention, it is unclear whether the patents would be upheld by national European courts.
For current information about Fraunhofer IIS and Thomson's patent portfolio and licensing terms and fees see their website mp3licensing.com. MP3 license revenues generated ca. 100 million Euro revenue to the Fraunhofer Society in 2005. [7]
In September 1998, the Fraunhofer Institute sent a letter to several developers of MP3 software stating that a license was required to "distribute and/or sell decoders and/or encoders". The letter claimed that unlicensed products "infringe the patent rights of Fraunhofer and THOMSON. To make, sell and/or distribute products using the [MPEG Layer-3] standard and thus our patents, you need to obtain a license under these patents from us." [citation needed]
These patent issues significantly slowed the development of unlicensed MP3 software [citation needed] and led to increased focus on creating and popularizing alternatives such as WMA and Ogg Vorbis. Microsoft, the makers of the Windows operating system, chose to move away from MP3 to their own proprietary Windows Media formats to avoid the licensing issues associated with the patents [citation needed]. Until the key patents expire, unlicensed encoders and players appear to be infringing articles in countries that recognize those patents.
In spite of the patent restrictions, the perpetuation of the MP3 format continues; the reasons for this appear to be the network effects caused by:
- familiarity with the format,
- the large quantity of music now available in the MP3 format,
- the wide variety of existing software and hardware that takes advantage of the file format,
- the lack of DRM restrictions, which makes MP3 files easy to edit, copy and distribute over networks,
- the majority of home users not knowing or not caring about the patents controversy, which is in general irrelevant to their choice of the music format for personal use.
Additionally, patent holders declined to enforce license fees on open source decoders, allowing many free MP3 decoders to develop. [citation needed] Furthermore, while attempts have been made to discourage distribution of encoder binaries, Thomson has stated that individuals using free MP3 encoders are not required to pay fees. [citation needed] Thus while patent fees have been an issue for companies attempting to use MP3, they have not meaningfully impacted users, allowing the format to grow in popularity.
Sisvel S.p.A. [8] and its US subsidiary Audio MPEG, Inc. [9] previously sued Thomson for patent infringement on MP3 technology[10], but those disputes were resolved in November 2005 with Sisvel granting Thomson a license to their patents. Motorola also recently signed with Audio MPEG to license MP3-related patents. With Thomson and Sisvel both owning separate patents which they claim are needed by the codec, the legal status of MP3 remains unclear.
In September 2006 German officials seized MP3 players from SanDisk's booth at the IFA show in Berlin after an Italian patents firm won an injunction against the company in a dispute over licencing rights. The injunction was later reversed by a Berlin judge [11]; but that reversal was in turn blocked the same day by another judge from the same court, "bringing the Patent Wild West to Germany" in the words of one commentator. [12].
Alternative technologies
Many other lossy audio codecs exist, including:
- Ogg Vorbis from the Xiph.org Foundation, a patent free codec with free software implementations available.
- MPEG-1/2 Audio Layer 2 (MP2), MP3's predecessor;
- MPEG-4 AAC, MP3's successor, used by Apple's iTunes Music Store and iPod
- MPC, also known as Musepack (formerly MP+), a derivative of MP2;
- mp3PRO from Thomson Multimedia combining MP3 with SBR;
- AC-3, used in Dolby Digital and DVD;
- ATRAC, used in Sony's Minidisc;
- Windows Media Audio (WMA) from Microsoft.
- QDesign, used in QuickTime at low bit rates;
- AMR-WB+ Enhanced Adaptive Multi Rate WideBand codec, optimized for cellular and other limited bandwidth use;
- RealAudio from RealNetworks, frequently in use for streaming on websites;
- Speex, a patent free codec based on CELP specifically designed for speech and VoIP, with free software implementations available.
mp3PRO, MP3, AAC, and MP2 are all members of the same technological family and depend on roughly similar psychoacoustic models. The Fraunhofer Gesellschaft owns many of the basic patents underlying these codecs, with Dolby Labs, Sony, Thomson Consumer Electronics, and AT&T holding other key patents.
There are also some lossless audio compression methods used on the Internet. While they are not similar to MP3, they are good examples of other compression schemes available. These include:
- FLAC stands for 'Free Lossless Audio Codec'
- Monkey's Audio
- SHN, also known as Shorten
- TTA
- WavPack
- Apple Lossless
Listening tests have attempted to find the best-quality lossy audio codecs at certain bit rates. At 128 kb/s, Ogg Vorbis, AAC, MPC and WMA Pro tied for first place with LAME MP3 a little behind. At 64 kb/s, AAC-HE and mp3pro performed marginally better than other codecs. At high bit rates (128 kb/s+), most people do not hear significant differences. What is considered 'CD quality' is quite subjective.
Though proponents of newer codecs such as WMA and RealAudio have asserted that their respective algorithms can achieve CD quality at 64 kb/s, listening tests have shown otherwise; however, the quality of these codecs at 64 kbit/s is definitely superior to MP3 at the same bit rate. The developers of the patent-free Ogg Vorbis codec claim that their algorithm surpasses MP3, RealAudio and WMA sound quality, and the listening tests mentioned above support that claim. Thomson claims that its mp3PRO codec achieves CD quality at 64 kb/s, but listeners have reported that a 64 kb/s mp3PRO file compares in quality to a 112 kb/s MP3 file and does not come reasonably close to CD quality until about 80 kb/s.
MP3, which was designed and tuned for use alongside MPEG-1/2 Video, generally performs poorly on monaural data at less than 48 kb/s or in stereo at less than 80 kb/s.
See also
- Comparison of audio codecs
- Copyright infringement
- Digital audio player
- ID3
- Joint stereo
- Media player
- MP3 blog
- MP3 Surround
- Streaming Media