Data compression: Difference between revisions
Shadowjams (minor edit): Readd removed cite
Revision as of 20:12, 16 February 2012
This article needs additional citations for verification. (November 2011)
In computer science and information theory, data compression, source coding,[1] or bit-rate reduction involves encoding information using fewer bits than the original representation. Compression can be either lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression. Lossy compression reduces bits by identifying marginally important information and removing it.
Compression is useful because it helps reduce the consumption of resources such as data space or transmission capacity. However, compressed data must be decompressed before it can be used, and this extra processing imposes computational or other costs. For instance, a compression scheme for video may require expensive hardware for the video to be decompressed fast enough to be viewed as it is being decompressed, and the option to decompress the video in full before watching it may be inconvenient or require additional storage. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and decompress the data.
Lossy
Lossy data compression is contrasted with lossless data compression. In these schemes, some loss of information is acceptable. Depending upon the application, detail can be dropped from the data to save storage space. Generally, lossy data compression schemes are guided by research on how people perceive the data in question. For example, the human eye is more sensitive to subtle variations in luminance than it is to variations in color. JPEG image compression works in part by "rounding off" less-important visual information. There is a corresponding trade-off between information lost and the size reduction. A number of popular compression formats exploit these perceptual differences, including those used in music files, images, and video.
Lossy image compression is used in digital cameras to increase storage capacities with minimal degradation of picture quality. Similarly, DVDs use the lossy MPEG-2 Video codec for video compression.
In lossy audio compression, methods of psychoacoustics are used to remove non-audible (or less audible) components of the signal. Compression of human speech is often performed with even more specialized techniques, so that "speech compression" or "voice coding" is sometimes distinguished as a separate discipline from "audio compression". Different audio and speech compression standards are listed under audio codecs. Voice compression is used in Internet telephony, for example, while audio compression is used for CD ripping and is decoded by audio players.
Lossless
Lossless data compression algorithms usually exploit statistical redundancy to represent data more concisely without losing information. Lossless compression is possible because most real-world data has statistical redundancy. For example, an image may have areas of colour that do not change over several pixels; instead of coding "red pixel, red pixel, ..." the data may be encoded as "279 red pixels". This is a simple example of run-length encoding; there are many schemes to reduce size by eliminating redundancy.
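The run-length idea behind the "279 red pixels" example can be sketched in a few lines of Python. This is a minimal illustration, not the run-length scheme of any particular image format; the function names `rle_encode` and `rle_decode` are hypothetical.

```python
from itertools import groupby


def rle_encode(data):
    """Collapse each run of identical items into a (count, item) pair."""
    return [(len(list(run)), item) for item, run in groupby(data)]


def rle_decode(pairs):
    """Expand (count, item) pairs back into the original list of items."""
    out = []
    for count, item in pairs:
        out.extend([item] * count)
    return out


row = ["red"] * 279 + ["blue"] * 3
encoded = rle_encode(row)
print(encoded)  # [(279, 'red'), (3, 'blue')]
assert rle_decode(encoded) == row
```

Nothing is lost: decoding reproduces the exact input, which is what makes run-length encoding a lossless scheme.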
The Lempel–Ziv (LZ) compression methods are among the most popular algorithms for lossless storage. DEFLATE is a variation on LZ optimized for decompression speed and compression ratio, although compression can be slow. DEFLATE is used in PKZIP, gzip and PNG. LZW (Lempel–Ziv–Welch) is used in GIF images. Also noteworthy are the LZR (LZ–Renau) methods, which serve as the basis of the Zip method. LZ methods use a table-based compression model in which table entries are substituted for repeated strings of data. For most LZ methods, this table is generated dynamically from earlier data in the input. The table itself is often Huffman encoded (e.g. SHRI, LZX). A current LZ-based coding scheme that performs well is LZX, used in Microsoft's CAB format.
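The table-based substitution used by LZ methods can be illustrated with a minimal LZW encoder and decoder. This sketch assumes byte-oriented input and emits plain integer codes with an unbounded table; real formats such as GIF also use variable-width codes, a table size limit and clear codes, all omitted here. The function names are hypothetical.

```python
def lzw_encode(data: bytes) -> list:
    """Return a list of integer codes; the table starts with all 256 single bytes."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the longest string already in the table
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # learn the new string for later reuse
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list) -> bytes:
    """Rebuild the same table on the fly, so it never needs to be transmitted."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = previous + previous[:1]  # code refers to the entry being defined right now
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(message)
print(len(message), "bytes ->", len(codes), "codes")
assert lzw_decode(codes) == message
```

Because the decoder grows its table with exactly the same rule as the encoder, only the codes are stored; this is the "generated dynamically from earlier data" property described above.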
The very best modern lossless compressors use probabilistic models, such as prediction by partial matching. The Burrows–Wheeler transform can also be viewed as an indirect form of statistical modelling.
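To see why the Burrows–Wheeler transform helps a later statistical stage, the naive sketch below sorts all rotations of the input and keeps the last column, which tends to group identical characters that share a context. It assumes a sentinel character (here `"\0"`) that does not occur in the text, and it is far slower than practical implementations, which use suffix arrays. The function names are hypothetical.

```python
def bwt(text, sentinel="\0"):
    """Naive Burrows–Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="\0"):
    """Rebuild the sorted rotation table column by column, then pick the original row."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(ch + row for ch, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
print(repr(transformed))  # 'annb\x00aa'
assert inverse_bwt(transformed) == "banana"
```

The transform itself saves nothing; it only reorders the text so that runs such as "nn" and "aa" appear, which a following move-to-front or entropy coding stage can exploit.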
The class of grammar-based codes has recently attracted attention because such codes can compress highly repetitive text extremely well, for instance, biological data collections of the same or related species, huge versioned document collections, and internet archives. Sequitur, Re-Pair, and lcacomp are practical grammar compression algorithms for which public implementations are available.
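The core idea of Re-Pair, repeatedly replacing the most frequent adjacent pair of symbols with a new grammar rule until no pair repeats, can be sketched naively as follows. This quadratic-time version is only an illustration of that idea; the published algorithm uses linear-time data structures, and the names `repair` and `expand` are hypothetical.

```python
from collections import Counter


def repair(text):
    """Naive Re-Pair: replace the most frequent adjacent pair with a new rule, repeatedly."""
    seq = list(text)
    rules = {}
    while True:
        pair_counts = Counter(zip(seq, seq[1:]))
        if not pair_counts:
            break
        pair, count = pair_counts.most_common(1)[0]
        if count < 2:
            break  # no pair repeats, so a new rule would not save anything
        nonterminal = ("R", len(rules))  # tuples cannot collide with 1-character terminals
        rules[nonterminal] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Recursively expand a grammar symbol back into text."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


text = "abcabcabcabc"
start, rules = repair(text)
print(len(start), "start symbols,", len(rules), "rules")
assert "".join(expand(s, rules) for s in start) == text
```

On highly repetitive input the start sequence plus the rules is much shorter than the text, and the rules can be reused by later occurrences of the same material.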
In a further refinement of these techniques, statistical predictions can be coupled to an algorithm called arithmetic coding. Arithmetic coding, invented by Jorma Rissanen and turned into a practical method by Witten, Neal, and Cleary, achieves better compression than the better-known Huffman algorithm and lends itself especially well to adaptive data compression tasks where the predictions are strongly context-dependent. Arithmetic coding is used in the bilevel image-compression standard JBIG and the document-compression standard DjVu. The text entry system Dasher is an inverse arithmetic coder.
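The interval-narrowing principle of arithmetic coding can be shown with exact fractions. This sketch assumes a fixed (non-adaptive) model whose probabilities sum to 1, and assumes the decoder already knows the message length; practical coders use finite-precision integer arithmetic with renormalization and an end-of-message convention. All names are hypothetical.

```python
import math
from fractions import Fraction


def symbol_ranges(probs):
    """Give each symbol its own slice [low, high) of the unit interval."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probs.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges


def encode(message, probs):
    """Narrow [0, 1) once per symbol, then return the shortest binary fraction inside it."""
    ranges = symbol_ranges(probs)
    low, width = Fraction(0), Fraction(1)
    for symbol in message:
        s_low, s_high = ranges[symbol]
        low, width = low + width * s_low, width * (s_high - s_low)
    bits = 0
    while True:  # find the fewest bits such that k / 2**bits lies in [low, low + width)
        bits += 1
        k = math.ceil(low * 2**bits)
        if Fraction(k, 2**bits) < low + width:
            return k, bits


def decode(k, bits, length, probs):
    """Replay the interval narrowing; the message length is assumed to be known."""
    ranges = symbol_ranges(probs)
    value = Fraction(k, 2**bits)
    low, width = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        target = (value - low) / width
        for symbol, (s_low, s_high) in ranges.items():
            if s_low <= target < s_high:
                out.append(symbol)
                low, width = low + width * s_low, width * (s_high - s_low)
                break
    return "".join(out)


probs = {"a": Fraction(3, 4), "b": Fraction(1, 4)}
k, bits = encode("aaab", probs)
print(k, bits)  # 3 3  (the code value is 3/8, i.e. the bits 011)
assert decode(k, bits, 4, probs) == "aaab"
```

Here four symbols fit in three bits, whereas any code that assigns a whole number of bits to each symbol, Huffman included, needs at least four. This fractional-bit behaviour is where arithmetic coding gains over Huffman coding.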
Theory
The theoretical background of compression is provided by information theory (which is closely related to algorithmic information theory) for lossless compression, and by rate–distortion theory for lossy compression. These fields of study were essentially created by Claude Shannon, who published fundamental papers on the topic in the late 1940s and early 1950s. Coding theory is also related. The idea of data compression is deeply connected with statistical inference.
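A small calculation makes Shannon's measure concrete. The sketch below computes the empirical order-0 entropy, H = -Σ p·log2(p), of a string. Under the assumption that symbols are independent draws with these frequencies, no lossless code can average fewer bits per symbol; models that exploit context (as the compressors above do) can go lower for real data. The function name is hypothetical.

```python
import math
from collections import Counter


def order0_entropy(data):
    """Empirical Shannon entropy in bits per symbol: H = -sum p * log2(p)."""
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


text = "abracadabra"
h = order0_entropy(text)
print(f"{h:.3f} bits/symbol, about {math.ceil(h * len(text))} bits in total")
```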
Machine learning
There is a close connection between machine learning and compression: a system that predicts the posterior probabilities of a sequence given its entire history can be used for optimal data compression (by using arithmetic coding on the output distribution), while an optimal compressor can be used for prediction (by finding the symbol that compresses best, given the previous history). This equivalence has been used as justification for data compression as a benchmark for "general intelligence".[2]
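The compressor-as-predictor direction can be sketched with an ideal code length instead of a real compressor. The model below is an assumption chosen for simplicity: an adaptive add-one (Laplace) frequency count, whose code length −log2 p is roughly what an arithmetic coder driven by the same predictions would produce (within a couple of bits in total). Prediction then means choosing the next symbol that makes the whole sequence cheapest to encode. Names are hypothetical.

```python
import math
from collections import Counter


def code_length_bits(sequence, alphabet):
    """Ideal code length in bits under an adaptive add-one (Laplace) model."""
    counts = Counter()
    total = 0.0
    for seen, symbol in enumerate(sequence):
        p = (counts[symbol] + 1) / (seen + len(alphabet))
        total -= math.log2(p)
        counts[symbol] += 1
    return total


def predict_next(history, alphabet):
    """Predict by compression: choose the continuation that encodes most cheaply."""
    return min(alphabet, key=lambda s: code_length_bits(history + s, alphabet))


print(predict_next("aaab", "ab"))  # prints: a  (p(a) = 4/6 beats p(b) = 2/6)
```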
Data differencing
Data compression can be viewed as a special case of data differencing:[3][4] Data differencing consists of producing a difference given a source and a target, with patching producing a target given a source and a difference, while data compression consists of producing a compressed file given a target, and decompression consists of producing a target given only a compressed file. Thus, one can consider data compression as data differencing with empty source data, the compressed file corresponding to a "difference from nothing". This is the same as considering absolute entropy (corresponding to data compression) as a special case of relative entropy (corresponding to data differencing) with no initial data.
When one wishes to emphasize the connection, one may use the term differential compression to refer to data differencing.
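One way to experiment with the "compression as differencing from nothing" view is DEFLATE's preset dictionary, exposed in Python's standard `zlib` module as the `zdict` argument. The sketch below treats the source as the dictionary, so the "difference" is the target compressed against it; with an empty source it reduces to ordinary compression. This is not the VCDIFF format of RFC 3284, only an illustration of the same idea, and DEFLATE can only refer back to the last 32 KiB of the dictionary. The function names are hypothetical.

```python
import random
import zlib


def make_delta(source: bytes, target: bytes) -> bytes:
    """Compress target using source as a preset dictionary (DEFLATE's zdict)."""
    compressor = zlib.compressobj(zdict=source) if source else zlib.compressobj()
    return compressor.compress(target) + compressor.flush()


def apply_delta(source: bytes, delta: bytes) -> bytes:
    """Recreate target from source plus delta; an empty source means plain decompression."""
    decompressor = zlib.decompressobj(zdict=source) if source else zlib.decompressobj()
    return decompressor.decompress(delta) + decompressor.flush()


rng = random.Random(0)
source = bytes(rng.getrandbits(8) for _ in range(1000))  # incompressible on its own
target = source[:500] + b"PATCHED" + source[507:]
delta = make_delta(source, target)
plain = make_delta(b"", target)  # the "difference from nothing", i.e. ordinary compression
print(len(delta), "bytes with the source as dictionary,", len(plain), "bytes without")
assert apply_delta(source, delta) == target
assert apply_delta(b"", plain) == target
```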
Outlook and currently unused potential
It is estimated that the information stored on the world's storage devices could be compressed further with existing compression algorithms by a remaining average factor of 4.5:1. The combined technological capacity of the world to store information was estimated at 1,300 exabytes of hardware digits in 2007, but when the corresponding content is optimally compressed, it represents only 295 exabytes of Shannon information.[5]
Uses
Audio
Audio data compression, as distinguished from dynamic range compression, reduces the transmission bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. Lossy audio compression algorithms, which provide higher compression at the cost of fidelity, are used in numerous audio applications. Almost all of these algorithms rely on psychoacoustics to eliminate less audible or meaningful sounds, thereby reducing the space required to store or transmit them.
In both lossy and lossless compression, information redundancy is reduced, using methods such as coding, pattern recognition and linear prediction to reduce the amount of information used to represent the uncompressed data.
The acceptable trade-off between loss of audio quality and transmission or storage size depends upon the application. For example, one 640MB compact disc (CD) holds approximately one hour of uncompressed high fidelity music, less than 2 hours of music compressed losslessly, or 7 hours of music compressed in the MP3 format at a medium bit rate. A digital sound recorder can typically store around 200 hours of clearly intelligible speech in 640MB[6].
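The CD figures follow from simple bit-rate arithmetic. The sketch uses the CD audio format (44.1 kHz, 16 bits, 2 channels) and assumes that "640MB" means 640 × 10^6 bytes and that a "medium" MP3 bit rate is 192 kbit/s; both assumptions are illustrative, not taken from the cited sources.

```python
CD_BITRATE = 44_100 * 16 * 2  # samples/s x bits per sample x 2 channels = 1,411,200 bit/s


def hours_of_audio(capacity_bytes, bitrate_bps):
    """Playing time that fits in a given capacity at a constant bit rate."""
    return capacity_bytes * 8 / bitrate_bps / 3600


capacity = 640 * 10**6  # assumption: "640MB" taken as 640 x 10^6 bytes
print(round(hours_of_audio(capacity, CD_BITRATE), 2))  # 1.01 (uncompressed CD audio)
print(round(hours_of_audio(capacity, 192_000), 1))  # 7.4 (MP3 at an assumed 192 kbit/s)
print(round(capacity * 8 / (200 * 3600) / 1000, 1))  # 7.1 kbit/s implied by 200 hours of speech
```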
Lossless audio compression produces a representation of digital data that decompresses to an exact digital duplicate of the original audio stream, unlike playback from lossy compression techniques such as Vorbis and MP3. Losslessly compressed files are typically around 50–60% of the original size,[7] similar to generic lossless data compression. Lossy compression depends upon the quality required, but typically yields files of 5 to 20% of the size of the uncompressed original.[8] Lossless compression cannot attain high compression ratios because of the complexity of waveforms and the rapid changes in sound. Codecs like FLAC, Shorten and TTA use linear prediction to estimate the spectrum of the signal. Many of these algorithms use convolution with the filter [-1 1] to slightly whiten or flatten the spectrum, thereby allowing traditional lossless compression to work more efficiently. The process is reversed upon decompression.
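The [-1 1] filter mentioned above simply replaces each sample by its difference from the previous one, and a running sum undoes it exactly for integer PCM samples. The sketch assumes the sample before the first one is 0; the function names are hypothetical, and the entropy coding stage that would follow is not shown.

```python
from itertools import accumulate


def first_difference(samples):
    """Apply the [-1 1] filter: e[n] = x[n] - x[n-1], treating x[-1] as 0."""
    return [cur - prev for prev, cur in zip([0] + samples[:-1], samples)]


def undo_first_difference(residual):
    """Invert the filter with a running sum, recovering the integer samples exactly."""
    return list(accumulate(residual))


pcm = [1000, 1003, 1007, 1010, 1012, 1011]
residual = first_difference(pcm)
print(residual)  # [1000, 3, 4, 3, 2, -1]
assert undo_first_difference(residual) == pcm
```

After the first sample, the residual values are small, so a subsequent entropy coder needs fewer bits for them than for the raw samples, while the round trip stays bit-exact.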
When audio files are to be processed, either by further compression or for editing, it is desirable to work from an unchanged original (uncompressed or losslessly compressed). Processing of a lossily compressed file for some purpose usually produces a final result inferior to creation of the same compressed file from an uncompressed original. In addition to sound editing or mixing, lossless audio compression is often used for archival storage, or as master copies.
A number of lossless audio compression formats exist. Shorten was an early lossless format. Newer ones include Free Lossless Audio Codec (FLAC), Apple's Apple Lossless, MPEG-4 ALS, Microsoft's Windows Media Audio 9 Lossless (WMA Lossless), Monkey's Audio, and TTA. See list of lossless codecs for a complete list.
Some audio formats feature a combination of a lossy format and a lossless correction; this allows stripping the correction to easily obtain a lossy file. Such formats include MPEG-4 SLS (Scalable to Lossless), WavPack, and OptimFROG DualStream.
Other formats are associated with a distinct system, such as:
- Direct Stream Transfer, used in Super Audio CD
- Meridian Lossless Packing, used in DVD-Audio, Dolby TrueHD, Blu-ray and HD DVD
Lossy audio compression
Lossy audio compression is used in a wide range of applications. In addition to the direct applications (MP3 players or computers), digitally compressed audio streams are used in most video DVDs, digital television, streaming media on the internet, satellite and cable radio, and increasingly in terrestrial radio broadcasts. Lossy compression typically achieves far greater compression than lossless compression (data of 5 percent to 20 percent of the original stream, rather than 50 percent to 60 percent) by discarding less-critical data.
The innovation of lossy audio compression was to use psychoacoustics to recognize that not all data in an audio stream can be perceived by the human auditory system. Most lossy compression reduces perceptual redundancy by first identifying sounds which are considered perceptually irrelevant, that is, sounds that are very hard to hear. Typical examples include high frequencies, or sounds that occur at the same time as louder sounds. Those sounds are coded with decreased accuracy or not coded at all.
Due to the nature of lossy algorithms, audio quality suffers when a file is decompressed and recompressed (digital generation loss). This makes lossy compression unsuitable for storing the intermediate results in professional audio engineering applications, such as sound editing and multitrack recording. However, lossy formats are very popular with end users (particularly MP3), as a megabyte can store about a minute's worth of music at adequate quality.
Coding methods
In order to determine what information in an audio signal is perceptually irrelevant, most lossy compression algorithms use transforms such as the modified discrete cosine transform (MDCT) to convert time domain sampled waveforms into a transform domain. Once transformed, typically into the frequency domain, component frequencies can be allocated bits according to how audible they are. Audibility of spectral components is determined by first calculating a masking threshold, below which it is estimated that sounds will be beyond the limits of human perception.
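The sketch below shows the MDCT on 50%-overlapping blocks and checks that overlap-adding the inverse transforms reconstructs the signal, because the time-domain aliasing of neighbouring blocks cancels. For simplicity it uses the direct O(N²) formulas with no window (which still reconstructs with the 1/N inverse scaling used here); practical codecs window each block, use fast algorithms, and quantize the coefficients according to a psychoacoustic model, none of which is shown. Function names are hypothetical.

```python
import math


def mdct(block):
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    N = len(block) // 2
    return [
        sum(x * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) for n, x in enumerate(block))
        for k in range(N)
    ]


def imdct(coeffs):
    """Inverse MDCT with 1/N scaling: N coefficients -> 2N time-aliased samples."""
    N = len(coeffs)
    return [
        sum(c * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) for k, c in enumerate(coeffs)) / N
        for n in range(2 * N)
    ]


def analyze_and_resynthesize(signal, N):
    """Transform 50%-overlapping blocks of 2N samples (hop N), then overlap-add.

    The aliasing left by each inverse transform is cancelled by its neighbour,
    so every sample is reconstructed up to rounding error.
    """
    if len(signal) % N:
        raise ValueError("signal length must be a multiple of N in this sketch")
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        for i, y in enumerate(imdct(mdct(padded[start:start + 2 * N]))):
            out[start + i] += y
    return out[N:N + len(signal)]


signal = [math.sin(0.3 * i) + 0.05 * i for i in range(32)]
rebuilt = analyze_and_resynthesize(signal, N=8)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))  # only floating-point rounding error
```

Each block of 2N samples yields only N coefficients, so the transform is critically sampled: it adds no data before bits are allocated to the coefficients.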
The masking threshold is calculated using the absolute threshold of hearing and the principles of simultaneous masking—the phenomenon wherein a signal is masked by another signal separated by frequency, and, in some cases, temporal masking—where a signal is masked by another signal separated by time. Equal-loudness contours may also be used to weight the perceptual importance of different components. Models of the human ear-brain combination incorporating such effects are often called psychoacoustic models.
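The first ingredient, the absolute threshold of hearing, is often approximated in the perceptual coding literature by a formula attributed to Terhardt. The sketch below uses that approximation to drop spectral components that would be inaudible even in silence. It deliberately ignores simultaneous and temporal masking, and the component levels are hypothetical values in dB SPL.

```python
import math


def absolute_threshold_db(freq_hz):
    """Approximate threshold in quiet (dB SPL); the formula works with f in kHz."""
    f = freq_hz / 1000.0
    return 3.64 * f ** -0.8 - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


def audible_components(components):
    """Keep (frequency_hz, level_db_spl) pairs above the threshold; the rest need no bits."""
    return [(f, level) for f, level in components if level > absolute_threshold_db(f)]


spectrum = [(100, 10.0), (1000, 10.0), (3300, 10.0), (15000, 10.0)]
print(audible_components(spectrum))  # [(1000, 10.0), (3300, 10.0)]
```

The ear is most sensitive around 3 to 4 kHz, so the same 10 dB component survives there but is discarded at 100 Hz and 15 kHz.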
Other types of lossy compressors, such as the linear predictive coding (LPC) used with speech, are source-based coders. These coders use a model of the sound's generator (such as the human vocal tract with LPC) to whiten the audio signal (i.e., flatten its spectrum) prior to quantization. LPC may also be thought of as a basic perceptual coding technique; reconstruction of an audio signal using a linear predictor shapes the coder's quantization noise into the spectrum of the target signal, partially masking it.
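The whitening step of a source-based coder can be sketched with the autocorrelation method of linear prediction: the Levinson–Durbin recursion finds predictor coefficients, and filtering the signal with them leaves a residual whose energy is far smaller than the signal's. This sketch covers only the analysis and whitening; quantization, coefficient coding and any vocal-tract-specific modelling are omitted, and the names are hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n-k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x))) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return predictor a[0..order] (a[0] = 1) and the final prediction-error energy."""
    a = [1.0] + [0.0] * order
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / error  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        error *= 1.0 - k * k
    return a, error


def whiten(x, a):
    """Prediction residual e[n] = sum_j a[j] * x[n-j], with samples before x[0] taken as 0."""
    return [sum(a[j] * x[n - j] for j in range(len(a)) if n - j >= 0) for n in range(len(x))]


signal = [math.sin(0.2 * n) for n in range(4000)]
a, _ = levinson_durbin(autocorrelation(signal, 2), order=2)
residual = whiten(signal, a)
ratio = sum(e * e for e in residual) / sum(s * s for s in signal)
print(f"residual energy is {ratio:.2e} of the signal energy")
```

A decoder that knows `a` can run the inverse (all-pole) filter on the residual to rebuild the signal, which is why only the residual and a few coefficients need to be transmitted.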
Lossy formats are often used for the distribution of streaming audio, or interactive applications (such as the coding of speech for digital transmission in cell phone networks). In such applications, the data must be decompressed as the data flows, rather than after the entire data stream has been transmitted. Not all audio codecs can be used for streaming applications, and for such applications a codec designed to stream data effectively will usually be chosen.
Latency results from the methods used to encode and decode the data. Some codecs will analyze a longer segment of the data to optimize efficiency, and then code it in a manner that requires a larger segment of data at one time in order to decode. (Codecs often divide the data into segments called "frames" to create discrete units for encoding and decoding.) The inherent latency of the coding algorithm can be critical; for example, when there is two-way transmission of data, such as with a telephone conversation, significant delays may seriously degrade the perceived quality.
In contrast to the speed of compression, which is proportional to the number of operations required by the algorithm, here latency refers to the number of samples which must be analysed before a block of audio is processed. In the minimum case, latency is zero samples (e.g., if the coder/decoder simply reduces the number of bits used to quantize the signal). Time domain algorithms such as LPC also often have low latencies, hence their popularity in speech coding for telephony. In algorithms such as MP3, however, a large number of samples have to be analyzed in order to implement a psychoacoustic model in the frequency domain, and latency is on the order of 23 ms (46 ms for two-way communication).
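Block latency is just the block length divided by the sample rate. The sketch assumes a 1024-sample block at 44.1 kHz, which lands on the roughly 23 ms figure quoted above, and a 160-sample block at 8 kHz as a telephony-style comparison; both block sizes are illustrative assumptions, not taken from a specific codec specification.

```python
def frame_latency_ms(samples, sample_rate_hz):
    """Time needed just to collect one analysis block before coding can start."""
    return 1000.0 * samples / sample_rate_hz


one_way = frame_latency_ms(1024, 44_100)  # assumption: 1024-sample block at 44.1 kHz
print(round(one_way, 1), "ms one way,", round(2 * one_way, 1), "ms for two-way communication")
print(round(frame_latency_ms(160, 8_000), 1), "ms for an assumed 160-sample speech frame")
```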
Speech encoding
Speech encoding is an important category of audio data compression. The perceptual models used to estimate what a human ear can hear are generally somewhat different from those used for music. The range of frequencies needed to convey the sounds of a human voice is normally far narrower than that needed for music, and the sound is normally less complex. As a result, speech can be encoded at high quality using a relatively low bit rate.
This is accomplished, in general, by some combination of two approaches:
- Only encoding sounds that could be made by a single human voice.
- Throwing away more of the data in the signal—keeping just enough to reconstruct an "intelligible" voice rather than the full frequency range of human hearing.
Perhaps the earliest algorithms used in speech encoding (and audio data compression in general) were the A-law algorithm and the µ-law algorithm.
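µ-law coding compands the signal with a logarithmic curve before uniform quantization, so quiet samples get finer steps than loud ones. The sketch uses the continuous µ-law curve with µ = 255; the G.711 standard itself uses a segmented, piecewise-linear approximation of this curve, which is not reproduced here. Function names are hypothetical.

```python
import math

MU = 255  # the µ value used by the North American and Japanese variant of G.711


def mu_law_compress(x, mu=MU):
    """Map a sample in [-1, 1] onto [-1, 1], stretching the low-amplitude region."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize(value, bits=8):
    """Uniform quantizer on [-1, 1] with 2**(bits-1) - 1 levels per polarity."""
    levels = 2 ** (bits - 1) - 1
    return round(value * levels) / levels


quiet = 0.001
print(abs(quantize(quiet) - quiet))  # 8-bit linear: the quiet sample is rounded to 0
print(abs(mu_law_expand(quantize(mu_law_compress(quiet))) - quiet))  # much smaller error
```

With the same 8 bits per sample, companding trades a little accuracy on loud samples for much better accuracy on quiet ones, which suits the wide dynamic range of speech.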
History
A literature compendium for a large variety of audio coding systems was published in the IEEE Journal on Selected Areas in Communications (JSAC), February 1988. While there were some papers from before that time, this collection documented an entire variety of finished, working audio coders, nearly all of them using perceptual (i.e. masking) techniques and some kind of frequency analysis and back-end noiseless coding.[9] Several of these papers remarked on the difficulty of obtaining good, clean digital audio for research purposes. Most, if not all, of the authors in the JSAC edition were also active in the MPEG-1 Audio committee.
The world's first commercial broadcast automation audio compression system was developed by Oscar Bonello, an engineering professor at the University of Buenos Aires.[10] In 1983, using the psychoacoustic principle of the masking of critical bands first published in 1967,[11] he started developing a practical application based on the recently developed IBM PC, and the broadcast automation system was launched in 1987 under the name Audicom. Twenty years later, almost all the radio stations in the world were using similar technology, manufactured by a number of companies.
Video
Video compression uses modern coding techniques to reduce redundancy in video data. Most video compression algorithms and codecs combine spatial image compression and temporal motion compensation. Video compression is a practical implementation of source coding in information theory. In practice, most video codecs also use audio compression techniques in parallel to compress the separate, but combined, data streams.
The majority of video compression algorithms use lossy compression. Large amounts of data may be eliminated while being perceptually indistinguishable. As in all lossy compression, there is a tradeoff between video quality, cost of processing the compression and decompression, and system requirements. Highly compressed video may present visible or distracting artifacts.
Video compression typically operates on square-shaped groups of neighboring pixels, often called macroblocks. These pixel groups or blocks of pixels are compared from one frame to the next, and the video compression codec sends only the differences within those blocks. In areas of video with more motion, the compression must encode more data to keep up with the larger number of pixels that are changing. During explosions, flames, flocks of animals, and some panning shots, the high-frequency detail commonly leads either to a decrease in quality or to an increase in the variable bitrate.
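The block comparison described above can be sketched as a search for tiles whose pixels changed between two frames; only those tiles would then need new data. The sketch assumes frames are 2-D lists of luma values and uses 16×16 tiles, the luma macroblock size of MPEG-2 and H.264; the threshold parameter and the function name are hypothetical.

```python
def changed_blocks(prev, curr, block=16, threshold=0):
    """List (block_row, block_col) of tiles whose summed absolute difference exceeds threshold."""
    height, width = len(curr), len(curr[0])
    changed = []
    for by in range(0, height, block):
        for bx in range(0, width, block):
            diff = sum(
                abs(curr[y][x] - prev[y][x])
                for y in range(by, min(by + block, height))
                for x in range(bx, min(bx + block, width))
            )
            if diff > threshold:
                changed.append((by // block, bx // block))
    return changed


previous = [[0] * 32 for _ in range(32)]
current = [row[:] for row in previous]
current[20][5] = 9  # one pixel changes inside block (1, 0)
print(changed_blocks(previous, current))  # [(1, 0)]
```

In a static scene the list is almost empty, while in the high-motion or high-detail scenes mentioned above nearly every block changes, which is why such scenes cost more bits.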
Encoding theory
Video data may be represented as a series of still image frames. The sequence of frames contains spatial and temporal redundancy that video compression algorithms attempt to eliminate or code in a smaller size. Similarities can be encoded by only storing differences between frames, or by using perceptual features of human vision. For example, small differences in color are more difficult to perceive than are changes in brightness. Compression algorithms can average a color across these similar areas to reduce space, in a manner similar to those used in JPEG image compression.[12] Some of these methods are inherently lossy while others may preserve all relevant information from the original, uncompressed video.
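Averaging color across small areas can be sketched as 2×2 chroma subsampling, in the style of 4:2:0: each 2×2 group of chroma samples is replaced by its mean while luma stays at full resolution. The sketch assumes the picture has already been split into a luma plane and two chroma planes (e.g. YCbCr) with even dimensions; real standards also specify where the subsampled chroma samples sit and how they are filtered, which is not modelled. The function name is hypothetical.

```python
def subsample_chroma_2x2(plane):
    """Replace each 2x2 group of chroma samples with their average (4:2:0-style)."""
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4
            for x in range(0, len(plane[0]) - 1, 2)
        ]
        for y in range(0, len(plane) - 1, 2)
    ]


cb = [
    [10, 20, 30, 30],
    [10, 20, 30, 30],
]
print(subsample_chroma_2x2(cb))  # [[15.0, 30.0]]
```

Each chroma plane shrinks to a quarter of its samples, so the picture as a whole needs half the samples of full-resolution luma plus chroma, before any other compression is applied.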
One of the most powerful techniques for compressing video is interframe compression. Interframe compression uses one or more earlier or later frames in a sequence to compress the current frame, while intraframe compression uses only the current frame, effectively being image compression.
The most commonly used method works by comparing each frame in the video with the previous one. If the frame contains areas where nothing has moved, the system simply issues a short command that copies that part of the previous frame, bit-for-bit, into the next one. If sections of the frame move in a simple manner, the compressor emits a (slightly longer) command that tells the decompressor to shift, rotate, lighten, or darken the copy: a longer command, but still much shorter than intraframe compression. Interframe compression works well for programs that will simply be played back by the viewer, but can cause problems if the video sequence needs to be edited.
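The "shift the copy" command corresponds to a motion vector. A common way to find one is block matching: try every displacement within a small search window and keep the one with the lowest sum of absolute differences (SAD). The sketch handles translation only (no rotation or brightness change), assumes 2-D lists of luma values, and uses hypothetical names and frame contents.

```python
def sad(prev, curr, by, bx, dy, dx, block):
    """Sum of absolute differences between a block of curr and a displaced block of prev."""
    return sum(
        abs(curr[by + y][bx + x] - prev[by + dy + y][bx + dx + x])
        for y in range(block)
        for x in range(block)
    )


def best_motion_vector(prev, curr, by, bx, block=8, search=4):
    """Exhaustive search for the displacement (dy, dx) with the lowest SAD."""
    height, width = len(prev), len(prev[0])
    best_cost, best_vector = None, None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = 0 <= by + dy and by + dy + block <= height and 0 <= bx + dx and bx + dx + block <= width
            if not inside:
                continue
            cost = sad(prev, curr, by, bx, dy, dx, block)
            if best_cost is None or cost < best_cost:
                best_cost, best_vector = cost, (dy, dx)
    return best_vector, best_cost


# Hypothetical 24x24 luma frames: the whole picture moves 2 pixels to the right.
prev = [[(31 * y + 17 * x) % 251 for x in range(24)] for y in range(24)]
curr = [[prev[y][x - 2] if x >= 2 else 0 for x in range(24)] for y in range(24)]
print(best_motion_vector(prev, curr, by=8, bx=8))  # ((0, -2), 0)
```

A zero SAD means the block can be predicted perfectly from the previous frame, so the encoder sends only the vector; otherwise it also sends the (usually small) residual.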
Because interframe compression copies data from one frame to another, if the original frame is simply cut out (or lost in transmission), the following frames cannot be reconstructed properly. Some video formats, such as DV, compress each frame independently using intraframe compression. Making 'cuts' in intraframe-compressed video is almost as easy as editing uncompressed video: one finds the beginning and ending of each frame, and simply copies bit-for-bit each frame that one wants to keep, and discards the frames one doesn't want. Another difference between intraframe and interframe compression is that with intraframe systems, each frame uses a similar amount of data. In most interframe systems, certain frames (such as "I frames" in MPEG-2) aren't allowed to copy data from other frames, and so require much more data than other frames nearby.
It is possible to build a computer-based video editor that spots problems caused when I frames are edited out while other frames need them. This has allowed newer formats like HDV to be used for editing. However, this process demands a lot more computing power than editing intraframe compressed video with the same picture quality.
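The editing problem can be modelled with a deliberately simplified group of pictures: "I" frames stand alone and each "P" frame predicts from the frame immediately before it (B frames and longer reference chains are ignored). The hypothetical function below reports which frames can still be decoded after some frames are cut, contrasting an interframe stream with an all-intraframe one such as DV.

```python
def decodable_after_cut(frame_types, removed):
    """Return indices of frames that can still be reconstructed after editing.

    Simplified model: "I" frames stand alone; each "P" frame predicts from the
    frame immediately before it in the original sequence.
    """
    ok = {}
    for i, kind in enumerate(frame_types):
        if i in removed:
            continue
        if kind == "I":
            ok[i] = True
        else:
            ok[i] = ok.get(i - 1, False)  # broken if its reference was cut or is itself broken
    return [i for i, good in ok.items() if good]


print(decodable_after_cut("IPPPIPPP", removed={1}))  # [0, 4, 5, 6, 7]
print(decodable_after_cut("IIIIIIII", removed={1}))  # [0, 2, 3, 4, 5, 6, 7]
```

Cutting one frame from the interframe stream also breaks frames 2 and 3 until the next I frame, whereas the intraframe stream loses only the frame that was cut. An editor that detects such broken dependencies must re-encode the affected frames, which is where the extra computing cost comes from.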
Today, nearly all commonly used video compression methods (e.g., those in standards approved by the ITU-T or ISO) apply a discrete cosine transform (DCT) for spatial redundancy reduction. Other methods, such as fractal compression, matching pursuit and the use of a discrete wavelet transform (DWT) have been the subject of some research, but are typically not used in practical products (except for the use of wavelet coding as still-image coders without motion compensation). Interest in fractal compression seems to be waning, due to recent theoretical analysis showing a comparative lack of effectiveness of such methods.[citation needed]
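The spatial redundancy reduction of the DCT can be seen on a single row of pixel values. The sketch implements the orthonormal 1-D DCT-II and its inverse directly; video standards apply a 2-D transform (rows, then columns) to small blocks such as 8×8, typically with fast or integer approximations, and then quantize the coefficients. The pixel values and function names are illustrative.

```python
import math


def dct_ii(x):
    """Orthonormal 1-D DCT-II (applied along rows and then columns of a block in 2-D)."""
    N = len(x)
    return [
        (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
        * sum(v * math.cos(math.pi / N * (n + 0.5) * k) for n, v in enumerate(x))
        for k in range(N)
    ]


def inverse_dct_ii(X):
    """Inverse of dct_ii (an orthonormal DCT-III)."""
    N = len(X)
    return [
        sum(
            (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)) * c * math.cos(math.pi / N * (n + 0.5) * k)
            for k, c in enumerate(X)
        )
        for n in range(N)
    ]


row = [52, 55, 61, 66, 70, 61, 64, 73]  # illustrative pixel values from one row of a block
coeffs = dct_ii(row)
print([round(c, 1) for c in coeffs])  # most of the energy sits in the first (DC) coefficient
```

Because smooth image content concentrates in a few low-frequency coefficients, the remaining coefficients can be quantized coarsely or dropped, which is where the lossy saving comes from.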
Timeline
The following table is a partial history of international video compression standards.
Year | Standard | Publisher | Popular Implementations |
---|---|---|---|
1984 | H.120 | ITU-T | |
1990 | H.261 | ITU-T | Videoconferencing, Videotelephony |
1993 | MPEG-1 Part 2 | ISO, IEC | Video-CD |
1995 | H.262/MPEG-2 Part 2 | ISO, IEC, ITU-T | DVD Video, Blu-ray, Digital Video Broadcasting, SVCD |
1996 | H.263 | ITU-T | Videoconferencing, Videotelephony, Video on Mobile Phones (3GP) |
1999 | MPEG-4 Part 2 | ISO, IEC | Video on Internet (DivX, Xvid ...) |
2003 | H.264/MPEG-4 AVC | ISO, IEC, ITU-T | Blu-ray, Digital Video Broadcasting, iPod Video, HD DVD |
2008 | VC-2 (Dirac) | SMPTE, BBC | Video on Internet, HDTV broadcast, UHDTV |
See also
- Algorithmic complexity theory
- Audio signal processing
- Audio storage
- Auditory masking
- Burrows–Wheeler transform
- Calgary Corpus
- Canterbury Corpus
- Comparison of audio codecs
- Comparison of file archivers
- Context mixing
- Data compression symmetry
- Data deduplication
- D-frame
- Dictionary coder
- Digital signal processing
- Distributed source coding
- Dyadic distribution
- Dynamic Markov Compression
- Elias gamma coding
- Entropy encoding
- Fibonacci coding
- Fractal transform
- Golomb coding
- HTTP compression
- Image compression
- Information entropy
- List of archive formats
- List of codecs
- Magic compression algorithm
- Minimum description length
- Minimum message length
- Modulo-N code
- Mu-law
- Prediction by partial matching
- Psychoacoustics
- Range encoding
- Run-length encoding
- Self-extracting archive
- Subband encoding
- Subjective video quality
- Transcoding
- Universal code (data compression)
- Vector quantization
- Video compression format
- Video compression picture types
- Video quality
- Wavelet compression
References
- ^ Wade, Graham (1994). Signal coding and processing (2 ed.). Cambridge University Press. p. 34. ISBN 9780521423366. Retrieved 2011-12-22. "The broad objective of source coding is to exploit or remove 'inefficient' redundancy in the PCM source and thereby achieve a reduction in the overall source rate R."
- ^ Rationale for a Large Text Compression Benchmark
- ^ RFC 3284
- ^ Korn, D.G.; Vo, K.P. (1995), B. Krishnamurthy (ed.), Vdelta: Differencing and Compression, Practical Reusable Unix Software, John Wiley & Sons
- ^ "The World’s Technological Capacity to Store, Communicate, and Compute Information"
- ^ The Olympus WS-120 digital speech recorder, according to its manual, can store about 178 hours of speech-quality audio in .WMA format in 500MB of flash memory.
- ^ Comparison of lossless codecs on the FLAC's website
- ^ Comparison of lossy codecs
- ^ Journal on Selected Areas in Communications, February 1988
- ^ Solidyne... 40 years of innovation
- ^ The Ear as a Communication Receiver. English translation of Das Ohr als Nachrichtenempfänger by Eberhard Zwicker and Richard Feldtkeller. Translated from German by Hannes Müsch, Søren Buus, and Mary Florentine. Originally published in 1967; Translation published in 1999
- ^ http://www.faqs.org/faqs/jpeg-faq/part1/
External links
- Data Compression Basics (Video)
- Video compression 4:2:2 10-bit and its benefits
- Why does 10-bit save bandwidth (even when content is 8-bit)?
- Which compression technology should be used
- Wiley - Introduction to Compression Theory
- EBU subjective listening tests on low-bitrate audio codecs
- Audio Archiving Guide: Music Formats (Guide for helping a user pick out the right codec)
- MPEG 1&2 video compression intro (pdf format)
- hydrogenaudio.org wiki comparison
- Introduction to Data Compression by Guy E Blelloch from CMU
- HD Greetings - 1080p Uncompressed source material for compression testing and research
- Explanation of lossless signal compression method used by most codecs
- Interactive blind listening tests of audio codecs over the internet
- TestVid - 2,000+ HD and other uncompressed source video clips for compression testing
- Videsignline - Intro to Video Compression