
Talk:Lossless compression


Untitled

"Many of these methods are implemented in open-source and proprietary tools,". What else is there? Could the entry go without saying that?

--Daijoubu

history

please add a history section for lossless —Preceding unsigned comment added by Sp0 (talkcontribs) 08:56, 4 November 2007 (UTC)[reply]

The Million Random Digit Challenge

There is a public version of a low-entropy file compression test.[1]

Lossless ... makes some files longer argument flawed?

Is the section title "Lossless data compression makes some files longer" inaccurate? The counting argument proves that no algorithm can make all files smaller, but it does not address leaving a file unchanged. For any compression algorithm, consider deriving a new algorithm that produces a flag in the "compressed" file's header that may be used to indicate that no compression was in fact performed (because the resulting file would have been larger than the original).

--Kberg <kberg@tampabay.rr.com>

If you add an extra bit (or byte), you're making the file longer. --Captain Segfault
  • I still don't see it. We're rejecting the notion that a lossless algorithm can transform any file into a distinct smaller file. I agree that "In other words for any (lossless) data compression algorithm there will be an input data set that does not get smaller when processed by the algorithm". How does it follow that some files must be larger after applying compression? --gb 00:43, Feb 17, 2005 (UTC)
Compressed files of the same length as the files they compress can account for exactly the same number of different uncompressed files (same number of bits), and shorter compressed files can obviously account for fewer. So if some file gets shorter, then totalling the combinations available at shorter and equal lengths can only account for fewer uncompressed files than there are inputs; some outputs would have to be longer to account for the rest. And a flag added to indicate no compression would take up a bit! (A brute-force version of this counting appears at the end of this section.) Frencheigh 03:45, 19 Feb 2005 (UTC)
  • Forget the "add a flag" idea. I am not adding anything. I'm claiming that a compression algorithm could make any file smaller than or equal to its original size without making a larger file. Let's put numbers in for the variables: I have a 10-bit file. Someone claims they have an algorithm that can compress any file in a lossless fashion. We reject this because the number of 10-bit files can't be mapped uniquely to even the 9-bit files (N-1). But certainly there are enough if you take all 10-bit files into consideration as well, since all 10-bit files could be mapped uniquely to themselves. Again, how does it follow that some files must be larger? --gb 03:27, Feb 20, 2005 (UTC)
  • Your argument is fine as long as we consider an algorithm that only compresses fixed-length files (10 bits in your example), i.e., no longer and no shorter files. Let's consider an algorithm that encodes all files no longer than 10 bits losslessly. Let's assume it shrinks only one 10-bit file to 9 bits and leaves all others at a length of 10. Now consider the 9-bit files. The algorithm used one 9-bit file to represent one 10-bit file, so we must find another representation for that file. We can do this either by assigning it a 10-bit file (but we don't want to increase length ;_) or by assigning it an 8-bit file, leaving the other 9-bit files untouched. The same can be repeated with the 8-bit and shorter files, until we step down to the 1-bit case. We've used one file (say 0) to represent one 2-bit sequence. But we still have two others to encode. So we can use the remaining 1 to encode itself, and we must come up with a way to encode 0. The options are to represent it as a zero-length file (not very practical ;_) or as a remaining 10-bit file. Even if we admit the zero-length file to our scheme, it breaks down if we allow our algorithm to be better by shrinking more than one file. So you must use some longer file to represent the files remaining at the 1-bit step. Therefore every practical algorithm will have to enlarge some files. So now we're talking semantics: are lossless compression algorithms methods of effectively shrinking files, or are they just abstract mappings from n-bit files to files no longer than n bits? My thought is that the article deals with the former. --filu 17:55, 9 Mar 2005 (UTC)


  • The statement "Lossless data compression makes some files longer" is misleading. Sure, using the common compression algorithms may make a file with high entropy longer, but there are algorithms (some fairly crude) which get around this and don't do anything to data that they would expand. They don't compress this data, but they certainly don't make it longer. Would it be better to change the title to indicate that it's the lossless compression algorithms which make the files longer? After all, if you make a file longer you aren't compressing it. --194.66.95.100
  • At a minimum, there needs to be a way to indicate whether the data has been kept in its original form or has been altered by applying the "compression" algorithm. That indicator might be as simple and as small as adding a one-bit header flag. However, that would mean expanding some class of "unusual" files by one bit in order to add such a header flag, and thus it would mean making some files longer. Unless it is known that the file already has some special properties (e.g., that it contains a header in which there is some unused bit that can be used for such a flag), in which case we're not really talking about files of arbitrary content anymore, some files must be expanded (at least by some tiny amount) if any of them are to be compressed. --Pawnbroker 23:37, 17 August 2005 (UTC)[reply]
Even if I could be bothered to check the math, it's not a discussion whose significance warrants placing it first, giving it a major headline, and going into such detail. In practice you're not going to run into this very often - except when trying to compress something a second time. That section needs to be trimmed and demoted to put it into perspective. flux.books 14:39, 17 January 2006 (UTC)[reply]
Attempting to compress data multiple times is extremely common. The second attempt just happens behind-the-scenes during data archival (tgz) or data transfer (TCP compression) of previously compressed data. — Preceding unsigned comment added by 204.9.220.36 (talk) 22:48, 23 March 2012 (UTC)[reply]
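
To make the counting argument above concrete, here is a small Python sketch (purely illustrative, not taken from the article): for every n there are 2^n bit strings of length exactly n but only 2^n - 1 strings that are strictly shorter, so the shorter strings can never cover all the length-n inputs.

    # There are 2**n strings of length n but only 2**n - 1 strictly shorter
    # strings (including the empty string), so no injective (lossless) mapping
    # can shrink every n-bit input.
    def strings_shorter_than(n):
        return 2 ** n - 1          # sum of 2**k for k = 0 .. n-1

    for n in range(1, 11):
        exactly_n = 2 ** n
        print(n, exactly_n, strings_shorter_than(n))
        assert strings_shorter_than(n) < exactly_n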

Which Algorithm is better

Which algorithm is better for compressing text or document files? I am stuck in my search for a good algorithm.

By: Pradeep (pradeepgs_cta@yahoo.com)

Short answer: http://corpus.canterbury.ac.nz/details/cantrbry/RatioByRatio.html shows the current world-record best compression program (and a bunch of runners-up) on the "Canterbury Corpus", a collection of text and document files.

Long answer: No one algorithm is "best" for every document. Good compressors implement several different algorithms, automatically switching to the algorithm that gives the best compression. Also, a few of the algorithms are a combination of several other algorithms.

Most people recommend learning about LZ77 (algorithm) first, then Huffman coding.

Have you seen the list of algorithms at Category:Lossless compression algorithms ?
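
The "several algorithms, pick whichever does best" idea from the long answer above can be sketched in a few lines of Python using the standard-library codecs (the one-byte tag and the function names here are just for illustration, not how any particular archiver actually does it):

    import bz2, lzma, zlib

    CODECS = {b"Z": (zlib.compress, zlib.decompress),
              b"B": (bz2.compress, bz2.decompress),
              b"X": (lzma.compress, lzma.decompress)}

    def compress_best(data: bytes) -> bytes:
        # Try every codec, keep the smallest result, and prefix a tag recording
        # which codec was used so the choice can be reversed on decompression.
        tag, out = min(((t, c(data)) for t, (c, _) in CODECS.items()),
                       key=lambda pair: len(pair[1]))
        return tag + out

    def decompress_best(blob: bytes) -> bytes:
        _, decompress = CODECS[blob[:1]]
        return decompress(blob[1:])

    sample = b"the quick brown fox jumps over the lazy dog " * 200
    assert decompress_best(compress_best(sample)) == sample

On data that none of the three codecs can shrink, even the smallest output is still larger than the input, which ties back to the counting argument in the section above.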

"input distributions"

The article currently claims Good compression algorithms are those that achieve shorter output on input distributions that occur in real-world data.

Not all compression algorithms rely on "input distributions". If we consider 256-pixel grayscale images where every pixel is a different color (they all have exactly the same input distribution), PNG can compress some of them but (I suspect) not all of them.

I'm going to change that sentence to (hopefully) be more accurate.


compressor that never makes files longer

moved from article

However, it is theoretically possible to have a compressor that does not make files larger, simply by evaluating a test loop in whether the compression method is larger than the uncompressed file; if it is, it does not compress the file at all, and therefore there is not the particular set of bytes at the beginning of the file identifying that it is compressed.

Many people believe this is true. What could we put in the article to make it clear and obvious that it is not true? There's a mathematical proof already in the article, but that apparently is not clear or obvious enough. --DavidCary 17:27, 23 December 2005 (UTC)[reply]

Some discussion on this has been going on already (see above), but it's clear it's not obvious why it doesn't work as advertised. If you don't understand the mathematical argument or don't apply it correctly you will likely not accept its conclusion.
There are two different issues people grapple with. The first is the idea that by adding the compression marker out-of-band (with an "unused bit", or even simpler, just printing whether it was compressed or not and having the user memorize it), we can avoid the file becoming larger. This is true. How can this be possible in light of the mathematical proof of impossibility? Simple: we have shifted definitions. The mathematical proof considers the total size of the input and the total size of the output. An "unused bit" is by definition not part of the input (even though it takes up storage space), but is part of the output. Someone memorizing whether the file was compressed or not is not part of the input, but is part of the output. Even though the file does not get longer, the size of the compressed output is larger than the size of the uncompressed input. We are cheating by selectively omitting part of the output.
The second major issue is the one illustrated by your quote: the idea that if we just produce the original file if the compressed output would be larger than the uncompressed input, we will always get output that is at most the size of the input. Second shocker: this is also true. Why does this fail on a mathematical level?
Recall that we call an algorithm lossless precisely when we can recover the original file from the compressed file without loss of information. Now let's do something very simple: take a file and compress it until it compresses no more. (This will always happen, usually right at the second try.) The very last compression step is problematic, because it simply returns the input. If we decompress the resulting output, we do not get back our original input! Instead we get the result of the next-to-last compression step. In other words, these algorithms do not produce unique output files for every input file, and this contradicts a basic assumption in the mathematical proof. Such algorithms are, in fact, lossy, if not in the traditional sense.
For a practical illustration of the problem, imagine compressing all files in a directory individually, then decompressing them in the same directory. Things go wrong if one of these files is a compressed file itself: the compression does not change it, so the decompression will produce a file different from the original. This breaks our basic promise of a lossless compression. The only way to prevent this from happening is by somehow adding the information that the file must not be modified on decompression—but this will make the output larger.
The above does not sound very professional, I'm sad to say, but I think it does explain where intuition fails, and maybe someone can use it to write an explanatory text. Or maybe someone can supply a good reference from a book on compression, I'm positive that something like this must be written down in a book somewhere, since these are common misconceptions that should be corrected. JRM · Talk 04:31, 26 December 2005 (UTC)[reply]
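
To make the failure mode described above concrete, here is a sketch using Python's zlib (the names naive_compress and naive_decompress are invented for this illustration):

    import zlib

    def naive_compress(data: bytes) -> bytes:
        candidate = zlib.compress(data)
        # "never make the file longer": fall back to the original when compression grows it
        return candidate if len(candidate) < len(data) else data

    def naive_decompress(data: bytes) -> bytes:
        try:
            return zlib.decompress(data)
        except zlib.error:
            return data            # assume it was stored uncompressed

    original = zlib.compress(b"some text " * 100)   # a file that already happens to be a zlib stream
    packed = naive_compress(original)               # almost certainly left unchanged: recompression would grow it
    restored = naive_decompress(packed)
    print(packed == original)                       # True: nothing was made longer
    print(restored == original)                     # False: the round trip did not return the original

Because the restored file differs from the file that was handed in, the scheme is, strictly speaking, lossy, exactly as argued above.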
Is the fact that lossless compression algorithms must necessarily increase size for certain inputs really worth being the first section after the summary? It seems more like material for an interesting footnote or a separate entry on the mathematics of lossless algorithms. The combination of the dense mathmatics and the tangentialness to the rest of the content seems jarring -- Dave
I agree with the tangentialness, although there's no particular reason the mathematics couldn't appear. It's probably more productive to replace this with a simple external reference, though; we don't need to repeat proofs for the heck of it, as Wikipedia is not a primary source. Anyone got a book with this proof in it handy? JRM · Talk 16:40, 29 December 2005 (UTC)[reply]
Alright, I see what you're saying and why you moved my statement. Looking at it from a purely mathematical perspective, leaving the file unchanged is technically cheating. However, looking at it in a practical sense, it would work the majority of the time. The chance of a file not being compressible (by a specific algorithm) is fairly low, considering algorithms use the majority of human-produced data files as a reference for how they are created; therefore they have a much greater chance of compressing an average file, not a mathematical outlier. If the file remains "uncompressed" to prevent it from gaining file size, then there is a 1/255^3 chance of the decompressor not detecting it, assuming we use a 3-byte referencing code at the beginning of the file to identify it as compressed. In the world of programming, there is a much more likely chance of some other sort of error (such as a bug in the decompressor program or a transfer error) than 1/255^3, and in a practical sense many people would be willing to take the risk, with the added benefit that if the file does not decompress correctly, it must already be uncompressed. Xcelerate · Talk 17:34, 4 January 2006
Yes, you're quite right. In fact, people do use this technique in practice. It is probably as important to see why this works most of the time as to realize why it doesn't really work from a mathematical perspective.
In effect, this creates an algorithm that's wrong with an arbitrarily small (but non-zero) probability for most files, and is guaranteed to be wrong when applied to already compressed files (assuming we can't compress these further), as mentioned above. In practice these problems will hardly matter, but it does mean the algorithm is strictly speaking lossy.
Almost all "serious" compressors compress anyway, though, even if the output is larger. (Archivers like ZIP may instead use a separate "not compressed" bit in their indices, or "compression method: store".) It simplifies the algorithms; most compression algorithms have very little overhead for totally uncompressable input data, so it doesn't pay off to make the compressor more complicated by checking the output size (and it avoids the problem of feeding already compressed data to the compressor, which we can't handle properly). JRM · Talk 02:19, 5 January 2006 (UTC)[reply]

LZMA compression

LZMA is an improved version of LZ77, so it should be included in this article, right?

It is now (you didn't sign your comment so I have no clue when you wrote it). Nicolas Barbier (talk) 10:30, 29 June 2011 (UTC)[reply]

Including an intuitive proof of the weaker theorem

Should we include a short proof for the weaker theorem: "No lossless data compression method reduces the length of all files."?

The weaker form, which does not address the necessity of making some files longer, can be proven in the following way: Suppose that some compression method existed that reduced the size of every input file. Then a user could apply the method repeatedly to reduce any file to length zero, implying that any file could be stored in zero bits and still be recovered, which is absurd.

24.17.254.19 23:43, 24 December 2006 (UTC) goldena[reply]

This weaker version holds even without knowing anything about what lossless compression means: No total mapping from bit (byte) sequences to bit (byte) sequences can produce an output that is shorter than its input for all inputs, simply because in the case of the zero-length input there is no possible shorter output to produce. A property that obvious would probably not do much to improve the article. It is more common for proponents of miraculous compression methods to make the more guarded claim that their method will shrink all files larger than some minimum length; and refuting that leads one to the same kind of counting argument that is already in the article. Henning Makholm 00:26, 25 December 2006 (UTC)[reply]
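
As an aside, it is easy to check empirically that a real compressor stops shrinking after the first pass, so the "apply it repeatedly down to length zero" scenario never gets going; a quick Python check (exact sizes will vary):

    import zlib

    data = ("lossless compression " * 500).encode()
    for step in range(4):
        print(step, len(data))     # the length drops once, then starts creeping back up
        data = zlib.compress(data)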

LZW Compression:

LZW compression is the best (obvious) choice when compressing text documents. It is a dictionary-based approach in which each new word found is added to a dictionary and thereafter referenced by its index. Even though it will not give extremely good compression ratios for complex documents, it is fairly efficient for general documents. —The preceding unsigned comment was added by 202.56.254.194 (talk) 10:25, 28 February 2007 (UTC).[reply]

No, LZW is not the best. There are algorithms that give better compression ratios and/or decompression speeds. LZW hasn't been the best for a long time. — Preceding unsigned comment added by 107.77.218.83 (talk) 08:22, 2 May 2016 (UTC)[reply]
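
Since the comment above only loosely describes the dictionary mechanism, here is a minimal textbook-style LZW coder in Python (a sketch of the general technique, not of any particular file format's variant):

    def lzw_encode(data: bytes):
        # The dictionary starts with all single bytes; each new phrase gets the next index.
        table = {bytes([i]): i for i in range(256)}
        phrase, out = b"", []
        for byte in data:
            candidate = phrase + bytes([byte])
            if candidate in table:
                phrase = candidate
            else:
                out.append(table[phrase])
                table[candidate] = len(table)
                phrase = bytes([byte])
        if phrase:
            out.append(table[phrase])
        return out                       # a list of dictionary indices

    def lzw_decode(codes):
        table = {i: bytes([i]) for i in range(256)}
        prev = table[codes[0]]
        out = [prev]
        for code in codes[1:]:
            entry = table[code] if code in table else prev + prev[:1]   # the classic special case
            out.append(entry)
            table[len(table)] = prev + entry[:1]
            prev = entry
        return b"".join(out)

    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decode(lzw_encode(sample)) == sample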

Compressive File Types

Question: Should BMP be listed as one of the lossless graphics saving methods? —Preceding unsigned comment added by 193.64.57.253 (talk) 2007-04-13T07:44:30

No, because BMP is not a compression method. The pixel data in a BMP image may be stored with run-length encoding or (in recent variants of the format) other compression schemes, but even in those cases "BMP" itself does not denote compression. –Henning Makholm 14:12, 15 April 2007 (UTC)[reply]


Another Question: Although this page describes the compressive file types "*.fileextension" (and hence the compression methods) for Graphics and Audio, the Video section (and its associated links) all talk about codecs. I can understand pointing people towards reading about codecs -- however, please note that this page is one where people look when making comparisons to decide on a favourite lossless file type to catalogue their media. For example: I have a camcorder which records onto a small tape. It came with a driver and some editing software on a CD which creates only large AVI files. When researching what I have (and what I should be looking for), I feel that people will try to push certain products onto me despite the fact that I will be forced to trust that it's really what I want. Could someone please expand the Video section to include better reference points on the kinds of lossless video (recording and playing) before jumping straight into codecs? 203.206.244.127 (talk) 00:38, 10 September 2009 (UTC)[reply]

TRULY lossless?

I've heard the argument that there is no such thing as a TRULY lossless format, that bits and pieces of data get lost with every reformat, even if it is a lossless->lossless reformat. Can someone clue me in on this and tell me what data is lost? --AlexOvShaolin 03:03, 5 July 2007 (UTC)[reply]

Don't believe everything you hear. Lossless means lossless. If the decompressed data is not exactly identical to the original, the compression is not lossless. –Henning Makholm 22:57, 5 July 2007 (UTC)[reply]
Maybe what you have heard is that not any piece of data can be compressed losslessly. Some data must map to itself when a compression algorithm is applied. Think of compressing the following sequence of bits: 01. That's just two bits. To compress it, the result would have to be shorter than the original version: just 1 or 0. Say, 01 is compressed to 0. Here comes the problem. We can now compress, say, 10 into 1, but how to compress 00 or 11? To 0? No, that's 01 compressed. To 1? No, that's 10 compressed! So they are mapped to themselves. This generalizes to large files as well. --ZeroOne (talk | @) 22:04, 6 July 2007 (UTC)[reply]
OIC, so once it's in a lossless format, it can be reformatted to a lossless format without losing data, even during compression, as long as the compression is a lossless process. Thanx for shedding light. --AlexOvShaolin 00:51, 7 July 2007 (UTC)[reply]
Alex might be right in the following sense: If we take a video file in lossless format A, recompress it in format B, and recompress it in format A, the resulting file will probably not be equal bit-by-bit to the original one. At least some of the metadata will probably have changed. Hence some information has been lost (and be it only the alignment of some internal fields of the original file). The important thing is that no *useful* information has been lost, i.e., the video data itself is unmodified. The reason why this is still called a lossless compression algorithm is that the compression algorithm is applied to the video itself, and on the video itself, the operation is indeed lossless. Dominique Unruh (talk) 11:31, 30 December 2008 (UTC)[reply]
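
The two-bit example above can even be checked exhaustively with a couple of lines of Python (purely illustrative):

    from itertools import product

    # Only 3 binary strings are shorter than 2 bits ("", "0", "1"), so no
    # injective mapping can send all four 2-bit strings to strictly shorter ones.
    inputs = ["00", "01", "10", "11"]
    shorter = ["", "0", "1"]
    exists = any(len(set(images)) == len(inputs)          # injective assignment?
                 for images in product(shorter, repeat=len(inputs)))
    print("all-shrinking lossless code for 2-bit inputs exists:", exists)   # False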

Magic compression algorithm

Recently an article named Magic compression algorithm was created. I submit that it would make better sense to have its content here, as part of the some-files-must-become longer discussion. Alternatively, that discussion could be moved to magic compression algorithm and just a cross-reference left here. Opinions? –Henning Makholm 20:38, 13 September 2007 (UTC)[reply]

Personally, I think it should have a one- or two-paragraph mention here and a 'see main article' link pointing to it - there is quite a lot of content there. Modest Genius talk 19:26, 17 September 2007 (UTC)[reply]
I believe that it makes more sense to have Magic compression algorithm as a separate article linked from here through {{main}} or {{seealso}}. This way, it can discuss the psychological aspects properly -- in the context of magic compression, these are of great interest, but in Lossless data compression, they would be a minor footnote at best. (Disclosure: I was the one to create Magic compression algorithm, and to WP:DYK it.)
As for duplication, this particular theorem is reasonably simple, fits into a single paragraph and can be summarised in a single sentence. I see the duplication as a relatively minor problem. Certainly, WP:POVFORK can't apply to mathematics. ΔιγυρενΕμπροσ! 10:01, 12 October 2007 (UTC)[reply]

Best algorithm algorithm

We see

some individuals to conclude that a well-designed compression algorithm can compress any input, thus, constituting a magic compression algorithm.

My algorithm is to use several algorithms and pick the best (noting in the produced header which algorithm we chose, for easy decompression) ---so there! Yes, it can't always compress, but when it does it is the best algorithm on the block. Do mention please! Jidanni (talk) 20:05, 9 June 2008 (UTC)[reply]

You can always encode some data into a shorter representation; however, unless you can reverse the transformation (decode), it isn't a lossless data compression algorithm. The most common case of this is a hash function. —Sladen (talk) 22:15, 10 June 2008 (UTC)[reply]

Lossless compression of arbitrary data set is possible with a lower boundary in place

Okay, I'll post this here, since the main page contribution keeps getting nuked by moderators who seem to think they are up to speed, without presenting any counterargument to the lower-bounded case.

So, I'll state the relationship that allows any of these algorithms to function:

where the entire stream T is a single digit in base V; where I is the index of the compressible stream segment found when expanding T into multiple digits in the various bases K = (V-1) -> 2; and J = ln(log_K T). As T increases, there are more bases K, inherent to mathematics, that are able to function on T.

Thus, the number of possible ways of looking at the information increases linearly as T increases, while the bit size J increases only logarithmically as T increases.

There are lower boundaries on the minimal threshold size required to perform these algorithms; the pigeonhole principle never takes effect since there is never an attempt to remap 1 to 10 and 11 simultaneously. The counting argument, which effectively states the same thing as the pigeonhole principle, also never comes about, due to the lower boundaries.

An easy way to conceive of the algorithms that function on top of this platform is to consider that the usual dictionaries and algorithms can become much more efficient when they are already part of the representation inherent to mathematics. That is: bases, and temporal coherency in the quantization representation.

Anyone who wants an honest discussion of what can be done *after* the threshold is reached would be good company. Personally, I've done enough research to come to the conclusion that there is a very valid school of thought that matter and information are one and the same, and there are indeed ways of shaping informational black holes, if not stars. I believe this phenomenon is described above, and there is a maximal classical entropy that changes the playing field once reached.

Chris. UmbraPanda 03:36, 12 October 2007 (UTC)[reply]

You know what? Stuff like this is why the WP:NOR principle was invented. But I think it's dishonest to euphemise this nonsense as research.
You do not understand what you're talking about. You make attempts to mimic mathematical texts, but your lack of understanding of the subject at hand makes it impossible for you to come to reasonable conclusions. You use impressive-sounding phrases like "lower boundaries on the minimal threshold size", but in the end, your ideas amount to getting bedazzled when looking at many bits together, and no amount of technobabble is going to change that.
Your continued emphasis on "usual dictionaries" is utterly pointless, as the theorem does not depend on any sort of dictionaries, usual or otherwise. No "quantization" comes into play. No "temporal coherency" is involved.
Your allusions about "dishonesty" and implications of a suppressive conspiracy out to get you are particularly ridiculous. For Euler's sake, it's mathematics we're talking about!
All that having been said, if there's any part of the theorem you want to be explained in further detail, you're welcome to ask. But don't post this rubbish of yours again -- especially in the article space. ΔιγυρενΕμπροσ! 10:41, 12 October 2007 (UTC)[reply]
I understand that you have personal problems dealing with the concept, but there's no need to be hostile. Once again, having actually posted no counterarguments unfortunately makes your position weaker rather than stronger, as I would assume (and it is an assumption) you wanted to portray. So drop the ego; no one but you has mentioned dishonesty or conspiracy. Consider in full what I'm describing before jumping about and making funny faces.
The 'temporal coherency' is simply the fact that there is, indeed, a coherency in the order of the information. 103010 is not the same as 300011. You agree with this; we read it, and the information is processed by any computational device, human mind or electronic, as a stream of digits that is held together, in sequence. You may consider this spatial coherency instead of temporal. Perhaps 'single-dimensional' coherency is most accurate. It's super basic. You need to get right down to the representation to get the next parts. If it's too obvious to you, reconsider how much you're taking for granted.
The actually important part inside of the article deals with the fact that compressors are better for specific types of information. Text compressors usually deal poorly with sound, and vice versa. The arrangement that allows for the method I'm presenting takes this and runs with it. At the base of all the lossless compression algorithms is the straight-up fact that they are looking for the greatest pattern density. This is the dictionary. Again, super obvious.
You, like others, are not able to deal with the larger sets. The only way that the relationships would be maintained for the lowest cases is if the symmetries that are present in the pattern densities were carried forth, constantly, for every single representation of a given string. This is obviously not the case, for the very same process: there's nothing simpler than binary in terms of laying out data in a single dimension. And yet, change that binary string to a different base, and what happens? It is expressed in a different format, with new possibilities of structural cohesion, when looking at the information, in that base, in a single dimension.
This is obvious. What comes from this basic fact is that we are looking at a non-quantized pool that has the capability to express rapidly varying pattern densities (read that again if you're getting confused; play with some examples if you need to - use binary strings 10,000+ digits long), together with the straight-up fact that as the pool increases, there are more potential shifts to different pattern densities. The probability increases of a case appearing that has an exponentially smaller overall range than the general base range. Or you can look at it as more 'words' appearing of closer sizes. Or larger patterns in the sound bank.
The goal of course is to strip as many abstractions as possible off, so the algorithm can use the easiest way to deal with the arbitrary data, since with the arbitrary form you can fold the information into the larger pool once again, since it changes the overall pool, hence changes the patterns that result from quantization into different bases.
This is exactly where I derive the term 'lower bounded', since this effect doesn't happen with lower bit sizes. Indeed, if we were restricted to only ever conceiving of anything in binary, this wouldn't work. But we aren't, so it does.
I can understand that to you, a mathematician not practiced in this field, this may be difficult to comprehend, as you indeed have said you are getting 'bedazzled' when trying to look at that many bits at once. So I say: don't look at it as bits. Look at it as a shifting base-line.
Hopefully, you should at this point be able to perceive what I'm describing in full force. Interesting things await. The full algorithms I've developed wouldn't be useful for the current computer age, as while the decompression is extremely fast, the comparisons between the range levels need a very large multi-parallel processor to achieve. Mine works, it just isn't practical in this time. I think there are a lot of smart people out there that can utilize this, if they aren't blocked by near-sighted naysayers that have no real idea of what is going on. No conspiracy, no dishonesty, just, as you term yourself, 'bedazzled when looking at that many bits together.'
Chris. UmbraPanda 05:27, 16 October 2007 (UTC)[reply]

Multimedia Section

A big portion of the multimedia section had to do with wavelet transforms and JPEG2000. JPEG2000 is a lossy compression format, so I removed that portion. I also took out any reference to wavelet based methods, since it was referenced to the JPEG2000 format. If someone can provide a good referenced explanation of wavelet based lossless methods that should definitely be added, but note that the wavelet compression page does not have any explanation. Bkessler (talk) 16:31, 11 August 2008 (UTC)[reply]

JPEG2000 actually has both lossy and lossless modes; but most of its details still ought to be confined to the article on JPEG2000 itself (and likewise for wavelet transforms in general). But this is an old comment so I don't know if you'll respond. Dcoetzee 23:33, 9 February 2009 (UTC)[reply]

universal data compression

I suggest merging universal data compression into the lossless_data_compression#Limitations section. --76.209.28.222 (talk) 22:35, 9 February 2009 (UTC)[reply]

I agree, especially since the current article does not provide sufficient context. However, it should be careful to clarify that universal data compression is not at all related to universal codes. Dcoetzee 22:46, 9 February 2009 (UTC)[reply]
I agree with merger too, but it's been a year now! Pcap ping 21:09, 23 April 2010 (UTC)[reply]
Finally added the correct merge tags (from a comment at WT:MATH, which suggests improving, rather than merging.) — Arthur Rubin (talk) 20:08, 24 June 2010 (UTC)[reply]
I have deleted those merge tags, since universal data compression has been deleted 19:43, 9 August 2010 by User:Black Kite. —H.Marxen (talk) 04:30, 5 September 2010 (UTC)[reply]

Lossless compression with GIF

Ok, sorry, I accidentally said that GIF was lossless, then I reverted the change. But in reality, GIF is lossless if the colorspace is limited, right? I think it would be better to add it back to the list of lossless compressions with a note:

  • GIF (only when there are 256 or fewer colors).

Then, idiots like me wouldn't make my mistake sometime in the future ... Currently, GIF is not on the lossless OR lossy compression lists on Wikipedia, so what is it ?? I believe that it's either one, the other, or BOTH, but it can't be NEITHER !! SystemBuilder (talk) 23:24, 4 June 2010 (UTC)[reply]

GIF isn't a compression algorithm. It's a graphics file format, which can use RLE as its compression algorithm. --207.138.32.130 (talk) 00:30, 8 September 2010 (UTC)[reply]
You are entirely right, except for the fact that it uses LZW, not RLE, as its compression algorithm. E.g., see the Burn All GIFs campaign. Nicolas Barbier (talk) 10:37, 29 June 2011 (UTC)[reply]

random data cannot be compressed

Could someone repair the formulation files of random data cannot be consistently compressed by any conceivable lossless data compression algorithm into something more exact? Even the formulation random bits is too short to be the right one. Thank you. —Preceding unsigned comment added by 90.177.52.52 (talk) 20:31, 13 March 2011 (UTC)[reply]

What is the inexactness that bothers you? It looks fine to me. –Henning Makholm (talk) 20:42, 13 March 2011 (UTC)[reply]
For a subsection on Lossless Data Compression, "random data" is too ambiguous. If you saved the output of a-million-monkeys-typing-Shakespeare as ASCII text files, that would be universally compressible. If you saved the bitstream from a cryptography-strength random generator, then no, it would not be universally compressible. Both sources are "random data", but with different entropy. — Preceding unsigned comment added by 204.9.220.36 (talk) 19:21, 30 March 2012 (UTC)[reply]
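
The distinction drawn above is easy to demonstrate; a rough Python sketch (the exact byte counts will vary from run to run):

    import os, random, string, zlib

    # "Random" ASCII text drawn from a small alphabet carries well under 8 bits
    # of entropy per byte and compresses; cryptographic-strength bytes do not.
    monkeys = "".join(random.choice(string.ascii_lowercase + " ")
                      for _ in range(100_000)).encode()
    crypto = os.urandom(100_000)

    print("ASCII text:  ", len(zlib.compress(monkeys)), "bytes out of", len(monkeys))
    print("crypto bytes:", len(zlib.compress(crypto)), "bytes out of", len(crypto))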

Audio formats that don't correspond to a lossless compression algorithm

It seems that the list currently contains the following, which do not correspond to a lossless compression algorithm:

  • Waveform audio format - WAV: WAV is a container format. It typically stores PCM data, which is not compressed, or—if the reduction in bitrate is considered compression—not in a lossless way. The other compression methods that can be used inside a WAV file have nothing to do with "WAV" in itself.
  • PCM: See above, PCM is not a compression algorithm, or at least not a lossless one.
  • LPCM: As LPCM is a more specific version of PCM, the above argument also counts.

I am therefore removing these from the list. Nicolas Barbier (talk) 08:47, 29 June 2011 (UTC)[reply]

Multiple issues tag

CFCF (talk · contribs) recently tagged this article for multiple issues (https://en.wikipedia.org/enwiki/w/index.php?title=Lossless_compression&diff=prev&oldid=645299565), citing "Lists include many unsourced claims of 'near lossless formats', inherently subjective statements". I'm only seeing one such claim (for JPEG-LS), and the linked article is well-referenced. "Near lossless" in this sense is not subjective: it refers to setting a predetermined difference threshold (between the pixel value before and after compression) that is not exceeded by the compression. JPEG-LS allows such a threshold (even 0 for truly lossless compression) to be specified as part of the compression. -- Elphion (talk) 19:13, 3 February 2015 (UTC)[reply]

Hearing no response, I've removed the tag. -- Elphion (talk) 23:09, 10 February 2015 (UTC)[reply]
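
To illustrate the sense in which "near lossless" is an objective criterion rather than a subjective one, here is a toy Python sketch of bounded-error quantization (not the actual JPEG-LS algorithm, just a demonstration that a chosen threshold is provably never exceeded):

    near = 2                               # the agreed maximum per-sample error
    step = 2 * near + 1

    def encode(sample: int) -> int:        # sample in 0..255
        return (sample + near) // step     # far fewer distinct symbols than the input had

    def decode(index: int) -> int:
        return index * step

    worst = max(abs(decode(encode(s)) - s) for s in range(256))
    assert worst <= near
    print("worst-case reconstruction error:", worst)

With near = 0 the scheme degenerates to the identity, i.e. truly lossless, mirroring the threshold parameter described above.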

Normal number

A normal number cannot be compressed. In the Wikipedia article that fact is used as an alternate definition of a normal number. (By the way, that shows that English is not a normal number.) This information should probably be integrated into this article. I don't have any time to do it. I have added the 'See also' link, but that is not enough. agb — Preceding unsigned comment added by 173.233.167.122 (talk) 23:11, 10 February 2015 (UTC)[reply]

It's of interest at Normal number. It's not of interest here. (And it does not show that "English" is not a normal number; English and real numbers are not in the same category.) -- Elphion (talk) 23:34, 10 February 2015 (UTC)[reply]

Hello fellow Wikipedians,

I have just modified one external link on Lossless compression. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 05:18, 26 May 2017 (UTC)[reply]

Limitation section

@Ho33e5: I reverted the removal of the CN tag for the claim that "In fact, if we consider files of length N, if all files were equally probable, then for any lossless compression that reduces the size of some file, the expected length of a compressed file (averaged over all possible files of length N) must necessarily be greater than N." Is this N the same as the N in the section above? If so, it's not clear that this N is particularly important -- its behavior may perhaps be a small irregularity. If not, then it's not clear that the claim is true: it's easy to devise codes for the 2^N strings of length N s.t. the expected value of the lengths of the encoded strings is < N. In short, it's not clear how this claim relates to the rest of the section. I think the whole section would profit from a rewrite. -- Elphion (talk) 07:28, 11 December 2017 (UTC)[reply]
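
For the record, the construction alluded to above (a code defined only on the 2^N strings of length N whose expected output length is below N) is easy to exhibit; a small Python sketch for N = 3:

    from itertools import product

    N = 3
    inputs = ["".join(bits) for bits in product("01", repeat=N)]      # the 2**N inputs
    # All strings shorter than N, shortest first, then reuse length-N strings for any leftovers.
    codewords = [""] + ["".join(b) for L in range(1, N) for b in product("01", repeat=L)]
    codewords += inputs[:len(inputs) - len(codewords)]
    code = dict(zip(inputs, codewords))                               # injective on these inputs
    expected = sum(len(c) for c in code.values()) / len(inputs)
    print(expected, "<", N)                                           # 1.625 < 3

Whether the article's claim intends this restricted setting or a code over strings of all lengths is exactly the scope question raised above.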

Hello fellow Wikipedians,

I have just modified 3 external links on Lossless compression. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:

When you have finished reviewing my changes, you may follow the instructions on the template below to fix any issues with the URLs.

This message was posted before February 2018. After February 2018, "External links modified" talk page sections are no longer generated or monitored by InternetArchiveBot. No special action is required regarding these talk page notices, other than regular verification using the archive tool instructions below. Editors have permission to delete these "External links modified" talk page sections if they want to de-clutter talk pages, but see the RfC before doing mass systematic removals. This message is updated dynamically through the template {{source check}} (last update: 5 June 2024).

  • If you have discovered URLs which were erroneously considered dead by the bot, you can report them with this tool.
  • If you found an error with any archives or the URLs themselves, you can fix them with this tool.

Cheers.—InternetArchiveBot (Report bug) 13:45, 6 January 2018 (UTC)[reply]

The subsection Historical legal issues only mentions legal issues in the first paragraph, then goes on to talk about other compression topics. Compare with e.g. this article. Sjgallagher2 (talk) 14:33, 6 March 2023 (UTC)[reply]

  1. ^ "complex_file[SHA256=78de9c23f28100f3fb711770f8278ed4da49995b33cfc0456ee88d781fab0bb5].bin". Retrieved 2024-10-21.