LZ77 and LZ78
LZ77 and LZ78 are the names for the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. These two algorithms form the basis for most of the LZ variations including LZW, LZSS and others. They are both dictionary coders, unlike minimum redundancy coders or run length coders. LZ77 is the "sliding window" compression algorithm, which was later shown to be equivalent to the explicit dictionary technique first given in LZ78.
The LZ77 algorithm works by keeping a history window of the most recently seen data and comparing the current data being encoded with the data in the history window. What is actually placed into the compressed stream are references to the position in the history window, and the length of the match. If a match cannot be found the character itself is simply encoded into the stream after being flagged as a literal. As of 2004, the most popular LZ77 based compression method is called DEFLATE; it combines LZ77 with Huffman coding.
While the LZ77 algorithm works on past data, the LZ78 algorithm attempts to work on future data. It does this by forward scanning the input buffer and matching it against a dictionary it maintains. It will scan into the buffer until it cannot find a match in the dictionary. At this point it will output the location of the word in the dictionary, if one is available, the match length and the character that caused a match failure. The resulting word is then added to the dictionary.
LZ78 never became as popular as LZ77, possibly because for the first few decades after it was introduced, parts of LZ78 were patent encumbered in the United States, while LZ77 is not patented. The most popular form of LZ78 compression was the LZW algorithm, a modification of the LZ78 algorithm made by Terry Welch.
References
- Abraham Lempel, Jacob Ziv; A Universal Algorithm for Sequential Data Compression, IEEE Transactions on Information Theory, May 1977.
- Abraham Lempel, Jacob Ziv; Compression of Individual Sequences Via Variable-Rate Coding, IEEE Transactions on Information Theory, September 1978.