Byte pair encoding: Difference between revisions
MindlessXD (talk | contribs) mNo edit summary |
m added links |
||
Line 1: | Line 1: | ||
'''Byte pair encoding''' is a simple form of data compression in which the most common pair of consecutive |
'''Byte pair encoding''' is a simple form of [[data compression]] in which the most common pair of consecutive [[byte]]s of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data. |
||
==Byte pair encoding example== |
==Byte pair encoding example== |
Revision as of 23:33, 14 July 2006
Byte pair encoding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data.
Byte pair encoding example
Suppose we wanted to encode the data
aaabaaabac
The byte pair "aa" occurs most often, so it will be replaced by a byte that is not used in the data, "Z". Now we have a replacement table of
Z <- aa
and the data is
ZabZabac
In this data the byte pair "Za" occurs most often, so it will be replaced with a byte that is not used in the data, "Y". (In this case "Za" could be replaced by "Z", since every occurance of "Z" will be replaced.) The replacement table and data are now
Z <- aa Y <- Za
YbYbac
Once again we replace the byte pair that occurs most often.
Z <- aa Y <- Za X <- Yb
XXac
This data cannot be compressed further by byte pair encoding because there are no pairs of bytes that occur more than once.
To decompressed the data, simply perform the replacements in reverse order.