Byte pair encoding
Byte pair encoding is a simple form of data compression in which the two most common bytes of data are replaced with a byte that does not occur within the data. A table of the replacements is required to rebuild the original data.
Byte pair encoding example
Suppose we wanted to encode the data
aaabaaabac
The byte pair "aa" occurs most often, so it will be replaced by a byte that is not used in the data, "Z". Now we have a replacement table of
Z <- aa
and the data is
ZabZabac
In this data the byte pair "Za" occurs most often, so it will be replaced with a byte that is not used in the data, "Y". (In this case "Za" could be replaced by "Z", since every occurance of "Z" will be replaced.) The replacement table and data are now
Z <- aa Y <- Za
YbYbac
Once again we replace the byte pair that occurs most often.
Z <- aa Y <- Za X <- Yb
XXac
This data cannot be compressed further by byte pair encoding because there are no pairs of bytes that occur more than once.
To decompressed the data, simply perform the replacements in reverse order.