Jump to content

Shannon–Fano coding: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
No edit summary
m References: add WP:TEMPLATECAT to remove from template; genfixes
 
(72 intermediate revisions by 54 users not shown)
Line 1: Line 1:
{{Short description|Data compression algorithms}}
In the field of [[data compression]], '''Shannon–Fano coding''', named after [[Claude Shannon]] and [[Robert Fano]], is a technique for constructing a [[prefix code]] based on a set of symbols and their probabilities (estimated or measured). It is [[Optimization (mathematics)|suboptimal]] in the sense that it does not achieve the lowest possible expected code word length like [[Huffman coding]]; however unlike Huffman coding, it does guarantee that all code word lengths are within one bit of their theoretical ideal <math>{-\log} P(x)</math>. The technique was proposed in Shannon's "[[A Mathematical Theory of Communication]]", his 1948 article introducing the field of [[information theory]]. The method was attributed to Fano, who later published it as a [[technical report]].<ref>{{harvnb|Fano|1949}}</ref>
In the field of [[data compression]], '''Shannon–Fano coding''', named after [[Claude Shannon]] and [[Robert Fano]], is one of two related techniques for constructing a [[prefix code]] based on a set of symbols and their probabilities (estimated or measured).
Shannon–Fano coding should not be confused with [[Shannon coding]], the coding method used to prove [[Shannon's noiseless coding theorem]], or with [[Shannon–Fano–Elias coding]] (also known as Elias coding), the precursor to [[arithmetic coding]].


* '''Shannon's method''' chooses a prefix code where a source symbol <math>i</math> is given the codeword length <math>l_i = \lceil - \log_2 p_i\rceil</math>. One common way of choosing the codewords uses the binary expansion of the cumulative probabilities. This method was proposed in Shannon's "[[A Mathematical Theory of Communication]]" (1948), his article introducing the field of [[information theory]].
In Shannon–Fano coding, the symbols are arranged in order from hhmost probable to least probable, and then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned; symbols in the first set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member remain, the same process is repeated on those sets, to determine successive digits of their codes. When a set has been reduced to one symbol, of course, this means the symbol's code is complete and will not form the prefix of any other symbol's code.
* '''Fano's method''' divides the source symbols into two sets ("0" and "1") with probabilities as close to 1/2 as possible. Then those sets are themselves divided in two, and so on, until each set contains only one symbol. The codeword for that symbol is the string of "0"s and "1"s that records which half of the divides it fell on. This method was proposed in a later (in print) [[technical report]] by Fano (1949).


Shannon–Fano codes are [[Optimization (mathematics)|suboptimal]] in the sense that they do not always achieve the lowest possible expected codeword length, as [[Huffman coding]] does.<ref name="Kaur">{{cite journal |last1=Kaur |first1=Sandeep |last2=Singh |first2=Sukhjeet |title=Entropy Coding and Different Coding Techniques |journal=Journal of Network Communications and Emerging Technologies |date=May 2016 |volume=6 |issue=5 |page=5 |s2cid=212439287 |url=https://pdfs.semanticscholar.org/4253/7898a836d0384c6689a3c098b823309ab723.pdf |archive-url=https://web.archive.org/web/20191203151816/https://pdfs.semanticscholar.org/4253/7898a836d0384c6689a3c098b823309ab723.pdf |url-status=dead |archive-date=2019-12-03 |access-date=3 December 2019}}</ref> However, Shannon–Fano codes have an expected codeword length within 1 bit of optimal. Fano's method usually produces encoding with shorter expected lengths than Shannon's method. However, Shannon's method is easier to analyse theoretically.
The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently. Unfortunately, Shannon–Fano does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example of one that will be assigned non-optimal codes by Shannon–Fano coding.


Shannon–Fano coding should not be confused with [[Shannon–Fano–Elias coding]] (also known as Elias coding), the precursor to [[arithmetic coding]].
For this reason, Shannon–Fano is almost never used; [[Huffman coding]] is almost as computationally simple and produces prefix codes that always achieve the lowest expected code word length, under the constraints that each symbol is represented by a code formed of an integral number of bits. This is a constraint that is often unneeded, since the codes will be packed end-to-end in long sequences. If we consider groups of codes at a time, symbol-by-symbol Huffman coding is only optimal if the probabilities of the symbols are [[Statistical independence|independent]] and are some power of a half, i.e., <math>\textstyle \frac{1}{2^n}</math>. In most situations, [[arithmetic coding]] can produce greater overall compression than either Huffman or Shannon–Fano, since it can encode in fractional numbers of bits which more closely approximate the actual information content of the symbol. However, arithmetic coding has not superseded Huffman the way that Huffman supersedes Shannon–Fano, both because arithmetic coding is more computationally expensive and because it is covered by multiple patents.


==Naming==
Shannon–Fano coding is used in the <tt>IMPLODE</tt> compression method, which is part of the [[Zip (file format)|<tt>ZIP</tt> file format]].<ref name="appnote">{{cite web

Regarding the confusion in the two different codes being referred to by the same name, Krajči et al. write:<ref name="Kraj">Stanislav Krajči, Chin-Fu Liu, Ladislav Mikeš and Stefan M. Moser (2015), "Performance analysis of Fano coding", ''2015 IEEE International Symposium on Information Theory (ISIT)''.</ref>
<blockquote>
Around 1948, both Claude E. Shannon (1948) and Robert M. Fano (1949) independently proposed two different source coding algorithms for an efficient description of a discrete memoryless source. Unfortunately, in spite of being different, both schemes became known under the same name ''Shannon–Fano coding''.

There are several reasons for this mixup. For one thing, in the discussion of his coding scheme, Shannon mentions Fano’s scheme and calls it “substantially the same” (Shannon, 1948, p. 17 [reprint]).<ref>{{Cite book |url=https://archive.org/details/sim_att-technical-journal_1948-07_27_3/page/402/ |title=The Bell System Technical Journal 1948-07: Vol 27 Iss 3 |date=1948-07-01 |publisher=AT & T Bell Laboratories |pages=403 |language=en}}</ref> For another, both Shannon’s and Fano’s coding schemes are similar in the sense that they both are efficient, but ''suboptimal'' prefix-free coding schemes with a similar performance.
</blockquote>

Shannon's (1948) method, using predefined word lengths, is called '''Shannon–Fano coding''' by Cover and Thomas,<ref>Thomas M. Cover and Joy A. Thomas (2006), ''Elements of Information Theory'' (2nd ed.), Wiley–Interscience. "Historical Notes" to Chapter 5.</ref> Goldie and Pinch,<ref>Charles M. Goldie and Richard G. E. Pinch (1991), ''Communication Theory'', Cambridge University Press. Section 1.6.</ref> Jones and Jones,<ref>Gareth A. Jones and J. Mary Jones (2012), ''Information and Coding Theory'' (Springer). Section 3.4.</ref> and Han and Kobayashi.<ref>Te Sun Han and Kingo Kobayashi (2007), ''Mathematics of Information and Coding'', American Mathematical Society. Subsection 3.7.1.</ref> It is called '''Shannon coding''' by Yeung.<ref>Raymond W Yeung (2002), ''A First Course in Information Theory'', Springer. Subsection 3.2.2.</ref>

Fano's (1949) method, using binary division of probabilities, is called '''Shannon–Fano coding''' by Salomon<ref>David Salomon (2013), ''Data Compression: The Complete Reference'', Springer. Section 2.6.</ref> and Gupta.<ref>Prakash C. Gupta (2006), ''Data Communications and Computer Networks'', Phi Publishing. Subsection 1.11.5.</ref> It is called '''Fano coding''' by Krajči et al.<ref name="Kraj" />

==Shannon's code: predefined word lengths==

{{Main|Shannon coding}}

===Shannon's algorithm===

Shannon's method starts by deciding on the lengths of all the codewords, then picks a prefix code with those word lengths.

Given a source with probabilities <math>p_1, p_2, \dots, p_n</math> the desired codeword lengths are <math>l_i = \lceil -\log_2 p_i \rceil</math>. Here, <math>\lceil x \rceil</math> is the [[Floor and ceiling functions|ceiling function]], meaning the smallest integer greater than or equal to <math>x</math>.

Once the codeword lengths have been determined, we must choose the codewords themselves. One method is to pick codewords in order from most probable to least probable symbols, picking each codeword to be the lexicographically first word of the correct length that maintains the prefix-free property.

A second method makes use of cumulative probabilities. First, the probabilities are written in decreasing order <math>p_1 \geq p_2 \geq \cdots \geq p_n</math>. Then, the cumulative probabilities are defined as
:<math>c_1 = 0, \qquad c_i = \sum_{j=1}^{i-1} p_j \text{ for }i \geq 2 , </math>
so <math>c_1 = 0, c_2 = p_1, c_3 = p_1 + p_2</math> and so on.
The codeword for symbol <math>i</math> is chosen to be the first <math>l_i</math> binary digits in the [[binary number|binary expansion]] of <math>c_i</math>.

===Example===

This example shows the construction of a Shannon–Fano code for a small alphabet. There 5 different source symbols. Suppose 39 total symbols have been observed with the following frequencies, from which we can estimate the symbol probabilities.

:{| class="wikitable" style="text-align: center;"
! Symbol
! A
! B
! C
! D
! E
|-
! Count
| 15
| 7
| 6
| 6
| 5
|-
! Probabilities
| 0.385
| 0.179
| 0.154
| 0.154
| 0.128
|}

This source has [[Entropy (information theory)|entropy]] <math>H(X) = 2.186</math> bits.

For the Shannon–Fano code, we need to calculate the desired word lengths <math>l_i = \lceil -\log_2 p_i \rceil</math>.

:{| class="wikitable" style="text-align: center;"
! Symbol
! A
! B
! C
! D
! E
|-
! Probabilities
| 0.385
| 0.179
| 0.154
| 0.154
| 0.128
|-
! <math>-\log_2 p_i</math>
| 1.379
| 2.480
| 2.700
| 2.700
| 2.963
|-
! Word lengths <math>\lceil -\log_2 p_i \rceil</math>
| 2
| 3
| 3
| 3
| 3
|}

We can pick codewords in order, choosing the lexicographically first word of the correct length that maintains the prefix-free property. Clearly A gets the codeword 00. To maintain the prefix-free property, B's codeword may not start 00, so the lexicographically first available word of length 3 is 010. Continuing like this, we get the following code:

:{| class="wikitable" style="text-align: center;"
! Symbol
! A
! B
! C
! D
! E
|-
! Probabilities
| 0.385
| 0.179
| 0.154
| 0.154
| 0.128
|-
! Word lengths <math>\lceil -\log_2 p_i \rceil</math>
| 2
| 3
| 3
| 3
| 3
|-
! Codewords
| 00
| 010
| 011
| 100
| 101
|}

Alternatively, we can use the cumulative probability method.

:{| class="wikitable" style="text-align: center;"
! Symbol
! A
! B
! C
! D
! E
|-
! Probabilities
| 0.385
| 0.179
| 0.154
| 0.154
| 0.128
|-
! Cumulative probabilities
| 0.000
| 0.385
| 0.564
| 0.718
| 0.872
|-
! ...in binary
| 0.00000
| 0.01100
| 0.10010
| 0.10110
| 0.11011
|-
! Word lengths <math>\lceil -\log_2 p_i \rceil</math>
| 2
| 3
| 3
| 3
| 3
|-
! Codewords
| 00
| 011
| 100
| 101
| 110
|}

Note that although the codewords under the two methods are different, the word lengths are the same. We have lengths of 2 bits for A, and 3 bits for B, C, D and E, giving an average length of

:<math display="block">\frac{2\,\text{bits}\cdot(15) + 3\,\text{bits} \cdot (7+6+6+5)}{39\, \text{symbols}} \approx 2.62\,\text{bits per symbol,}</math>

which is within one bit of the entropy.

===Expected word length===

For Shannon's method, the word lengths satisfy

:<math>l_i = \lceil -\log_2 p_i \rceil \leq -\log_2 p_i + 1 .</math>

Hence the expected word length satisfies
:<math display="block">\mathbb E L = \sum_{i=1}^n p_il_i \leq \sum_{i=1}^n p_i (-\log_2 p_i + 1) = -\sum_{i=1}^n p_i \log_2 p_i + \sum_{i=1}^n p_i = H(X) + 1.</math>
Here, <math>H(X) = - \textstyle\sum_{i=1}^n p_i \log_2 p_i</math> is the [[Entropy (information theory)|entropy]], and [[Shannon's source coding theorem]] says that any code must have an average length of at least <math>H(X)</math>. Hence we see that the Shannon–Fano code is always within one bit of the optimal expected word length.

==Fano's code: binary splitting==

===Outline of Fano's code===

In Fano's method, the symbols are arranged in order from most probable to least probable, and then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned; symbols in the first set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member remain, the same process is repeated on those sets, to determine successive digits of their codes. When a set has been reduced to one symbol this means the symbol's code is complete and will not form the prefix of any other symbol's code.

The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently. Unfortunately, Shannon–Fano coding does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example of one that will be assigned non-optimal codes by Shannon–Fano coding.

Fano's version of Shannon–Fano coding is used in the <code>IMPLODE</code> compression method, which is part of the [[Zip (file format)|<code>ZIP</code> file format]].<ref name="appnote">{{cite web
| url = http://www.pkware.com/documents/casestudies/APPNOTE.TXT
| url = http://www.pkware.com/documents/casestudies/APPNOTE.TXT
| title = <tt>APPNOTE.TXT</tt> - .ZIP File Format Specification
| title = <code>APPNOTE.TXT</code> - .ZIP File Format Specification
| accessdate = 2008-01-06
| access-date = 2008-01-06
| publisher = PKWARE Inc
| publisher = PKWARE Inc
| date = 2007-09-28
| date = 2007-09-28
Line 17: Line 211:
}}</ref>
}}</ref>


==Shannon–Fano Algorithm==
===The Shannon–Fano tree===


A Shannon–Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:
A Shannon–Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:
Line 28: Line 222:


===Example===
===Example===

[[Image:ShannonCodeAlg.svg|right|thumb|300px|Shannon–Fano Algorithm]]
[[Image:ShannonCodeAlg.svg|right|thumb|300px|Shannon–Fano Algorithm]]

The example shows the construction of the Shannon code for a small alphabet. The five symbols which can be coded have the following frequency:
We continue with the previous example.
:{| class="wikitable"

:{| class="wikitable" style="text-align: center;"
! Symbol
! Symbol
! A
! A
Line 38: Line 235:
! E
! E
|-
|-
| Count
! Count
| 15
| 15
| 7
| 7
Line 45: Line 242:
| 5
| 5
|-
|-
| Probabilities
! Probabilities
| 0.38461538
| 0.385
| 0.17948718
| 0.179
| 0.15384615
| 0.154
| 0.15384615
| 0.154
| 0.12820513
| 0.128
|}
|}


Line 59: Line 256:
After four division procedures, a tree of codes results. In the final tree, the three symbols with the highest frequencies have all been assigned 2-bit codes, and two symbols with lower counts have 3-bit codes as shown table below:
After four division procedures, a tree of codes results. In the final tree, the three symbols with the highest frequencies have all been assigned 2-bit codes, and two symbols with lower counts have 3-bit codes as shown table below:


:{| class="wikitable"
:{| class="wikitable" style="text-align: center;"
! Symbol
! Symbol
! A
! A
Line 67: Line 264:
! E
! E
|-
|-
! Probabilities
| Code
| 0.385
| 0.179
| 0.154
| 0.154
| 0.128
|-
! First division
| colspan="2" | 0
| colspan="3" | 1
|-
! Second division
| rowspan="2" style="vertical-align:top"; | 0
| rowspan="2" style="vertical-align:top"; | 1
| rowspan="2" style="vertical-align:top"; | 0
| colspan="2" | 1
|-
! Third division
| 0
| 1
|-
! Codewords
| 00
| 00
| 01
| 01
Line 75: Line 293:
|}
|}


Results in 2 bits for A, B and C and per 3 bits for D and E an average bit number of
This results in lengths of 2 bits for A, B and C and per 3 bits for D and E, giving an average length of


:<math>\frac{2\,\text{Bit}\cdot(15+7+6) + 3\,\text{Bit} \cdot (6+5)}{39\, \text{Symbol}} \approx 2.28\,\text{Bits per Symbol.}</math>
:<math display="block">\frac{2\,\text{bits}\cdot(15+7+6) + 3\,\text{bits} \cdot (6+5)}{39\, \text{symbols}} \approx 2.28\,\text{bits per symbol.}</math>

We see that Fano's method, with an average length of 2.28, has outperformed Shannon's method, with an average length of 2.62.

=== Expected word length ===

It is shown by Krajči et al<ref name="Kraj" /> that the expected length of Fano's method has expected length bounded above by <math>\mathbb{E}L \leq H(X) + 1 - p_\text{min}</math>, where <math>p_\text{min} = \textstyle\min_i p_i</math> is the probability of the least common symbol.

==Comparison with other coding methods==

Neither Shannon–Fano algorithm is guaranteed to generate an optimal code. For this reason, Shannon–Fano codes are almost never used; [[Huffman coding]] is almost as computationally simple and produces prefix codes that always achieve the lowest possible expected code word length, under the constraints that each symbol is represented by a code formed of an integral number of bits. This is a constraint that is often unneeded, since the codes will be packed end-to-end in long sequences. If we consider groups of codes at a time, symbol-by-symbol Huffman coding is only optimal if the probabilities of the symbols are [[Statistical independence|independent]] and are some power of a half, i.e., <math>\textstyle 1 / 2^k</math>. In most situations, [[arithmetic coding]] can produce greater overall compression than either Huffman or Shannon–Fano, since it can encode in fractional numbers of bits which more closely approximate the actual information content of the symbol. However, arithmetic coding has not superseded Huffman the way that Huffman supersedes Shannon–Fano, both because arithmetic coding is more computationally expensive and because it is covered by multiple patents.<ref name="LiDrew2014">{{cite book|author1=Ze-Nian Li|author2=Mark S. Drew|author3=Jiangchuan Liu|title=Fundamentals of Multimedia|url=https://books.google.com/books?id=R6vBBAAAQBAJ|date=9 April 2014|publisher=Springer Science & Business Media|isbn=978-3-319-05290-8}}</ref>

===Huffman coding===


==Huffman Algorithm==
{{Main|Huffman coding}}
{{Main|Huffman coding}}

The Shannon–Fano algorithm doesn't always generate an optimal code. In 1952, [[David A. Huffman]] gave a different algorithm that always produces an optimal tree for any given probabilities. While the Shannon–Fano tree is created from the root to the leaves, the Huffman algorithm works from leaves to the root in the opposite direction.
A few years later, [[David A. Huffman]] (1952)<ref>{{Cite journal | last1 = Huffman | first1 = D. |author-link1=David A. Huffman| title = A Method for the Construction of Minimum-Redundancy Codes | doi = 10.1109/JRPROC.1952.273898 | journal = [[Proceedings of the IRE]]| volume = 40 | issue = 9 | pages = 1098–1101 | year = 1952 | url = http://compression.ru/download/articles/huff/huffman_1952_minimum-redundancy-codes.pdf}}</ref> gave a different algorithm that always produces an optimal tree for any given symbol probabilities. While Fano's Shannon–Fano tree is created by dividing from the root to the leaves, the Huffman algorithm works in the opposite direction, merging from the leaves to the root.
# Create a leaf node for each symbol and add it to frequency of occurrence.
# Create a leaf node for each symbol and add it to a [[priority queue]], using its frequency of occurrence as the priority.
# While there is more than one node in the queue:
# While there is more than one node in the queue:
## Remove the two nodes of lowest probability or frequency from the queue
## Remove the two nodes of lowest probability or frequency from the queue
Line 90: Line 320:
# The remaining node is the root node and the tree is complete.
# The remaining node is the root node and the tree is complete.


===Example===
===Example with Huffman coding===
[[File:HuffmanCodeAlg.png|right|thumb|300px|Huffman Algorithm]]
[[File:HuffmanCodeAlg.png|right|thumb|300px|Huffman Algorithm]]


Using the same frequencies as for the Shannon–Fano example above, viz:
We use the same frequencies as for the Shannon–Fano example above, viz:


:{| class="wikitable"
:{| class="wikitable" style="text-align: center;"
! Symbol
! Symbol
! A
! A
! B
! B
! C
! C
! D
! D
! E
! E
|-
|-
| Count
! Count
| 15
| 15
| 7
| 7
| 6
| 6
| 6
| 6
| 5
| 5
|-
|-
| Probabilities
! Probabilities
| 0.38461538
| 0.385
| 0.17948718
| 0.179
| 0.15384615
| 0.154
| 0.15384615
| 0.154
| 0.12820513
| 0.128
|}
|}


In this case D & E have the lowest frequencies and so are allocated 0 and 1 respectively and grouped together with a combined probability of 0.28205128. The lowest pair now are B and C so they're allocated 0 and 1 and grouped together with a combined probability of 0.33333333. This leaves BC and DE now with the lowest probabilities so 0 and 1 are prepended to their codes and they are combined. This then leaves just A and BCDE, which have 0 and 1 prepended respectively and are then combined. This leaves us with a single node and our algorithm is complete.
In this case D & E have the lowest frequencies and so are allocated 0 and 1 respectively and grouped together with a combined probability of 0.282. The lowest pair now are B and C so they're allocated 0 and 1 and grouped together with a combined probability of 0.333. This leaves BC and DE now with the lowest probabilities so 0 and 1 are prepended to their codes and they are combined. This then leaves just A and BCDE, which have 0 and 1 prepended respectively and are then combined. This leaves us with a single node and our algorithm is complete.


The code lengths for the different characters this time are 1 bit for A and 3 bits for all other characters.
The code lengths for the different characters this time are 1 bit for A and 3 bits for all other characters.


:{| class="wikitable"
:{| class="wikitable"
! Symbol
! Symbol
! A
! A
! B
! B
Line 130: Line 360:
! E
! E
|-
|-
! Codewords
| Code
| 0
| 0
| 100
| 100
Line 138: Line 368:
|}
|}


Results in 1 bit for A and per 3 bits for B, C, D and E an average bit number of
This results in the lengths of 1 bit for A and per 3 bits for B, C, D and E, giving an average length of

:<math display="block">\frac{1\,\text{bit}\cdot 15 + 3\,\text{bits} \cdot (7+6+6+5)}{39\, \text{symbols}} \approx 2.23\,\text{bits per symbol.}</math>


We see that the Huffman code has outperformed both types of Shannon–Fano code, which had expected lengths of 2.62 and 2.28.
:<math>\frac{1\,\text{Bit}\cdot 15 + 3\,\text{Bit} \cdot (7+6+6+5)}{39\, \text{Symbol}} \approx 2.23\,\text{Bits per Symbol.}</math>


== Notes==
== Notes==
Line 146: Line 378:


==References==
==References==
* {{cite journal | first = C.E. | last = Shannon | url = http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf | title = A Mathematical Theory of Communication | journal = [[Bell System Technical Journal]] | volume = 27 | pages = 379–423 | month = July | year = 1948}}
* {{cite journal | first = R.M. | last = Fano | title = The transmission of information | work = Technical Report No. 65 | year = 1949 | publisher = [[Research Laboratory of Electronics at MIT]] | location = Cambridge (Mass.), USA | ref =harv}}


* {{cite journal | first = R.M. | last = Fano | title = The transmission of information | journal = Technical Report No. 65 | year = 1949 | publisher = Research Laboratory of Electronics at MIT | location = Cambridge (Mass.), USA | url = https://archive.org/details/fano-tr65.7z}}
==External links==
* {{cite journal |first=C.E. |last=Shannon |url=https://archive.org/details/ost-engineering-shannon1948 |title=A Mathematical Theory of Communication [reprint with corrections] |journal=[[Bell System Technical Journal]] |volume=27 |pages=379–423 |date=July 1948|doi=10.1002/j.1538-7305.1948.tb01338.x }}
*[http://www.binaryessence.com/dct/en000041.htm Shannon–Fano at Binary Essence]
{{Compression Methods}}
{{Compression methods}}


{{DEFAULTSORT:Shannon-Fano coding}}
{{DEFAULTSORT:Shannon-Fano coding}}
[[Category:Lossless compression algorithms]]
[[Category:Claude Shannon]]
[[Category:Entropy coding]]
[[Category:Data compression]]

Latest revision as of 00:54, 6 December 2024

In the field of data compression, Shannon–Fano coding, named after Claude Shannon and Robert Fano, is one of two related techniques for constructing a prefix code based on a set of symbols and their probabilities (estimated or measured).

  • Shannon's method chooses a prefix code where a source symbol is given the codeword length . One common way of choosing the codewords uses the binary expansion of the cumulative probabilities. This method was proposed in Shannon's "A Mathematical Theory of Communication" (1948), his article introducing the field of information theory.
  • Fano's method divides the source symbols into two sets ("0" and "1") with probabilities as close to 1/2 as possible. Then those sets are themselves divided in two, and so on, until each set contains only one symbol. The codeword for that symbol is the string of "0"s and "1"s that records which half of the divides it fell on. This method was proposed in a later (in print) technical report by Fano (1949).

Shannon–Fano codes are suboptimal in the sense that they do not always achieve the lowest possible expected codeword length, as Huffman coding does.[1] However, Shannon–Fano codes have an expected codeword length within 1 bit of optimal. Fano's method usually produces encoding with shorter expected lengths than Shannon's method. However, Shannon's method is easier to analyse theoretically.

Shannon–Fano coding should not be confused with Shannon–Fano–Elias coding (also known as Elias coding), the precursor to arithmetic coding.

Naming

[edit]

Regarding the confusion in the two different codes being referred to by the same name, Krajči et al. write:[2]

Around 1948, both Claude E. Shannon (1948) and Robert M. Fano (1949) independently proposed two different source coding algorithms for an efficient description of a discrete memoryless source. Unfortunately, in spite of being different, both schemes became known under the same name Shannon–Fano coding.

There are several reasons for this mixup. For one thing, in the discussion of his coding scheme, Shannon mentions Fano’s scheme and calls it “substantially the same” (Shannon, 1948, p. 17 [reprint]).[3] For another, both Shannon’s and Fano’s coding schemes are similar in the sense that they both are efficient, but suboptimal prefix-free coding schemes with a similar performance.

Shannon's (1948) method, using predefined word lengths, is called Shannon–Fano coding by Cover and Thomas,[4] Goldie and Pinch,[5] Jones and Jones,[6] and Han and Kobayashi.[7] It is called Shannon coding by Yeung.[8]

Fano's (1949) method, using binary division of probabilities, is called Shannon–Fano coding by Salomon[9] and Gupta.[10] It is called Fano coding by Krajči et al.[2]

Shannon's code: predefined word lengths

[edit]

Shannon's algorithm

[edit]

Shannon's method starts by deciding on the lengths of all the codewords, then picks a prefix code with those word lengths.

Given a source with probabilities the desired codeword lengths are . Here, is the ceiling function, meaning the smallest integer greater than or equal to .

Once the codeword lengths have been determined, we must choose the codewords themselves. One method is to pick codewords in order from most probable to least probable symbols, picking each codeword to be the lexicographically first word of the correct length that maintains the prefix-free property.

A second method makes use of cumulative probabilities. First, the probabilities are written in decreasing order . Then, the cumulative probabilities are defined as

so and so on. The codeword for symbol is chosen to be the first binary digits in the binary expansion of .

Example

[edit]

This example shows the construction of a Shannon–Fano code for a small alphabet. There 5 different source symbols. Suppose 39 total symbols have been observed with the following frequencies, from which we can estimate the symbol probabilities.

Symbol A B C D E
Count 15 7 6 6 5
Probabilities 0.385 0.179 0.154 0.154 0.128

This source has entropy bits.

For the Shannon–Fano code, we need to calculate the desired word lengths .

Symbol A B C D E
Probabilities 0.385 0.179 0.154 0.154 0.128
1.379 2.480 2.700 2.700 2.963
Word lengths 2 3 3 3 3

We can pick codewords in order, choosing the lexicographically first word of the correct length that maintains the prefix-free property. Clearly A gets the codeword 00. To maintain the prefix-free property, B's codeword may not start 00, so the lexicographically first available word of length 3 is 010. Continuing like this, we get the following code:

Symbol A B C D E
Probabilities 0.385 0.179 0.154 0.154 0.128
Word lengths 2 3 3 3 3
Codewords 00 010 011 100 101

Alternatively, we can use the cumulative probability method.

Symbol A B C D E
Probabilities 0.385 0.179 0.154 0.154 0.128
Cumulative probabilities 0.000 0.385 0.564 0.718 0.872
...in binary 0.00000 0.01100 0.10010 0.10110 0.11011
Word lengths 2 3 3 3 3
Codewords 00 011 100 101 110

Note that although the codewords under the two methods are different, the word lengths are the same. We have lengths of 2 bits for A, and 3 bits for B, C, D and E, giving an average length of

which is within one bit of the entropy.

Expected word length

[edit]

For Shannon's method, the word lengths satisfy

Hence the expected word length satisfies

Here, is the entropy, and Shannon's source coding theorem says that any code must have an average length of at least . Hence we see that the Shannon–Fano code is always within one bit of the optimal expected word length.

Fano's code: binary splitting

[edit]

Outline of Fano's code

[edit]

In Fano's method, the symbols are arranged in order from most probable to least probable, and then divided into two sets whose total probabilities are as close as possible to being equal. All symbols then have the first digits of their codes assigned; symbols in the first set receive "0" and symbols in the second set receive "1". As long as any sets with more than one member remain, the same process is repeated on those sets, to determine successive digits of their codes. When a set has been reduced to one symbol this means the symbol's code is complete and will not form the prefix of any other symbol's code.

The algorithm produces fairly efficient variable-length encodings; when the two smaller sets produced by a partitioning are in fact of equal probability, the one bit of information used to distinguish them is used most efficiently. Unfortunately, Shannon–Fano coding does not always produce optimal prefix codes; the set of probabilities {0.35, 0.17, 0.17, 0.16, 0.15} is an example of one that will be assigned non-optimal codes by Shannon–Fano coding.

Fano's version of Shannon–Fano coding is used in the IMPLODE compression method, which is part of the ZIP file format.[11]

The Shannon–Fano tree

[edit]

A Shannon–Fano tree is built according to a specification designed to define an effective code table. The actual algorithm is simple:

  1. For a given list of symbols, develop a corresponding list of probabilities or frequency counts so that each symbol’s relative frequency of occurrence is known.
  2. Sort the lists of symbols according to frequency, with the most frequently occurring symbols at the left and the least common at the right.
  3. Divide the list into two parts, with the total frequency counts of the left part being as close to the total of the right as possible.
  4. The left part of the list is assigned the binary digit 0, and the right part is assigned the digit 1. This means that the codes for the symbols in the first part will all start with 0, and the codes in the second part will all start with 1.
  5. Recursively apply the steps 3 and 4 to each of the two halves, subdividing groups and adding bits to the codes until each symbol has become a corresponding code leaf on the tree.

Example

[edit]
Shannon–Fano Algorithm

We continue with the previous example.

Symbol A B C D E
Count 15 7 6 6 5
Probabilities 0.385 0.179 0.154 0.154 0.128

All symbols are sorted by frequency, from left to right (shown in Figure a). Putting the dividing line between symbols B and C results in a total of 22 in the left group and a total of 17 in the right group. This minimizes the difference in totals between the two groups.

With this division, A and B will each have a code that starts with a 0 bit, and the C, D, and E codes will all start with a 1, as shown in Figure b. Subsequently, the left half of the tree gets a new division between A and B, which puts A on a leaf with code 00 and B on a leaf with code 01.

After four division procedures, a tree of codes results. In the final tree, the three symbols with the highest frequencies have all been assigned 2-bit codes, and two symbols with lower counts have 3-bit codes as shown table below:

Symbol A B C D E
Probabilities 0.385 0.179 0.154 0.154 0.128
First division 0 1
Second division 0 1 0 1
Third division 0 1
Codewords 00 01 10 110 111

This results in lengths of 2 bits for A, B and C and per 3 bits for D and E, giving an average length of

We see that Fano's method, with an average length of 2.28, has outperformed Shannon's method, with an average length of 2.62.

Expected word length

[edit]

It is shown by Krajči et al[2] that the expected length of Fano's method has expected length bounded above by , where is the probability of the least common symbol.

Comparison with other coding methods

[edit]

Neither Shannon–Fano algorithm is guaranteed to generate an optimal code. For this reason, Shannon–Fano codes are almost never used; Huffman coding is almost as computationally simple and produces prefix codes that always achieve the lowest possible expected code word length, under the constraints that each symbol is represented by a code formed of an integral number of bits. This is a constraint that is often unneeded, since the codes will be packed end-to-end in long sequences. If we consider groups of codes at a time, symbol-by-symbol Huffman coding is only optimal if the probabilities of the symbols are independent and are some power of a half, i.e., . In most situations, arithmetic coding can produce greater overall compression than either Huffman or Shannon–Fano, since it can encode in fractional numbers of bits which more closely approximate the actual information content of the symbol. However, arithmetic coding has not superseded Huffman the way that Huffman supersedes Shannon–Fano, both because arithmetic coding is more computationally expensive and because it is covered by multiple patents.[12]

Huffman coding

[edit]

A few years later, David A. Huffman (1952)[13] gave a different algorithm that always produces an optimal tree for any given symbol probabilities. While Fano's Shannon–Fano tree is created by dividing from the root to the leaves, the Huffman algorithm works in the opposite direction, merging from the leaves to the root.

  1. Create a leaf node for each symbol and add it to a priority queue, using its frequency of occurrence as the priority.
  2. While there is more than one node in the queue:
    1. Remove the two nodes of lowest probability or frequency from the queue
    2. Prepend 0 and 1 respectively to any code already assigned to these nodes
    3. Create a new internal node with these two nodes as children and with probability equal to the sum of the two nodes' probabilities.
    4. Add the new node to the queue.
  3. The remaining node is the root node and the tree is complete.

Example with Huffman coding

[edit]
Huffman Algorithm

We use the same frequencies as for the Shannon–Fano example above, viz:

Symbol A B C D E
Count 15 7 6 6 5
Probabilities 0.385 0.179 0.154 0.154 0.128

In this case D & E have the lowest frequencies and so are allocated 0 and 1 respectively and grouped together with a combined probability of 0.282. The lowest pair now are B and C so they're allocated 0 and 1 and grouped together with a combined probability of 0.333. This leaves BC and DE now with the lowest probabilities so 0 and 1 are prepended to their codes and they are combined. This then leaves just A and BCDE, which have 0 and 1 prepended respectively and are then combined. This leaves us with a single node and our algorithm is complete.

The code lengths for the different characters this time are 1 bit for A and 3 bits for all other characters.

Symbol A B C D E
Codewords 0 100 101 110 111

This results in the lengths of 1 bit for A and per 3 bits for B, C, D and E, giving an average length of

We see that the Huffman code has outperformed both types of Shannon–Fano code, which had expected lengths of 2.62 and 2.28.

Notes

[edit]
  1. ^ Kaur, Sandeep; Singh, Sukhjeet (May 2016). "Entropy Coding and Different Coding Techniques" (PDF). Journal of Network Communications and Emerging Technologies. 6 (5): 5. S2CID 212439287. Archived from the original (PDF) on 2019-12-03. Retrieved 3 December 2019.
  2. ^ a b c Stanislav Krajči, Chin-Fu Liu, Ladislav Mikeš and Stefan M. Moser (2015), "Performance analysis of Fano coding", 2015 IEEE International Symposium on Information Theory (ISIT).
  3. ^ The Bell System Technical Journal 1948-07: Vol 27 Iss 3. AT & T Bell Laboratories. 1948-07-01. p. 403.
  4. ^ Thomas M. Cover and Joy A. Thomas (2006), Elements of Information Theory (2nd ed.), Wiley–Interscience. "Historical Notes" to Chapter 5.
  5. ^ Charles M. Goldie and Richard G. E. Pinch (1991), Communication Theory, Cambridge University Press. Section 1.6.
  6. ^ Gareth A. Jones and J. Mary Jones (2012), Information and Coding Theory (Springer). Section 3.4.
  7. ^ Te Sun Han and Kingo Kobayashi (2007), Mathematics of Information and Coding, American Mathematical Society. Subsection 3.7.1.
  8. ^ Raymond W Yeung (2002), A First Course in Information Theory, Springer. Subsection 3.2.2.
  9. ^ David Salomon (2013), Data Compression: The Complete Reference, Springer. Section 2.6.
  10. ^ Prakash C. Gupta (2006), Data Communications and Computer Networks, Phi Publishing. Subsection 1.11.5.
  11. ^ "APPNOTE.TXT - .ZIP File Format Specification". PKWARE Inc. 2007-09-28. Retrieved 2008-01-06. The Imploding algorithm is actually a combination of two distinct algorithms. The first algorithm compresses repeated byte sequences using a sliding dictionary. The second algorithm is used to compress the encoding of the sliding dictionary output, using multiple Shannon–Fano trees.
  12. ^ Ze-Nian Li; Mark S. Drew; Jiangchuan Liu (9 April 2014). Fundamentals of Multimedia. Springer Science & Business Media. ISBN 978-3-319-05290-8.
  13. ^ Huffman, D. (1952). "A Method for the Construction of Minimum-Redundancy Codes" (PDF). Proceedings of the IRE. 40 (9): 1098–1101. doi:10.1109/JRPROC.1952.273898.

References

[edit]