Adaptive Huffman coding: Difference between revisions
m Fix ref error |
IznoRepeat (talk | contribs) m add WP:TEMPLATECAT to remove from template; genfixes |
||
(43 intermediate revisions by 32 users not shown) | |||
Line 1: | Line 1: | ||
{{short description|Data compression technique}} |
|||
'''Adaptive Huffman coding''' (also called '''Dynamic Huffman coding''') is an [[adaptive coding]] technique based on [[Huffman coding]]. It permits building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to changing conditions in data. |
|||
'''Adaptive Huffman coding''' (also called '''Dynamic Huffman coding''') is an [[adaptive coding]] technique based on [[Huffman coding]]. It permits building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to changing conditions in data.<ref name="LiDrew2014">{{cite book|author1=Ze-Nian Li|author2=Mark S. Drew|author3=Jiangchuan Liu|title=Fundamentals of Multimedia|url=https://books.google.com/books?id=R6vBBAAAQBAJ|date=9 April 2014|publisher=Springer Science & Business Media|isbn=978-3-319-05290-8}}</ref> |
|||
The benefit of one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since just a single loss ruins the whole code. |
The benefit of one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since just a single loss ruins the whole code, requiring [[error detection and correction]]. |
||
==Algorithms== |
==Algorithms== |
||
Line 7: | Line 8: | ||
===FGK Algorithm === |
===FGK Algorithm === |
||
It is an online coding technique based on Huffman coding. Having no initial knowledge of occurrence frequencies, it permits dynamically adjusting the Huffman's tree as data are being transmitted. In a FGK Huffman tree, a special external node, called ''0-node'', is used to identify a newly |
It is an online coding technique based on Huffman coding. Having no initial knowledge of occurrence frequencies, it permits dynamically adjusting the Huffman's tree as data are being transmitted. In a FGK Huffman tree, a special external node, called ''0-node'', is used to identify a newly coming character. That is, whenever new data is encountered, output the path to the 0-node followed by the data. For a past-coming character, just output the path of the data in the current Huffman's tree. Most importantly, we have to adjust the FGK Huffman tree if necessary, and finally update the frequency of related nodes. As the frequency of a datum is increased, the'' sibling property'' of the Huffman's tree may be broken. The adjustment is triggered for this reason. It is accomplished by consecutive swappings of nodes, subtrees, or both. The data node is swapped with the highest-ordered node of the same frequency in the Huffman's tree, (or the subtree rooted at the highest-ordered node). All ancestor nodes of the node should also be processed in the same manner. |
||
Since the FGK Algorithm has some drawbacks about the node-or-subtree swapping, Vitter proposed another algorithm to improve it. |
Since the FGK Algorithm has some drawbacks about the node-or-subtree swapping, Vitter proposed another algorithm to improve it. |
||
Line 13: | Line 14: | ||
===Vitter algorithm=== |
===Vitter algorithm=== |
||
Some important terminologies & constraints :- |
Some important terminologies & constraints :- |
||
* '''Implicit Numbering''' : |
* '''Implicit Numbering''' : It simply means that nodes are numbered in increasing order by level and from left to right. i.e. nodes at bottom level will have low implicit number as compared to upper level nodes and nodes on same level are numbered in increasing order from left to right. In other terms, when we have built the Huffman tree, when merging two nodes into a parent node, we have set the one with the lower value as the left child, and the one with the higher value as the right child. |
||
* '''Invariant''' : For each weight w, all leaves of weight w precede all internal nodes having weight w. In other terms, when we have built the Huffman tree, if several nodes had the same value, we prioritized merging the leaves over the internal nodes. |
|||
* '''Invariant''' : For each weight w, all leaves of weight w precedes all internal nodes having weight w. |
|||
* '''Blocks''' : Nodes of same weight and same type (i.e. either leaf node or internal node) form a Block. |
* '''Blocks''' : Nodes of same weight and same type (i.e. either leaf node or internal node) form a Block. |
||
* '''Leader''' : Highest numbered node in a block. |
* '''Leader''' : Highest numbered node in a block. |
||
Line 23: | Line 23: | ||
A leaf block always precedes internal block of same weight, thus maintaining the invariant. |
A leaf block always precedes internal block of same weight, thus maintaining the invariant. |
||
'''NYT(Not Yet Transferred)''' is special node |
'''NYT (Not Yet Transferred)''' is a special node used to represent symbols which are ''<nowiki>'not yet transferred'</nowiki>''. |
||
=== '''Algorithm :-''' === |
|||
[[File:Leaf step one.png|thumb|'''Slide Method''' (leaf node): Slide_And_Increment(P) starts. |
|||
1. P is a leaf node. |
|||
]] |
|||
''leaf_to_increment'' = NULL |
|||
''P'' = pointer to symbol node |
|||
'''''Step 1 :-''''' |
|||
If (P is NYT) THEN |
|||
''BEGIN'' |
|||
[[File:Leaf step two.png|thumb|'''Slide Method''' (leaf node): Slide_And_Increment(P) |
|||
2. As P is leaf node, it slides in front of next block nodes of equal weight. |
|||
]] |
|||
extend P by adding two children. |
|||
Left child becomes new NYT and right child is the new symbol node. |
|||
''P'' = parent of new symbol node |
|||
''leaf_to_increment'' = Right Child of P. |
|||
''END'' |
|||
Go to Step 3. |
|||
[[File:Leaf step three.png|thumb|'''Slide Method''' (leaf node) : Slide_And_Increment(P) |
|||
3. Here we increase the current weight by 1. |
|||
]] |
|||
'''''Step 2 :-''''' |
|||
Swap P with leader of its block. |
|||
If (new P is sibling to NYT) THEN |
|||
''BEGIN'' |
|||
''leaf_to_increment'' = P |
|||
''P'' = parent of p. |
|||
[[File:Leaf step four.png|thumb|'''Slide Method''' (leaf node) : Slide_And_Increment(P) |
|||
4. Method comes to an end. |
|||
P is the new parent. |
|||
]] |
|||
''END'' |
|||
'''''Step 3 :-''''' |
|||
[[File:Internal one.png|thumb|'''Slide Method''' (internal node) :- Slide_And_Increment(P) |
|||
1. P is a internal node. |
|||
]] |
|||
while (P is not Root) |
|||
[[File:Internal two.png|thumb|'''Slide Method''' (internal node): Slide_And_Increment(P). |
|||
2. Node P slides in front of next block of nodes, with weight wt+1. |
|||
]] |
|||
''Slide_And_Increment( P )'' |
|||
'''Step 4 :-''' |
|||
If ( leaf_to_increment != NULL ) |
|||
''Slide_And_Increment( P )'' |
|||
'''PROCEDURE ''Slide_And_Increment( P node ) :''''' |
|||
''BEGIN'' |
|||
''previous_p'' = parent of p |
|||
''wt'' = weight of p |
|||
if ( p is Internal node ) |
|||
[[File:Internal three.png|thumb|'''Slide Method''' (internal node) : Slide_And_Increment(P) |
|||
3. Now we increase the weight by 9. Thus the '''''invariant is maintained''''' as the current node is an internal node and should occur in front of leaf nodes of equal weight as we have increased the weight. |
|||
]] |
|||
Slide P in the tree higher than the leaf nodes of weight ''wt + 1.'' |
|||
else |
|||
Slide P in the tree higher than the nodes of weight ''wt''. |
|||
''wt'' += 1 |
|||
if ( p is internal node ) |
|||
p = ''previous_p'' |
|||
else |
|||
[[File:Leaf step one.png|thumb|Slide_And_Increment(leaf node) sliding starts. P is a leaf node.]] |
|||
p = new parent of p. |
|||
[[File:Leaf step two.png|thumb|Slide_And_Increment(leaf node) sliding step 2. As P is leaf node, it slides in front of next block nodes of equal weight.]] |
|||
[[File:Leaf step three.png|thumb|Slide_And_Increment(leaf node) sliding step 3. Here we increase the current weight by 1.]] |
|||
[[File:Leaf step four.png|thumb|Slide_And_Increment(leaf node) sliding step 4. Method comes to an end. P is the new parent.]] |
|||
[[File:Internal one.png|thumb|Slide_And_Increment(internal node) sliding starts. P is an internal node.]] |
|||
[[File:Internal two.png|thumb|Slide_And_Increment(internal node) sliding step 2. Node P slides in front of next block of leaves nodes, with weight wt+1.]] |
|||
[[File:Internal three.png|thumb|Slide_And_Increment(internal node) sliding step 3. Now we increase the weight to 9. Thus the '''''invariant is maintained''''' as the current node is an internal node and should occur in front of leaf nodes of equal weight as we have increased the weight.]] |
|||
[[File:Internal four.png|thumb|Slide_And_Increment(internal node) sliding step 4. Now the 'P' points to the former parent ( as in the case of internal node according to algorithm)]] |
|||
'''algorithm''' for adding a symbol '''is''' |
|||
''END'' |
|||
leaf_to_increment := NULL |
|||
p := pointer to the leaf node containing the next symbol |
|||
'''if''' (p is NYT) '''then''' |
|||
Extend p by adding two children |
|||
Left child becomes new NYT and right child is the new symbol leaf node |
|||
''p'' := parent of new symbol leaf node |
|||
leaf_to_increment := Right Child of p |
|||
'''else''' |
|||
Swap p with leader of its block |
|||
'''if''' (new p is sibling to NYT) '''then''' |
|||
leaf_to_increment := p |
|||
''p'' := parent of p |
|||
'''while''' (p ≠ NULL) '''do''' |
|||
Slide_And_Increment(p) |
|||
'''if''' (leaf_to_increment != NULL) '''then''' |
|||
Slide_And_Increment(leaf_to_increment) |
|||
'''function''' Slide_And_Increment(p) '''is''' |
|||
previous_p := parent of ''p'' |
|||
4. Now the 'P' points to the former parent ( as in the case of internal node according to algorithm) |
|||
]] |
|||
'''if''' (p is an internal node) '''then''' |
|||
Slide p in the tree higher than the leaf nodes of weight wt + 1 |
|||
increase weight of ''p'' by 1 |
|||
''p'' := previous_p |
|||
'''else''' |
|||
Slide p in the tree higher than the internal nodes of weight wt |
|||
increase weight of ''p'' by 1 |
|||
''p'' := new parent of ''p''. |
|||
Encoder and decoder start with only the root node, which has the maximum number. In the beginning it is our initial NYT node. |
Encoder and decoder start with only the root node, which has the maximum number. In the beginning it is our initial NYT node. |
||
Line 131: | Line 75: | ||
====Example==== |
====Example==== |
||
[[File:Adaptive Huffman Vitter.jpg]] |
[[File:Adaptive Huffman Vitter.jpg|800px]] |
||
Encoding "abb" gives 01100001 001100010 11. |
Encoding "abb" gives 01100001 001100010 11. |
||
Step 1: |
Step 1: |
||
Line 149: | Line 93: | ||
Step 3: |
Step 3: |
||
NYT spawns two child nodes: 252 for NYT and 253 for leaf node, both with weight 0. Increase weights for 253, 254, and root. To maintain |
NYT spawns two child nodes: 252 for NYT and 253 for leaf node, both with weight 0. Increase weights for 253, 254, and root. To maintain Vitter's invariant that all leaves of weight w precede (in the implicit numbering) all internal nodes of weight w, the branch starting with node 254 should be swapped (in terms of symbols and weights, but not number ordering) with node 255. Code for "b" is 11. |
||
For the second "b" transmit 11. |
For the second "b" transmit 11. |
||
For the convenience of explanation this step doesn't exactly follow Vitter's algorithm,<ref name=":0">{{cite web|url=http://www.cs.duke.edu/csed/curious/compression/adaptivehuff.html#tree |title=Adaptive Huffman Coding |publisher=Cs.duke.edu |access-date=2012-02-26}}</ref> but the effects are equivalent. |
|||
Step 4: |
Step 4: |
||
Go to leaf node 253. Notice we have two blocks with weight 1. Node 253 and 254 is one block (consisting of leaves), node 255 is another block (consisting of internal nodes). For node 253, the biggest number in its block is 254, so swap the weights and symbols of nodes 253 and 254. Now node 254 and the branch starting from node 255 satisfy the SlideAndIncrement condition<ref name=":0" |
Go to leaf node 253. Notice we have two blocks with weight 1. Node 253 and 254 is one block (consisting of leaves), node 255 is another block (consisting of internal nodes). For node 253, the biggest number in its block is 254, so swap the weights and symbols of nodes 253 and 254. Now node 254 and the branch starting from node 255 satisfy the SlideAndIncrement condition<ref name=":0"/> and hence must be swapped. At last increase node 255 and 256's weight. |
||
Future code for "b" is 1, and for "a" is now 01, which reflects their frequency. |
Future code for "b" is 1, and for "a" is now 01, which reflects their frequency. |
||
Line 165: | Line 109: | ||
{{reflist}} |
{{reflist}} |
||
* Vitter's original paper: J. S. Vitter, "[http://www. |
* Vitter's original paper: J. S. Vitter, "[http://www.ittc.ku.edu/~jsv/Papers/Vit87.jacmACMversion.pdf Design and Analysis of Dynamic Huffman Codes]", Journal of the ACM, 34(4), October 1987, pp 825–845. |
||
* J. S. Vitter, "ALGORITHM 673 Dynamic Huffman Coding", ACM Transactions on Mathematical Software, 15(2), June 1989, pp 158–167. Also appears in Collected Algorithms of ACM. |
* J. S. Vitter, "ALGORITHM 673 Dynamic Huffman Coding", ACM Transactions on Mathematical Software, 15(2), June 1989, pp 158–167. Also appears in Collected Algorithms of ACM. |
||
* Donald E. Knuth, "Dynamic Huffman Coding", Journal of Algorithm, 6(2), 1985, pp 163–180. |
* Donald E. Knuth, "Dynamic Huffman Coding", Journal of Algorithm, 6(2), 1985, pp 163–180. |
||
Line 173: | Line 117: | ||
* [http://www.ics.uci.edu/~dan/pubs/DC-Sec4.html University of California Dan Hirschberg site] |
* [http://www.ics.uci.edu/~dan/pubs/DC-Sec4.html University of California Dan Hirschberg site] |
||
* [http://www.cs.cf.ac.uk/Dave/Multimedia/node212.html Cardiff University Dr. David Marshall site] |
* [http://www.cs.cf.ac.uk/Dave/Multimedia/node212.html Cardiff University Dr. David Marshall site] |
||
* [ |
* [https://code.google.com/p/compression-code/downloads/list C implementation of Vitter algorithm] |
||
* [http://www.cs.duke.edu/csed/curious/compression/adaptivehuff.html Excellent description from Duke University] |
* [http://www.cs.duke.edu/csed/curious/compression/adaptivehuff.html Excellent description from Duke University] |
||
Line 179: | Line 123: | ||
[[Category:Lossless compression algorithms]] |
[[Category:Lossless compression algorithms]] |
||
[[Category:Data compression]] |
Latest revision as of 23:50, 5 December 2024
Adaptive Huffman coding (also called Dynamic Huffman coding) is an adaptive coding technique based on Huffman coding. It permits building the code as the symbols are being transmitted, having no initial knowledge of source distribution, that allows one-pass encoding and adaptation to changing conditions in data.[1]
The benefit of one-pass procedure is that the source can be encoded in real time, though it becomes more sensitive to transmission errors, since just a single loss ruins the whole code, requiring error detection and correction.
Algorithms
[edit]There are a number of implementations of this method, the most notable are FGK (Faller-Gallager-Knuth) and Vitter algorithm.
FGK Algorithm
[edit]It is an online coding technique based on Huffman coding. Having no initial knowledge of occurrence frequencies, it permits dynamically adjusting the Huffman's tree as data are being transmitted. In a FGK Huffman tree, a special external node, called 0-node, is used to identify a newly coming character. That is, whenever new data is encountered, output the path to the 0-node followed by the data. For a past-coming character, just output the path of the data in the current Huffman's tree. Most importantly, we have to adjust the FGK Huffman tree if necessary, and finally update the frequency of related nodes. As the frequency of a datum is increased, the sibling property of the Huffman's tree may be broken. The adjustment is triggered for this reason. It is accomplished by consecutive swappings of nodes, subtrees, or both. The data node is swapped with the highest-ordered node of the same frequency in the Huffman's tree, (or the subtree rooted at the highest-ordered node). All ancestor nodes of the node should also be processed in the same manner.
Since the FGK Algorithm has some drawbacks about the node-or-subtree swapping, Vitter proposed another algorithm to improve it.
Vitter algorithm
[edit]Some important terminologies & constraints :-
- Implicit Numbering : It simply means that nodes are numbered in increasing order by level and from left to right. i.e. nodes at bottom level will have low implicit number as compared to upper level nodes and nodes on same level are numbered in increasing order from left to right. In other terms, when we have built the Huffman tree, when merging two nodes into a parent node, we have set the one with the lower value as the left child, and the one with the higher value as the right child.
- Invariant : For each weight w, all leaves of weight w precede all internal nodes having weight w. In other terms, when we have built the Huffman tree, if several nodes had the same value, we prioritized merging the leaves over the internal nodes.
- Blocks : Nodes of same weight and same type (i.e. either leaf node or internal node) form a Block.
- Leader : Highest numbered node in a block.
Blocks are interlinked by increasing order of their weights.
A leaf block always precedes internal block of same weight, thus maintaining the invariant.
NYT (Not Yet Transferred) is a special node used to represent symbols which are 'not yet transferred'.
algorithm for adding a symbol is leaf_to_increment := NULL p := pointer to the leaf node containing the next symbol if (p is NYT) then Extend p by adding two children Left child becomes new NYT and right child is the new symbol leaf node p := parent of new symbol leaf node leaf_to_increment := Right Child of p else Swap p with leader of its block if (new p is sibling to NYT) then leaf_to_increment := p p := parent of p while (p ≠ NULL) do Slide_And_Increment(p) if (leaf_to_increment != NULL) then Slide_And_Increment(leaf_to_increment)
function Slide_And_Increment(p) is previous_p := parent of p if (p is an internal node) then Slide p in the tree higher than the leaf nodes of weight wt + 1 increase weight of p by 1 p := previous_p else Slide p in the tree higher than the internal nodes of weight wt increase weight of p by 1 p := new parent of p.
Encoder and decoder start with only the root node, which has the maximum number. In the beginning it is our initial NYT node.
When we transmit an NYT symbol, we have to transmit code for the NYT node, then for its generic code.
For every symbol that is already in the tree, we only have to transmit code for its leaf node.
Example
[edit]Encoding "abb" gives 01100001 001100010 11.
Step 1:
Start with an empty tree.
For "a" transmit its binary code.
Step 2:
NYT spawns two child nodes: 254 and 255, both with weight 0. Increase weight for root and 255. Code for "a", associated with node 255, is 1.
For "b" transmit 0 (for NYT node) then its binary code.
Step 3:
NYT spawns two child nodes: 252 for NYT and 253 for leaf node, both with weight 0. Increase weights for 253, 254, and root. To maintain Vitter's invariant that all leaves of weight w precede (in the implicit numbering) all internal nodes of weight w, the branch starting with node 254 should be swapped (in terms of symbols and weights, but not number ordering) with node 255. Code for "b" is 11.
For the second "b" transmit 11.
For the convenience of explanation this step doesn't exactly follow Vitter's algorithm,[2] but the effects are equivalent.
Step 4:
Go to leaf node 253. Notice we have two blocks with weight 1. Node 253 and 254 is one block (consisting of leaves), node 255 is another block (consisting of internal nodes). For node 253, the biggest number in its block is 254, so swap the weights and symbols of nodes 253 and 254. Now node 254 and the branch starting from node 255 satisfy the SlideAndIncrement condition[2] and hence must be swapped. At last increase node 255 and 256's weight.
Future code for "b" is 1, and for "a" is now 01, which reflects their frequency.
References
[edit]- ^ Ze-Nian Li; Mark S. Drew; Jiangchuan Liu (9 April 2014). Fundamentals of Multimedia. Springer Science & Business Media. ISBN 978-3-319-05290-8.
- ^ a b "Adaptive Huffman Coding". Cs.duke.edu. Retrieved 2012-02-26.
- Vitter's original paper: J. S. Vitter, "Design and Analysis of Dynamic Huffman Codes", Journal of the ACM, 34(4), October 1987, pp 825–845.
- J. S. Vitter, "ALGORITHM 673 Dynamic Huffman Coding", ACM Transactions on Mathematical Software, 15(2), June 1989, pp 158–167. Also appears in Collected Algorithms of ACM.
- Donald E. Knuth, "Dynamic Huffman Coding", Journal of Algorithm, 6(2), 1985, pp 163–180.
External links
[edit]- This article incorporates public domain material from Paul E. Black. "adaptive Huffman coding". Dictionary of Algorithms and Data Structures. NIST.
- University of California Dan Hirschberg site
- Cardiff University Dr. David Marshall site
- C implementation of Vitter algorithm
- Excellent description from Duke University