7.18.2. Building Huffman Coding Trees¶
Huffman coding assigns codes to characters such that the length of the code depends on the relative frequency or weight of the corresponding character. Thus, it is a variable-length code. If the estimated frequencies for letters match the actual frequency found in an encoded message, then the length of that message will typically be less than if a fixed-length code had been used. The Huffman code for each letter is derived from a full binary tree called the Huffman coding tree, or simply the Huffman tree. Each leaf of the Huffman tree corresponds to a letter, and we define the weight of the leaf node to be the weight (frequency) of its associated letter. The goal is to build a tree with the minimum external path weight. Define the weighted path length of a leaf to be its weight times its depth. The binary tree with minimum external path weight is the one with the minimum sum of weighted path lengths for the given set of leaves. A letter with high weight should have low depth, so that it will count the least against the total path length. As a result, another letter might be pushed deeper in the tree if it has less weight.
The process of building the Huffman tree for \(n\) letters is quite simple. First, create a collection of \(n\) initial Huffman trees, each of which is a single leaf node containing one of the letters. Put the \(n\) partial trees onto a priority queue organized by weight (frequency). Next, remove the first two trees (the ones with lowest weight) from the priority queue. Join these two trees together to create a new tree whose root has the two trees as children, and whose weight is the sum of the weights of the two trees. Put this new tree back into the priority queue. This process is repeated until all of the partial Huffman trees have been combined into one.
Table 7.18.2
The relative frequencies for eight selected letters.
The following slideshow illustrates the Huffman tree construction process for the eight letters of Table 7.18.2. [2]
Here is the implementation for Huffman tree nodes.
/** Huffman tree node implementation: Base class */
interface HuffBaseNode {
boolean isLeaf();
int weight();
}
/** Huffman tree node: Leaf class */
class HuffLeafNode implements HuffBaseNode {
private char element; // Element for this node
private int weight; // Weight for this node
/** Constructor */
HuffLeafNode(char el, int wt)
{ element = el; weight = wt; }
/** @return The element value */
char value() { return element; }
/** @return The weight */
int weight() { return weight; }
/** Return true */
boolean isLeaf() { return true; }
}
/** Huffman tree node: Internal class */
class HuffInternalNode implements HuffBaseNode {
private int weight;
private HuffBaseNode left;
private HuffBaseNode right;
/** Constructor */
HuffInternalNode(HuffBaseNode l,
HuffBaseNode r, int wt)
{ left = l; right = r; weight = wt; }
/** @return The left child */
HuffBaseNode left() { return left; }
/** @return The right child */
HuffBaseNode right() { return right; }
/** @return The weight */
int weight() { return weight; }
/** Return false */
boolean isLeaf() { return false; }
}
This implementation is similar to
a typical class hierarchy
for implementing full binary trees.
There is an abstract base class, named HuffNode
, and two
subclasses, named LeafNode
and IntlNode
.
This implementation reflects the fact that leaf and
internal nodes contain distinctly different information.
Here is the implementation for the Huffman Tree class.
/** A Huffman coding tree */
class HuffTree implements Comparable {
private HuffBaseNode root;
/** Constructors */
HuffTree(char el, int wt)
{ root = new HuffLeafNode(el, wt); }
HuffTree(HuffBaseNode l, HuffBaseNode r, int wt)
{ root = new HuffInternalNode(l, r, wt); }
HuffBaseNode root() { return root; }
int weight() // Weight of tree is weight of root
{ return root.weight(); }
int compareTo(Object t) {
HuffTree that = (HuffTree)t;
if (root.weight() < that.weight()) return -1;
else if (root.weight() == that.weight()) return 0;
else return 1;
}
}
Here is the implementation for the tree-building process.
static HuffTree buildTree() {
HuffTree tmp1, tmp2, tmp3 = null;
while (Hheap.heapsize() > 1) { // While two items left
tmp1 = Hheap.removemin();
tmp2 = Hheap.removemin();
tmp3 = new HuffTree(tmp1.root(), tmp2.root(),
tmp1.weight() + tmp2.weight());
Hheap.insert(tmp3); // Return new tree to heap
}
return tmp3; // Return the tree
}
buildHuff
takes as input fl
, the min-heap of partial
Huffman trees, which initially are single leaf nodes as shown in Step
1 of the slideshow above.
The body of function buildTree
consists mainly of a for
loop. On each iteration of the for
loop, the first two partial
trees are taken off the heap and placed in variables temp1
and
temp2
.
A tree is created (temp3
) such that the left and right subtrees
are temp1
and temp2
, respectively.
Finally, temp3
is returned to fl
.
[2] | ASCII coding actually uses 8 bits per character. Seven bits are used to represent the 128 codes of the ASCII character set. The eigth bit as a parity bit, that can be used to check if there is a transmission error for the character. |