Huffman coding | Greedy Algorithm
The Huffman coding is also known for its lossless data compression algorithm and this technique is to encode (digitizing into 0 and 1) and compress the data file from large size to small size without losing any single bit information. Its motive is to assign variable-length codes to the input characters which are also called Prefix codes. A character having the highest frequency will get the smallest code and a character with the lowest frequency will have the largest code.
Prefix Codes — assigned code, i.e., the sequence of bits that are assigned to one character is unique and will be different from other characters’ bit sequences.
Steps for the Huffman coding scheme are mentioned below:
○ Creating a Huffman tree with a set of input characters and their frequencies. ○ Assigning binary codes with each traversal to every character.
- Steps for creating Huffman tree algorithm with given input characters and frequencies
- Step 1- Take two characters with the lowest frequency i.e., newline (\n) and S, and create two leaf nodes for \n and S.
- Step 2- Their sum of frequencies will be stored in a parent node. So, 4 (sum of \n frequency 1 and S char 3) will be stored in the parent node.
- Step 3- Taking the next lowest frequency, which is T with frequency 4 will get added beneath and the next parent node will again be the sum of T frequency (4) and the previous parent node (4), making 8.
- Step 4- Repeat step 3, until we get the highest parent node value compared to other frequencies.
- Step 5- After the parent node with value 18, create a new subtree following the step-1 and step-2.
- Step 7- Its frequency will again add to its next parent node.
2. Assigning the binary codes 0 and 1 to each character through traversal
- Fill the value of left traversal with value 0 and right traversal with value 1.
- Traverse down through the character from the main node.
Therefore, the time complexity will be O(nlogn).
It is widely used in compression formats like GZIP, PKZIP, BZIP and multimedia codecs like JPEG, PNG, MP3 also uses Huffman Encoding.
Advantages of Huffman Coding
- It saves a lot of storage space
- Generates shorter binary codes for encoding
- Codes generated are prefix-free
Disadvantages of Huffman Coding
- The lossless data encoding technique, therefore, is suitable for only encoding text and programming files.
- A considerably slower process.
- Differences in binary length codes make it difficult for decoding software to detect whether the data is corrupted or not.