Huffman coding is a lossless data compression algorithm. In this algorithm, a variable-length code is assigned to input different characters. The code length is related to how frequently characters are used. Most frequent characters have the smallest codes and longer codes for least frequent characters
We create a frequency hashmap for all the characters in the string and then use data structures such as Min-Heap and Tree to convert these into binary codes. These binary codes are to be stored as bytes hence they are padded with zeros (to avoid default padding snce 1 byte=8 bits). This file is now compressed and has lesser size.
- Tree
- HashMap
- Min Heap
- Make a frequency table for all the characters in the string.
- Pick the two smallest frequencies and make their nodes in the tree.
- Make another nodes with their combined frequencies and add it to the tree as well as to the frequency table.
- Connect the two nodes to this node with combined frequencies.
- Again pick two smallest frequencies and continue the process till all the frequencies in the table are used.
- Add weights to the branches of the Tree. Add weight 0 to all the left branches and 1 to all the right branches.
- We can then get a code for each character just by following the path on the resultant tree.
- Make a map using this.
- We then pad these codes with zeros to avoid default padding as we have to store these as bytes and 1 byte = 8bits.
- We now convert the binary codes to bytes using
int(byte, 2)
and store it in the bytes array.
- We remove the padding and decode the binary codes.
- We then use the
reverse_mapping
hashmap to convert the codes back to text.
- When we run the
useHuffman.py
, two files are created.
sample.bin
-> binary file with less sizesample_decompressed.txt
-> decompressed text file
-
Drop a ⭐ on the Github Repository.
-
Make sure to install python on your system- https://www.python.org/downloads/
-
Download Python IDE or code editor for python code
- Install Anaconda for Windows
- Install Anaconda for MacOS
- Install Anaconda for Linux
- Install VS code for Windows/Mac/Linux
-
Clone the Repo by going to your local Git Client and pushing this command:
git clone https://github.com/Pranav016/Huffman-Coding.git
-
Clone the Repo to a particular folder on your system by going to your local Git Client and pushing this command:
git clone https://github.com/Pranav016/Huffman-Coding.git <folder-name>
-
Open the project in the Jupyter Notebook/VS code to use it.
Entry file- useHuffman.py