Welcome to shcodec home page

shcodec
shcodec is order-0 32-bit canonical static huffman codec. It encodes alphabet of 256 symbols with minimum-redundancy or length-restricted codes(basic method: Alistair Moffat and Jyrki Katajainen, modified by Artur A. Pessoa). shcodec uses efficient method for tree packing: on text files packed tree size is approx 68 bytes, on binary files this value is about 132 bytes. Memory requirements are very small: 1280 bytes for encoding and only 574 bytes for decoding! shcodec uses extremely fast and simple SHIFT-OR method for encoding, and CANONICAL-DECODE with a cache for small codewords for decoding.

Developer
shcodec package contain all routines, needed to implement fast and efficient static huffman codec. All necessary preprocessing steps are described in the test program. In order to keep high operating speed, encode/decode routines are implemented as macro insead of functions: they encode/decode one symbol at time to/from the specified buffer The way how encode/decode loops must be organized is also shown in the test program.

Experimental results
Next table presents experimental results for shcodec version 1.0.1 and range coder version 1.3 on the calgary corpus set. In this table we use the following notation:

file file to encode
file size original file size in bytes
tree packed tree size in bytes (tree is included into the shc size)
shc/rng size encoded file size in bytes (with shcodec/range coder)
shc/rng bpb bpb value (for shcodec/range coder)
ent bpb order-0 entropy bpb value

With (#) sign we denote files where range coder is more efficient than shcodec(only 2 files from 18).

file file size tree shc size rng size shc bpb rng bpb ent bpb
bib 111,261 68 72,837 73,421 5,237 5,279 5,201
book1 768,771 68 438,449 441,851 4,563 4,598 4,527
book2 610,856 68 368,373 370,656 4,824 4,854 4,793
geo 102,400 132 72,693 73,078 5,679 5,709 5,646
news 377,109 56 246,457 248,417 5,228 5,270 5,190
obj1 21,504 132 16,189 16,493 6,023 6,136 5,948
obj2 246,814 132 194,233 189,329 6,296 6,137# 6,260
paper1 53,161 68 33,413 33,419 5,028 5,029 4,983
paper2 82,199 68 47,689 48,022 4,641 4,674 4,601
paper3 46,526 56 27,337 27,645 4,701 4,753 4,665
paper4 13,286 52 7,917 8,047 4,767 4,845 4,700
paper5 11,954 56 7,493 7,698 5,015 5,152 4,936
paper6 38,105 68 24,097 24,225 5,059 5,086 5,010
pic 513,216 140 106,697 79,391 1,663 1,238# 1,210
progc 39,611 56 25,977 26,200 5,246 5,291 5,199
progl 71,646 56 43,045 43,215 4,806 4,825 4,770
progp 49,379 68 30,289 30,677 4,907 4,970 4,869
trans 93,695 76 65,301 65,586 5,576 5,600 5,533
average 79 4,959 4,969 4,891

As you can see, on most files(16 from 18), shcodec outperformes range coder a little(on 0,201% in average). On very redundant files, such as pic, and inhomogenious ones, adaptive entropy coders are usually better.

So, shcodec does`t suits for high order models and not so efficient, when symbol distribution is very skewed(e.g. a lot of zeros in pic), but in practice, under order-0 model, it shows compression performance comparable with best known entropy coders and runs very fast.

License
This software may be used freely for any purpose. However, when distributed, the original source must be clearly stated, and, when the source code is distributed, the copyright notice must be retained and any alterations in the code must be clearly marked. No warranty is given regarding the quality of this software.

Download
shc101.zip(~45 Kbytes) - shcodec source code, win32 executable and documentation.
shclib-1.0.1.tar.gz (~5 Kbytes) - shcodec library.
shcdll.zip (~47 Kbytes) - windows dll version of the shcodec library.
sh-sfx.zip(~28 Kbytes) - utility for creating SFX archives from shcodec files under win32.
HuffmanCode.zip(~29 Kbytes) - My article about Huffman Code (in Russian). Also in html.

Support
If you have a question about using shcodec, you can reach me via email: xander at entropyware dot info

See also: EPSILON - powerful Open Source wavelet image compressor


Copyright (C) 1998-2009 Simakov Alexander
Valid HTML 4.01!