Unishox
A hybrid encoder for Short Unicode Strings
In general, compression utilities such as zip and gzip do not compress short strings well and often expand them. They also use a lot of memory, which makes them unusable in constrained environments such as Arduino. The Unishox algorithm was therefore developed for individually compressing (and decompressing) short strings.
Note: The present byte-code version is 2 and it replaces Unishox 1. Unishox 1 is still available as unishox1.c, but it will have to be compiled manually if it is needed.
Unishox is a hybrid encoder (entropy, dictionary and delta coding). It works by assigning fixed prefix-free codes to each letter in its character set (entropy coding). It also encodes repeating letter sets separately (dictionary coding). For Unicode characters, delta coding is used.
The model used for arriving at the prefix-free code is shown below:
The complete specification can be found in this article: A hybrid encoder for compressing Short Unicode Strings.
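To make the idea of fixed prefix-free codes concrete, here is a minimal sketch in C. The table TOY_CODES and the functions is_prefix_free and toy_encode are hypothetical names invented for this illustration, and the codes are not the Unishox code set (see the specification for that). The sketch only shows why a fixed table of prefix-free codes can shrink short text without any per-string header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy code table for illustration only; NOT the Unishox codes. */
struct code { char sym; const char *bits; };

const struct code TOY_CODES[] = {
    {' ', "00"},  {'e', "01"},  {'t', "100"}, {'a', "101"},
    {'o', "110"}, {'n', "1110"}, {'s', "1111"}
};
#define TOY_COUNT (sizeof TOY_CODES / sizeof TOY_CODES[0])

/* Returns 1 if no code in the table is a prefix of another code.
   This property is what lets a decoder split the bit stream unambiguously. */
int is_prefix_free(const struct code *t, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            if (i != j &&
                strncmp(t[i].bits, t[j].bits, strlen(t[i].bits)) == 0)
                return 0;
    return 1;
}

/* Encodes text into out as a NUL-terminated string of '0'/'1' characters.
   Returns the number of bits written, or -1 if a symbol is not in the
   table or the buffer (cap bytes) is too small. */
long toy_encode(const char *text, char *out, size_t cap) {
    size_t used = 0;
    if (cap == 0) return -1;
    for (; *text; text++) {
        const char *bits = NULL;
        for (size_t i = 0; i < TOY_COUNT; i++)
            if (TOY_CODES[i].sym == *text) { bits = TOY_CODES[i].bits; break; }
        if (!bits) return -1;
        size_t len = strlen(bits);
        if (used + len + 1 > cap) return -1;   /* keep room for the NUL */
        memcpy(out + used, bits, len);
        used += len;
    }
    out[used] = '\0';
    return (long)used;
}
```

With this toy table, toy_encode("tea", buf, sizeof buf) writes 10001101, which is 8 bits instead of the 24 bits the three ASCII bytes occupy. A real encoder would pack the bits into bytes rather than characters.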
To compile, just use make, or use gcc as follows:
For testing the compiled program, use:
To see Unishox in action, simply try to compress a string:
To compress and decompress a file, use:
Unishox does not give good compression ratios for large files or binary files.
Unishox supports the entire Unicode character set. As of now it supports UTF-8 as input and output encoding.
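As an illustration of why delta coding suits Unicode text, the following C sketch decodes UTF-8 into code points and computes the difference between each code point and the previous one. The function names utf8_decode and unicode_deltas are hypothetical, and taking the delta against the immediately preceding code point is an assumption made for this sketch. The actual bit layout Unishox uses is defined in the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes one UTF-8 sequence from a NUL-terminated buffer.
   Stores the code point in *cp and returns the bytes consumed (1 to 4),
   or 0 if the sequence is malformed or truncated.
   Simplification: overlong forms and surrogates are not rejected. */
int utf8_decode(const unsigned char *s, int32_t *cp) {
    int len, i;
    if (s[0] < 0x80)                { *cp = s[0];        len = 1; }
    else if ((s[0] & 0xE0) == 0xC0) { *cp = s[0] & 0x1F; len = 2; }
    else if ((s[0] & 0xF0) == 0xE0) { *cp = s[0] & 0x0F; len = 3; }
    else if ((s[0] & 0xF8) == 0xF0) { *cp = s[0] & 0x07; len = 4; }
    else return 0;
    for (i = 1; i < len; i++) {
        /* A continuation byte must look like 10xxxxxx; this check also
           stops at a premature NUL terminator. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        *cp = (*cp << 6) | (s[i] & 0x3F);
    }
    return len;
}

/* Writes the delta of each code point from the previous one into out
   (the first delta is taken against 0). Returns the number of deltas
   written; stops at max entries or at the first malformed sequence. */
size_t unicode_deltas(const char *utf8, int32_t *out, size_t max) {
    const unsigned char *p = (const unsigned char *)utf8;
    int32_t prev = 0, cp;
    size_t n = 0;
    while (*p && n < max) {
        int len = utf8_decode(p, &cp);
        if (len == 0) break;
        out[n++] = cp - prev;
        prev = cp;
        p += len;
    }
    return n;
}
```

For the Greek string "αβγ" (U+03B1, U+03B2, U+03B3) the deltas are 945, 1, 1. After the first character, neighbouring letters of the same script tend to produce small numbers, which can be stored in far fewer bits than full code points.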
In case of any issues, please email the author (Arundale Ramanathan) at arun@siara.cc or create a GitHub issue.