Today, much of computer science involves the storage and transmission of sequences of short codes. Example short codes include, but are not limited to, the file and directory names used by operating systems, the brief statements exchanged in chat conversations, and website URLs, each defined by a short sequence of characters drawn from a limited character universe.
For purposes of both storage and transmission, it is advantageous to represent these and other short codes in as few bytes (indeed, bits) as possible. A typical uncompressed encoding of printable ASCII codes uses 8 bits (1 byte) per code. Many of today's encoding schemes may use more than 1 byte per code, as they can represent a universe of codes greater in size than two hundred fifty-six (256), the largest number of discrete codes that can be represented in eight (8) binary bits.
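The per-code byte counts above can be illustrated with a brief sketch. The example below uses Python's standard UTF-8 codec as one multi-byte encoding scheme; the specific strings are illustrative assumptions, not drawn from the text:

```python
# Printable ASCII codes each fit in the 256-code universe of one byte,
# so a UTF-8 encoding of ASCII text uses exactly 1 byte per code.
ascii_text = "report.txt"
assert len(ascii_text.encode("utf-8")) == len(ascii_text)

# A code outside that 256-code universe (here the euro sign, U+20AC)
# requires more than 1 byte under a multi-byte scheme such as UTF-8.
wide_char = "\u20ac"
print(len(wide_char.encode("utf-8")))  # 3 bytes for a single code
```

The asymmetry shown here is the motivation for counting bits rather than characters when evaluating an encoding.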
Many of today's compression algorithms identify patterns in the codes they read, and then exploit those patterns by building a dynamic dictionary that expresses subsequent occurrences of the patterns more compactly. This approach, while useful for long sequences, provides limited value for the short code sequences that pervade everyday computing. Most lossless data compression algorithms, such as the Lempel-Ziv ('LZ') compression methods and their many variants, yield poor results when applied to short code sequences: the encoded output contains more bits than the original sequence, resulting in expansion, not compression, of the short code sequences.
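The expansion effect described above can be observed directly with a standard LZ-family compressor. The sketch below uses Python's `zlib` module (a DEFLATE/LZ77 implementation) as one representative example; the sample input is an assumption chosen to resemble a short file name:

```python
import zlib

# A typical short code sequence, e.g. a file name.
short = b"cat.txt"

# Dictionary-based compressors must emit headers, block framing, and a
# checksum; on a 7-byte input there are no repeated patterns to exploit,
# so that fixed overhead dominates and the output grows.
compressed = zlib.compress(short)
print(len(short), len(compressed))
assert len(compressed) > len(short)  # expansion, not compression
```

The same effect appears with other general-purpose compressors: below some input length, the fixed framing cost exceeds any savings from pattern matching.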
Accordingly, there exists a need to have a simple, quick-to-execute, lossless method of encoding and decoding short sequences of information.