Compression techniques such as some variations of Lempel-Ziv (LZ) reduce an input stream to a self-referencing “dictionary” in which a repeated occurrence of a string of characters is replaced by a copy command. The copy command references a prior occurrence of the string as a number of characters (i.e., a length) at a certain offset prior to the current position. For example, the string “bcdezbcde” could be compressed to the literal string “bcdez” followed by a command to copy 4 characters starting at an offset 5 characters before the current position (i.e., “Copy(4,5)”). If the instruction “Copy(4,5)” takes fewer bits to output than the 4 characters it replaces, the output is smaller than the input.
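The copy-command scheme described above can be illustrated with a minimal decoder sketch. The token representation (a Python string for literal characters, a `(length, offset)` tuple for a copy command) and the function name `decode` are assumptions made for illustration only; they are not taken from any particular LZ format or from the engine discussed here.

```python
from typing import List, Tuple, Union

# Hypothetical token type: a literal string, or a (length, offset) copy command.
Token = Union[str, Tuple[int, int]]


def decode(tokens: List[Token]) -> str:
    """Expand literals and Copy(length, offset) commands into the original text."""
    out: List[str] = []
    for tok in tokens:
        if isinstance(tok, str):
            # Literal characters are emitted unchanged.
            out.extend(tok)
            continue
        length, offset = tok
        if offset <= 0 or offset > len(out):
            raise ValueError("copy offset points outside the decoded output")
        start = len(out) - offset
        # Copy one character at a time so that a copy may overlap the
        # characters it is itself producing (length greater than offset).
        for i in range(length):
            out.append(out[start + i])
    return "".join(out)


# The example from the text: "bcdez" followed by Copy(4,5).
print(decode(["bcdez", (4, 5)]))  # prints bcdezbcde
```

Because the sketch copies character by character, a command whose length exceeds its offset simply repeats recent output; for instance, `decode(["a", (3, 1)])` yields `aaaa`.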
Ideally, a compression engine finds the set of copy instructions that makes the resulting output as short as possible. In practice, however, limits on time and space, particularly in hardware implementations, make it difficult to achieve good compression quickly.
It would be desirable to implement run pre-processing to optimize compression engine throughput.