Data compression is the process of encoding information using fewer bits than the unencoded information. One of the best performing compression schemes, in terms of archive size and decompression speed, is a dictionary-based sequential coder that uses a compression algorithm such as Lempel-Ziv 77 (LZ77), Lempel-Ziv 78 (LZ78), Lempel-Ziv-Markov chain algorithm (LZMA), or a variant of these algorithms.
A dictionary-based sequential coder (also known as a “dictionary coder” or a “substitution coder”) implements a class of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure (also known as the “dictionary”). The dictionary is maintained by the encoder. When the encoder finds a match, it replaces the text with a reference to the string's position in the dictionary to gain compression.
Dictionary-based sequential coders, particularly LZMA, typically have a very slow compression speed. Conventionally, sequential coders (including dictionary-based sequential coders) process an input byte-by-byte (that is, sequentially). Sequential processing inherent in the sequential coders prevents parallelization of data compression. Therefore, dictionary-based sequential coders are typically slower than some other coders (e.g., block sorting coders), which are easier to parallelize.