A Lempel-Ziv compression technique searches for recurring data patterns in a stream of bytes. See, for example, Jacob Ziv and Abraham Lempel, “A Universal Algorithm for Sequential Data Compression,” IEEE Trans. on Information Theory, 23(3), 337-343 (May 1977), incorporated by reference herein. Performing the matching at all bytes of the stream, however, is time-consuming. A conventional approach to improve the compression throughput uses chains of hash values. Hash chains help the compression technique process sequences with the same hash value to find potential matches.
Generally, a normal hash chain is created by setting a pointer in each given hash value location to the nearest previous location having the same hash value. Long byte runs are known to create long hash chains having many pointers to be considered by the compression technique. U.S. patent application Ser. No. 13/659,036, filed Oct. 24, 2012, entitled “Method to Shorten Hash Chains in Lempel-Ziv Compression of Data with Repetitive Symbols,” incorporated by reference herein, discloses a hash chain construction technique that shortens hash chains generated in the presence of data value runs in a stream of data values (e.g., bytes or symbols), referred to as byte runs. A byte run is generally a sequence of two or more locations (or nodes) that have the same byte. The shortened hash chains generally allow a Lempel-Ziv (LZ) compression search to pass quickly through data value runs of any length by visiting only a few nodes. The shortened hash chains may also enable the LZ compression search to compare the runs by lengths, instead of through byte-by-byte comparisons. No extra storage costs may be incurred by the shortened hash chains when compared with classical hash chains.
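By way of illustration only, the classical hash chain construction described above can be sketched as follows. This is a minimal sketch under stated assumptions and not the construction of the referenced application: the names `build_hash_chains` and `chain_length`, the three-byte minimum match length, and the use of the byte substring itself in place of a hash value (so that no hash collisions occur) are all hypothetical choices made for clarity.

```python
def build_hash_chains(data: bytes, min_match: int = 3):
    """Build classical hash chains over `data`.

    Returns (head, prev), where head[key] is the most recent position
    whose next `min_match` bytes equal `key`, and prev[i] is the nearest
    earlier position with the same key as position i (or -1 if none).
    """
    head = {}
    prev = [-1] * len(data)
    for i in range(len(data) - min_match + 1):
        # Assumption: the substring itself stands in for a hash value.
        key = data[i:i + min_match]
        prev[i] = head.get(key, -1)  # pointer to nearest previous location
        head[key] = i                # this location is now the newest one
    return head, prev


def chain_length(prev, start: int) -> int:
    """Count the nodes a search visits when walking the chain from `start`."""
    count = 0
    pos = start
    while pos != -1:
        count += 1
        pos = prev[pos]
    return count


# A run of 20 identical bytes places 18 positions on a single chain.
head, prev = build_hash_chains(b"a" * 20)
print(chain_length(prev, head[b"aaa"]))
```

In this sketch, every position inside the byte run links to the position immediately before it, so an unshortened search starting at the end of the run would visit every node in the run.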
While the disclosed byte run techniques effectively shorten hash chains, hash chains can also be shortened by addressing repetitive patterns or multi-byte runs, such as “abababab . . . ” or “abcabc . . . ,” which lead to long chains that cost significant search time. A need therefore exists for techniques for shortening hash chains in Lempel-Ziv compression of data with repetitive patterns.
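The effect of a repetitive pattern on a classical hash chain can be illustrated with the following minimal sketch. The function name `chain_positions`, the three-byte minimum match length, and the use of the byte substring itself as a collision-free stand-in for a hash value are assumptions made for illustration only.

```python
def chain_positions(data: bytes, min_match: int = 3):
    """Return the positions on the classical hash chain that starts at the
    last position in `data` where a full `min_match`-byte key is available.

    Assumption: the substring itself is used as the key, standing in for
    a hash value with no collisions.
    """
    head = {}
    prev = [-1] * len(data)
    last = len(data) - min_match
    for i in range(last + 1):
        key = data[i:i + min_match]
        prev[i] = head.get(key, -1)  # nearest earlier position, same key
        head[key] = i
    # Walk the chain from the newest position back to the oldest.
    positions = []
    pos = last
    while pos != -1:
        positions.append(pos)
        pos = prev[pos]
    return positions


print(chain_positions(b"abcabcabc"))        # positions spaced 3 apart
print(len(chain_positions(b"ab" * 1000)))   # chain grows with the input
```

In this sketch, the chain positions are spaced by the period of the pattern, so the number of nodes on the chain grows in proportion to the length of the repeated region, even though no single byte value repeats consecutively.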