1. Field
The present invention relates generally to data compression and, more specifically, to the efficient identification of compressible segments.
2. Related Art
Compression techniques are useful in a variety of important application areas. For example, compression may be used to improve speed and/or functionality, or to reduce hardware and/or networking requirements, in application areas such as, but not limited to, storage, backup, and network traffic reduction. However, applying a compression technique may incur costs: it may introduce a time delay, consume computing cycles, and/or require additional hardware.
Many compression techniques rely on a substring search mechanism to identify matches between an input dataset, or a segment of an input dataset, and an entry in a dictionary or a portion of the data history. Some substring search mechanisms identify such matches using a selective fingerprinting technique, which selects a characteristic set of fingerprints to represent the input dataset. However, in some selective fingerprinting implementations, unaligned matches may be missed if the fingerprints are selected strictly according to their spatial distribution within the input dataset (for example, at fixed intervals), because repeated content that is offset from those fixed positions may yield no fingerprints in common with its earlier occurrence. Conversely, selecting the characteristic set of fingerprints based only on fingerprint values may yield fingerprints that are probabilistically distributed but not necessarily evenly distributed in space, so that long stretches of the input dataset may contain no selected fingerprint; this can lead to unpredictable and/or large missed matches. What is needed is a compression technique capable of efficiently locating matches between an input dataset and a dictionary or history.
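To illustrate the trade-off between the two selection strategies described above, the following Python sketch fingerprints a byte string and a shifted copy of part of it, then compares position-based and value-based selection. The Rabin-Karp style rolling hash, the window length, stride, and modulus, and the helper names `fingerprints`, `select_by_position`, `select_by_value`, `chosen`, and `largest_gap` are hypothetical choices made for illustration only; this is a minimal sketch under those assumptions, not the technique of the present invention or of any particular implementation.

```python
import random
from typing import List, Set

# Hypothetical parameters, chosen for illustration only.
WINDOW = 8             # bytes covered by each fingerprint
STRIDE = 16            # spacing used by position-based selection
MODULUS = 16           # value-based selection keeps roughly 1 in MODULUS
BASE = 257
PRIME = (1 << 61) - 1  # Mersenne prime modulus for the rolling hash


def fingerprints(data: bytes, window: int = WINDOW) -> List[int]:
    """Rolling polynomial hash of every window-length substring; index = start offset."""
    if len(data) < window:
        return []
    high = pow(BASE, window - 1, PRIME)  # weight of the byte leaving the window
    h = 0
    for b in data[:window]:
        h = (h * BASE + b) % PRIME
    out = [h]
    for i in range(window, len(data)):
        # Drop the oldest byte, shift, and append the newest byte.
        h = ((h - data[i - window] * high) * BASE + data[i]) % PRIME
        out.append(h)
    return out


def select_by_position(fps: List[int], stride: int = STRIDE) -> List[int]:
    """Keep offsets on a fixed grid: evenly spaced, but tied to alignment."""
    return [pos for pos in range(len(fps)) if pos % stride == 0]


def select_by_value(fps: List[int], modulus: int = MODULUS) -> List[int]:
    """Keep offsets whose fingerprint is 0 mod `modulus`: alignment-free, unevenly spaced."""
    return [pos for pos, fp in enumerate(fps) if fp % modulus == 0]


def chosen(fps: List[int], positions: List[int]) -> Set[int]:
    """Fingerprint values at the selected offsets."""
    return {fps[p] for p in positions}


def largest_gap(positions: List[int]) -> int:
    """Largest distance between consecutive selected offsets (input must be ascending)."""
    return max((b - a for a, b in zip(positions, positions[1:])), default=0)


rng = random.Random(42)
history = bytes(rng.randrange(256) for _ in range(4096))
incoming = history[5:2053]  # repeated content, shifted 5 bytes off the stride grid

hist_fps, in_fps = fingerprints(history), fingerprints(incoming)
pos_hist, pos_in = select_by_position(hist_fps), select_by_position(in_fps)
val_hist, val_in = select_by_value(hist_fps), select_by_value(in_fps)

print("shared, position-based:", len(chosen(hist_fps, pos_hist) & chosen(in_fps, pos_in)))
print("shared, value-based:   ", len(chosen(hist_fps, val_hist) & chosen(in_fps, val_in)))
print("largest gap, position-based:", largest_gap(pos_in))
print("largest gap, value-based:   ", largest_gap(val_in))
```

Because the copied region starts five bytes off the stride grid, position-based selection will typically find no shared fingerprints even though the content repeats, whereas every value-selected fingerprint in the shifted copy is necessarily also selected in the history. The value-based selections, however, can be separated by gaps considerably longer than the stride, which reflects the spatial unevenness noted above.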