Due to limited computing resources, compression programs routinely restrict the amount of data compressed together to smaller segments called windows. This process is called windowing. In delta compression, a target file is compressed given some related source file. For large files, windowing is done by first dividing the target file into target windows and then compressing each target window against some source window. The source window is often derived from the source file, but it may also be derived from some part of the target file that precedes the current target window.
Typically, delta compressors select window sizes so that the data structures they require can be built and manipulated entirely in the main memory of a computer. Moreover, delta compressors typically use fixed-size windows, and they often process source and target windows at matching file offsets. Matching file offsets work well when the source and target files differ only slightly, but when changes are more extensive, for example when inserted or deleted data shifts related content to different offsets, the compression rate can decline greatly. Although a brute force approach could be used to find matching windows by aligning a target window with every location in the source file, this technique tends to be very slow and inefficient.
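The fixed-size, matching-offset scheme described above may be sketched in Python as follows. The helper names fixed_offset_windows and shared_bytes are hypothetical, and shared_bytes is only a crude proxy for how much a delta compressor could reuse: it counts target bytes whose value appears anywhere in the paired source window. The demonstration inserts one new 8-byte block at the front of the target, so every later block falls into a different window than its source counterpart, and the proxy drops to zero for every window pair even though nearly all of the target's content exists in the source.

```python
def fixed_offset_windows(source: bytes, target: bytes, window_size: int):
    """Split target into fixed-size windows, each paired with the source
    window at the same file offset (hypothetical helper)."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    pairs = []
    for offset in range(0, len(target), window_size):
        target_window = target[offset:offset + window_size]
        # Matching file offsets: the source window starts where the target window does.
        source_window = source[offset:offset + window_size]
        pairs.append((offset, source_window, target_window))
    return pairs


def shared_bytes(source_window: bytes, target_window: bytes) -> int:
    """Crude similarity proxy: number of target bytes whose value occurs
    anywhere in the source window (position within the window is ignored)."""
    alphabet = set(source_window)
    return sum(b in alphabet for b in target_window)


if __name__ == "__main__":
    blocks = [bytes([c]) * 8 for c in b"ABCD"]  # four distinct 8-byte blocks
    source = b"".join(blocks)
    target = b"Z" * 8 + source                  # new block inserted at the front
    unchanged = fixed_offset_windows(source, source, 8)
    shifted = fixed_offset_windows(source, target, 8)
    print([shared_bytes(s, t) for _, s, t in unchanged])  # [8, 8, 8, 8]
    print([shared_bytes(s, t) for _, s, t in shifted])    # [0, 0, 0, 0, 0]
```

Because the proxy ignores byte positions inside a window, the zero scores are not an artifact of a one-byte misalignment; the related data has moved into a different window entirely.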
In U.S. patent application Ser. No. 10/894,421, entitled “Method and Apparatus for windowing in Entropy Encoding”, Vo et al. disclose a windowing technique that uses n-grams to find similar, matching windows regardless of file offsets. Although the techniques disclosed in that application may be used on large classes of data, they employ fixed-size windows that are sensitive to the lengths of matching data across the source and target files. For example, if the window size is set too large, the window-matching algorithm may not find any matching windows, because no window pair will be sufficiently similar. Conversely, if the window size is set too small, the compression rate will be worse than what could otherwise be achieved.
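As an illustration of how n-grams can locate a matching source window without relying on file offsets, the following Python sketch lets each n-gram shared by the target window and the source vote for the source offset at which the window would align; the offset with the most votes is chosen. This is one generic approach assumed for illustration, not the method disclosed by Vo et al.; the names build_ngram_index and match_source_window and the default n = 4 are hypothetical. Note that the chosen source window has the same fixed size as the target window, which is the property discussed above.

```python
from collections import Counter, defaultdict


def build_ngram_index(source: bytes, n: int = 4) -> dict:
    """Map every length-n substring of source to the offsets where it occurs."""
    index = defaultdict(list)
    for i in range(len(source) - n + 1):
        index[source[i:i + n]].append(i)
    return index


def match_source_window(index: dict, source_len: int,
                        target_window: bytes, n: int = 4):
    """Return the start offset of the fixed-size source window (same length as
    target_window) that shares the most aligned n-grams with it, or None if no
    n-gram is shared. The offset need not equal the target window's own offset."""
    size = len(target_window)
    last_start = max(source_len - size, 0)
    votes = Counter()
    for j in range(len(target_window) - n + 1):
        for p in index.get(target_window[j:j + n], ()):
            # An n-gram at target position j and source position p implies the
            # window should start at p - j; clamp to a valid window start.
            votes[min(max(p - j, 0), last_start)] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    source = bytes(range(200))  # every 4-gram occurs exactly once
    index = build_ngram_index(source)
    print(match_source_window(index, len(source), source[120:152]))          # 120
    print(match_source_window(index, len(source), b"XY" + source[120:150]))  # 118
```

Building the index once per source and reusing it for every target window avoids the repeated full scans of the brute force approach, at the cost of memory proportional to the source size.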
As a result, there is a need for a computationally efficient data compression technique that works irrespective of window size or file offsets.