FIG. 1 illustrates a typical LZ data compression method. The LZ compression method of FIG. 1 processes an input data stream 10 to generate a compressed output data stream 20 by comparing an uncompressed portion 13 of input data stream 10 with the data in a history buffer 11 of already processed input data. If a data string 12 that matches current data string 14 is located in history buffer 11, data string 14 is encoded in compressed data stream 20 as a codeword (pₒ, lₒ) 24, consisting of an offset pₒ 15 and a data length lₒ 16. The shorter codeword (pₒ, lₒ) 24 thus replaces the longer data string 14 in compressed output data stream 20.
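The match-and-replace step of FIG. 1 can be sketched as a greedy LZ77-style coder. This is a minimal illustration under stated assumptions, not the method of FIG. 1 itself: the window size, the minimum match length `min_len`, the literal token for unmatched bytes, and the convention that offset 0 denotes the byte immediately before the current position are all hypothetical choices made here.

```python
def lz_compress(data: bytes, window: int = 4096, min_len: int = 3) -> list[tuple]:
    """Greedy LZ77-style coder: emits ("literal", byte) or ("match", offset, length)."""
    tokens = []
    i = 0
    while i < len(data):
        best_off, best_len = 0, 0
        # The history buffer is the last `window` bytes already processed.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (overlapping copy), as in classic LZ77.
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                # Assumed convention: offset 0 is the byte just before position i.
                best_off, best_len = i - j - 1, length
        if best_len >= min_len:
            tokens.append(("match", best_off, best_len))
            i += best_len
        else:
            tokens.append(("literal", data[i]))
            i += 1
    return tokens


def lz_decompress(tokens: list[tuple]) -> bytes:
    """Rebuild the input by replaying literals and copying matches from the output."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out.append(token[1])
        else:
            _, offset, length = token
            start = len(out) - offset - 1
            for k in range(length):
                out.append(out[start + k])
    return bytes(out)


print(lz_compress(b"abcabcabcabc"))
# [('literal', 97), ('literal', 98), ('literal', 99), ('match', 2, 9)]
```

Because the search never looks more than `window` bytes back, every offset produced lies in 0 .. window - 1, so the offset's range is set by the size of the history buffer.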
Offset pₒ 15 is typically a random value that falls within a known range of values determined by the length of history buffer 11. Although the actual value of offset 15 is often small, the upper maximum of that range increases as the length of history buffer 11 increases. Encoding a variable such as offset 15 with a single codebook coding method is well known. A typical single fixed length codebook coding method represents offset 15 with a fixed length codeword. An n-bit fixed length codebook codes 2^n source values, namely the decimal values 0 through 2^n - 1, using a fixed n bits per codeword.
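A minimal sketch of an n-bit fixed length codebook, assuming each codeword is simply the zero-padded binary numeral of the value (the function names `fixed_encode` and `fixed_decode` are hypothetical):

```python
def fixed_encode(value: int, n: int) -> str:
    """Encode value as an n-bit codeword; the coding range is 0 .. 2**n - 1."""
    if n < 1:
        raise ValueError("codeword width must be at least 1 bit")
    if not 0 <= value <= 2**n - 1:
        raise ValueError(f"{value} is outside the {n}-bit range 0..{2**n - 1}")
    return format(value, f"0{n}b")  # zero-padded binary, exactly n characters


def fixed_decode(codeword: str) -> int:
    """Every codeword has the same length, so decoding is a plain binary read."""
    return int(codeword, 2)


print(fixed_encode(5, 3))  # 101
print(fixed_encode(5, 4))  # 0101
```

The same value costs one extra bit in the larger codebook, which is the trade-off the next paragraph describes.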
FIG. 2 illustrates two fixed length codebooks, a 3-bit fixed length codebook 30 and a 4-bit fixed length codebook 40. A 3-bit fixed length codebook encodes up to 2^3, or 8, source values, covering the values 0 through 7 with 3 bits per codeword; its coding range maximum is therefore the decimal value 7. A 4-bit fixed length codebook is larger: it codes 2^4, or 16, source values, giving a coding range of 0 through 15. It would be desirable to select a small single fixed length codebook to encode offset 15, since offset 15 is often small. However, the upper maximum of offset 15 keeps increasing as history buffer 11 grows during data compression, so the single fixed length codebook selected at the outset must be large enough to encode the maximum possible value of offset 15. Such a large codebook is inefficient when the typical offset value is small: every codeword carries more bits than the value requires, which increases the amount of data to be stored and therefore the memory required to store it.
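The cost of sizing a single fixed length codebook for the worst case can be made concrete. The sketch below builds the two codebooks of FIG. 2 and totals the bits spent on a short run of small offsets; the 4096-byte history buffer (maximum offset 4095) and the offset values themselves are hypothetical.

```python
def fixed_codebook(n: int) -> dict[int, str]:
    """All 2**n codewords of an n-bit fixed length codebook."""
    return {v: format(v, f"0{n}b") for v in range(2**n)}


def width_for_max(max_value: int) -> int:
    """Smallest n whose n-bit fixed codebook covers 0 .. max_value."""
    return max(1, max_value.bit_length())


def fixed_cost(offsets: list[int], max_value: int) -> int:
    """Total bits to code offsets with one codebook sized for max_value."""
    if any(not 0 <= o <= max_value for o in offsets):
        raise ValueError("offset outside the codebook's range")
    return width_for_max(max_value) * len(offsets)


codebook_30 = fixed_codebook(3)   # 8 codewords, 000 .. 111 (values 0..7)
codebook_40 = fixed_codebook(4)   # 16 codewords, 0000 .. 1111 (values 0..15)

offsets = [0, 2, 1, 3, 0]          # hypothetical, mostly small offsets
print(fixed_cost(offsets, 4095))   # 60: 12 bits each, sized for a 4096-byte buffer
print(fixed_cost(offsets, 3))      # 10: 2 bits each would have sufficed
```

Under these assumptions the worst-case codebook spends six times as many bits as the values actually need.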
FIG. 3 illustrates an example of a 3-bit variable length single codebook coding method. As shown in FIG. 3, a 3-bit variable length single codebook has a coding range maximum of 4, encoding the values 0 through 4. Within this one codebook, the codeword length increases from 2 bits to 3 bits. The variable length codebook thus codes offset values 0 through 2 with only 2 bits, rather than the 3 bits per codeword of a 3-bit fixed length codebook, while still allowing the values 3 and 4 to be coded in the same codebook when needed. A variable length codebook therefore codes a variable with frequently recurring small values efficiently, while extending the upper maximum of the codeword representation so that a larger value within that range can still be encoded when it occurs. However, like the single fixed length codebook approach, the single variable length codebook approach is inflexible: the codebook selected at the outset of data compression must be large enough to accommodate the maximum possible offset value, which corresponds to the maximum length that history buffer 11 grows to over time.
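The text does not give the bit patterns of FIG. 3. One standard code with exactly the described shape is truncated binary encoding (sometimes called a phased-in code): for a range of `size` values, with k = floor(log2(size)), the first 2^(k+1) - size values receive k-bit codewords and the remaining values receive (k+1)-bit codewords. The sketch below assumes that construction; the function names are hypothetical.

```python
def _bits(value: int, width: int) -> str:
    """Zero-padded binary of the given width; a width of 0 yields an empty codeword."""
    return format(value, f"0{width}b") if width > 0 else ""


def phase_in_encode(value: int, size: int) -> str:
    """Truncated binary codeword for value in 0 .. size - 1."""
    if size < 1 or not 0 <= value < size:
        raise ValueError("value must lie in 0 .. size - 1")
    k = size.bit_length() - 1       # floor(log2(size))
    u = (1 << (k + 1)) - size       # how many values get the short k-bit codeword
    if value < u:
        return _bits(value, k)
    return _bits(value + u, k + 1)  # remaining values get k + 1 bits


def phase_in_decode(bits: str, size: int, pos: int = 0) -> tuple[int, int]:
    """Decode one codeword starting at bits[pos]; return (value, next position)."""
    k = size.bit_length() - 1
    u = (1 << (k + 1)) - size
    x = int(bits[pos:pos + k], 2) if k else 0
    pos += k
    if x < u:                       # short codeword: done after k bits
        return x, pos
    x = (x << 1) | int(bits[pos])   # long codeword: read one more bit
    return x - u, pos + 1


print([phase_in_encode(v, 5) for v in range(5)])  # ['00', '01', '10', '110', '111']
```

For `size = 5` the values 0 to 2 take 2 bits and the values 3 and 4 take 3 bits, matching the 2-bit/3-bit split described for FIG. 3. When `size` is a power of two, every codeword has the same length and the code reduces to a fixed length codebook.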
In the multiple codebooks phase-in coding method, a set of codebooks of different lengths is provided. When the maximum possible value of offset 15 exceeds the coding range maximum of the codebook currently in use, the next larger codebook in the set is automatically selected, or "phased in," to encode subsequent offset values. This approach ensures that if a large offset value occurs, the newly selected codebook can represent it.
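The codebook switching can be sketched as follows. This assumes each codebook in the set is a truncated binary code over 0 .. M, takes the maxima M from a hypothetical list `CODEBOOK_MAXIMA`, and assumes the maximum possible offset equals the current history buffer length minus one. Because a decoder that tracks the same history length can make the same codebook choice, the sketch spends no bits signalling the switch.

```python
# Hypothetical set of codebooks: codebook i codes offsets 0 .. CODEBOOK_MAXIMA[i].
CODEBOOK_MAXIMA = (4, 9, 19, 39, 79, 159)


def phase_in_encode(value: int, size: int) -> str:
    """Truncated binary codeword for value in 0 .. size - 1 (assumed codebook form)."""
    k = size.bit_length() - 1
    u = (1 << (k + 1)) - size
    if value < u:
        return format(value, f"0{k}b") if k else ""
    return format(value + u, f"0{k + 1}b")


def select_codebook(max_possible: int) -> int:
    """Maximum of the smallest codebook whose range still covers max_possible."""
    for m in CODEBOOK_MAXIMA:
        if max_possible <= m:
            return m
    raise ValueError("history buffer has outgrown the largest codebook")


def encode_offset(offset: int, history_len: int) -> str:
    """Code one offset with the codebook phased in for the current history length."""
    max_possible = history_len - 1  # assumed: the largest offset the buffer allows
    if not 0 <= offset <= max_possible:
        raise ValueError("offset outside the current history buffer")
    return phase_in_encode(offset, select_codebook(max_possible) + 1)


for n in (5, 10, 30):
    print(n, encode_offset(1, n))  # 5 01 / 10 001 / 30 00001
```

With these assumptions the small offset 1 is coded as `01` while the history holds 5 bytes, `001` at 10 bytes, and `00001` at 30 bytes: the codeword grows with the buffer even though the value does not.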
The traditional multiple codebooks phase-in coding method is still not very efficient, however, because offset 15 is often small even though its maximum possible value has grown with the length of the history buffer. Automatically phasing in the next larger codebook in response to the current maximum possible value of offset 15 therefore lengthens the codeword unnecessarily whenever the actual value of offset 15 could still be represented by the smaller codebook used previously. Because a computer continuously transmits and stores large amounts of data during operation, even a one-bit reduction per codeword, multiplied by the amount of data to be coded, yields a substantial memory saving. It is therefore desirable to provide an efficient data coding system that minimizes the number of bits required to represent a random variable, such as offset 15, whose upper maximum increases over time.
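The scaling argument is simple arithmetic. For a hypothetical stream of N_cw = 10^6 codewords, saving Δb = 1 bit per codeword gives:

```latex
\Delta B = N_{\mathrm{cw}} \cdot \Delta b
\quad\text{with}\quad N_{\mathrm{cw}} = 10^{6},\ \Delta b = 1\ \text{bit}
\;\Longrightarrow\; \Delta B = 10^{6}\ \text{bits} = \frac{10^{6}}{8}\ \text{bytes} = 125{,}000\ \text{bytes}.
```

The saving grows linearly with the number of coded offsets, so it matters most for long-running compression of large data volumes.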