An efficient data compression system is increasingly important as electronics manufacturers compete on compactness and improved performance in their products. In particular, growing market demand for a variety of portable electronic products has substantially reduced the system real estate available for electronic data storage and data manipulation in the designs of these products. Thus, with less electronic memory available, an efficient data compression method is even more critical in the design of portable electronics if these devices are to achieve operation comparable to that of a larger electronic system.
A variety of data compression techniques are known. The performance of each technique is measured by its compression ratio, which is the ratio of the length of an uncompressed input data stream to the length of the corresponding compressed data stream. The compression ratio of each technique, however, also varies with the data type of the input data stream. Some data compression techniques achieve a higher compression ratio for ASCII input data than for binary data, while others achieve a lower compression ratio for ASCII data and a higher ratio for binary data. Thus, for each data type, one or more data compression techniques can be identified that provide an optimal compression ratio for that data type, while techniques that produce a lower compression ratio for that data type should be avoided.
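As an illustration of how the compression ratio depends on the data type, the following minimal sketch computes the ratio for a repetitive ASCII text and for pseudo-random 8-bit data of the same length. Python's standard `zlib` module is used only as a convenient stand-in compressor; the sample inputs and the helper name `compression_ratio` are assumptions made for this example and are not part of any technique described here.

```python
import random
import zlib


def compression_ratio(original: bytes, compressed: bytes) -> float:
    """Length of the uncompressed data divided by length of the compressed data."""
    return len(original) / len(compressed)


# Repetitive 7-bit ASCII text (assumed sample input).
ascii_data = b"the quick brown fox jumps over the lazy dog. " * 100

# Pseudo-random 8-bit data of the same length; the seed is fixed for repeatability.
rng = random.Random(0)
binary_data = bytes(rng.getrandbits(8) for _ in range(len(ascii_data)))

ascii_ratio = compression_ratio(ascii_data, zlib.compress(ascii_data))
binary_ratio = compression_ratio(binary_data, zlib.compress(binary_data))

print(f"ASCII text ratio:  {ascii_ratio:.2f}")
print(f"binary data ratio: {binary_ratio:.2f}")
```

With these inputs the same compressor yields a much higher ratio for the text than for the random bytes, which is the data-type dependence described above. A different compressor or different inputs could reverse the ordering.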
A variety of data types are known and used by the industry to encode characters, punctuation marks, and other symbols found in texts and communication protocols. Known data types include the ASCII standard format, the binary standard format, and the Unicode standard format. Although the ASCII standard comprises a set of 8-bit binary numbers, only 7 of these bits are typically used to represent an actual data symbol, while the binary standard format encodes one data symbol in 8 bits. Unicode represents each data symbol with two bytes, that is, a 16-bit binary number. The first byte, or 8-bit prefix, indicates a characteristic of the 16-bit data symbol. For example, the first byte might indicate that the 16-bit data symbol is a Kanji character.
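The two-byte layout can be seen by splitting a 16-bit symbol into its first and second bytes. The sketch below assumes a big-endian 16-bit encoding (Python's `utf-16-be` codec) as a stand-in for the two-byte Unicode format described above; the helper name `split_16bit_symbol` is hypothetical.

```python
from typing import Tuple


def split_16bit_symbol(ch: str) -> Tuple[int, int]:
    """Return (first_byte, second_byte) of a character's 16-bit big-endian form."""
    encoded = ch.encode("utf-16-be")
    if len(encoded) != 2:
        raise ValueError("character does not fit in a single 16-bit symbol")
    return encoded[0], encoded[1]


latin = split_16bit_symbol("A")        # U+0041, a Latin letter
kanji = split_16bit_symbol("\u6f22")   # U+6F22, a Kanji character

print(f"Latin 'A': first byte 0x{latin[0]:02X}, second byte 0x{latin[1]:02X}")
print(f"Kanji:     first byte 0x{kanji[0]:02X}, second byte 0x{kanji[1]:02X}")
```

Here the first byte is 0x00 for the Latin letter and 0x6F for the Kanji character, so the prefix byte alone already distinguishes the two kinds of symbol.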
However, despite the variety of data types that are commonly used in the industry, prior art data manipulation processes do not include automatic detection of the data type of an input data stream. Most prior art data manipulation processes rely on the user, or on another source external to the data manipulation process itself, to supply such data type information. For example, in a file transfer program ("FTP"), the FTP process queries the user to supply the data type information of the input data stream. Other prior art data manipulation processes require a user to set a data type mode bit, or simply assume a particular data type for the input data stream. Assuming a particular data type is an inefficient method of manipulating data. If an electronic data manipulation process always assumes the data type to be 8 bits when in reality the input data type comprises 7 bits, the assumption wastes a substantial amount of system memory by reserving an additional bit for each data symbol in the input data stream. Thus, it would be desirable to provide a method to automatically detect the data type of an input data stream.
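To make the cost of a wrong data type assumption concrete, the following sketch applies one simple, hypothetical heuristic: a stream is treated as 7-bit data if no byte has its high bit set, and as 8-bit data otherwise. It is only an illustration of what automatic detection could look like, not a detection method taken from the prior art discussed here; the function names are assumptions.

```python
def detect_data_type(data: bytes) -> str:
    """Hypothetical heuristic: '7-bit' if no byte uses its high bit, else '8-bit'."""
    return "7-bit" if all(b < 0x80 for b in data) else "8-bit"


def wasted_bits(data: bytes) -> int:
    """Bits wasted when each 7-bit symbol is stored in a full 8 bits."""
    return len(data) if detect_data_type(data) == "7-bit" else 0


text = b"hello"
mixed = bytes([0x48, 0xFF])

print(detect_data_type(text), wasted_bits(text))
print(detect_data_type(mixed), wasted_bits(mixed))
```

For 7-bit input, one bit per symbol is reserved but never used, so the waste grows linearly with the length of the input data stream.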
Additionally, typical prior art data compression techniques are classified either as a statistical or a dictionary type of data compression method. A statistical type of data compression is based on single symbol coding. Single symbol coding is accomplished by assigning to each possible data symbol in the input data stream a probability for the appearance of that symbol. Examples of this type of data compression method are the Huffman code method and the widely published variations of this code. With the Huffman coding method, a symbol having a greater probability of appearance is encoded with a short binary string, while a symbol having a lower probability of appearance in the input data stream is encoded with a longer binary string.
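The relationship between symbol probability and code length can be sketched with a small Huffman code builder. This is a minimal textbook-style construction, assuming symbol probabilities are estimated from counts in the input itself; the function name `huffman_codes` and the sample string are assumptions for illustration.

```python
import heapq
from collections import Counter
from typing import Dict


def huffman_codes(data: str) -> Dict[str, str]:
    """Build a prefix code in which more frequent symbols get shorter bit strings."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees, prefixing their codes with 0 and 1.
        w0, _, left = heapq.heappop(heap)
        w1, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w0 + w1, next_id, merged))
        next_id += 1
    return heap[0][2]


sample = "aaaaaaaabbbc"
codes = huffman_codes(sample)
encoded_bits = sum(len(codes[ch]) for ch in sample)
print(codes, encoded_bits)
```

In this sample the most frequent symbol `a` receives a one-bit code and the rarer symbols `b` and `c` receive two-bit codes, so the 12 symbols encode in 16 bits instead of the 96 bits needed at 8 bits per symbol.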
A dictionary type data compression method associates groups of consecutive characters, as in phrases, to a dictionary of indices. The dictionary type data compression methods are also commonly referred to as a "codebook" or a "macro coding" approach. The various coding schemes in the Ziv-Lempel ("LZ") family of data compression techniques are all examples of the dictionary type of data coding method. In the LZ family of data compression methods, a typical LZ-type compression method processes an input data stream by checking first if each current data string encountered in the input data stream matches a data string already stored in the output data buffer. If no match of the current data string to previously stored data strings is detected, the current data string is stored into the output buffer. If, however, a match is detected between the current data string and a data string already stored in a memory location of the output data buffer, a pointer indicating that memory location is stored into the output buffer instead of the data string.
Shown in FIGS. 1 and 2 are two examples of LZ data compression methods. The LZ-1 compression method shown in FIG. 1 processes an uncompressed input data stream 10 to generate a compressed data output stream 20 by comparing an uncompressed portion 13 of input data stream 10 to data in a history buffer 11 of already processed input data. If a matching data string 12 is located in history buffer 11 for current data string 14, data string 14 is encoded in compressed data stream 20 as a pointer (p_o, l_o) 24, corresponding to an offset p_o 15 and a data length l_o 16. The shorter length data of pointer (p_o, l_o) 24 thus replaces longer data string 14 in output compressed data stream 20.
History buffer 11 is considered empty before data compression of input data stream 10 begins. As the compression process progresses, history buffer 11 expands within a given system memory reserve according to how much of input data stream 10 has been processed, until history buffer 11 reaches the maximum system memory allocation available for data compression. Thus, where no matching string is found, as is the case for data string 12 during the initial data compression stage of input data stream 10, unmatched string 12 is stored into output data stream 20 in the form of a literal length header (LL_o) 22 followed by data string 12 duplicated from original data stream 10. Literal length header 22 encodes the number of characters, n, in unmatched string 12 that follows literal length header 22. This encoded information is recovered during data decompression to notify the decompression process of the number of data characters following literal length header 22 that correspond to original input data and need not be expanded.
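The LZ-1 behavior described above, literal runs preceded by a literal length header and matched strings replaced by (offset, length) pointers into the history buffer, can be sketched as follows. This is a simplified illustration under stated assumptions: tokens are kept as Python tuples rather than packed bits, the offset is counted backward from the current position, and the window size and minimum match length (`window`, `MIN_MATCH`) are arbitrary values chosen for the example.

```python
from typing import List, Tuple, Union

Literal = Tuple[str, int, bytes]  # ("L", literal_length, raw bytes)
Pointer = Tuple[str, int, int]    # ("P", offset back into history, match length)
Token = Union[Literal, Pointer]

MIN_MATCH = 3  # assumed threshold below which a pointer is not worthwhile


def lz1_compress(data: bytes, window: int = 4096) -> List[Token]:
    tokens: List[Token] = []
    pending = bytearray()  # unmatched bytes waiting for a literal length header
    pos = 0
    while pos < len(data):
        best_len, best_off = 0, 0
        # Character-by-character search through the history buffer.
        for cand in range(max(0, pos - window), pos):
            length = 0
            while (pos + length < len(data)
                   and cand + length < pos
                   and data[cand + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, pos - cand
        if best_len >= MIN_MATCH:
            if pending:
                # Flush the unmatched run: header with its length, then the bytes.
                tokens.append(("L", len(pending), bytes(pending)))
                pending.clear()
            tokens.append(("P", best_off, best_len))
            pos += best_len
        else:
            pending.append(data[pos])
            pos += 1
    if pending:
        tokens.append(("L", len(pending), bytes(pending)))
    return tokens


print(lz1_compress(b"abcabcabc"))
```

Because the history buffer starts empty, the first occurrence of `abc` is emitted as a literal run, and the two later occurrences become pointers. The exhaustive inner search also shows why this style of matching is slow on long inputs.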
The LZ-2 data compression method of FIG. 2 searches a dictionary 30 of indices for a match to current data string 14. Dictionary 30 has a limited buffer length and holds data strings from input data stream 10. If a matching data string 12 is located in dictionary 30 for current data string 14, current data string 14 is encoded in the output data stream with index 32 corresponding to the location of data string 12 in dictionary 30. Because the LZ-1 method of FIG. 1 searches for a matching data string character by character through the history buffer, the time required to compress input data stream 10 is substantially greater with the LZ-1 method of FIG. 1 than with the LZ-2 method of FIG. 2. However, the LZ-1 method provides a greater data compression ratio than the LZ-2 method.
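A dictionary-index scheme of this kind can be sketched in the style of the classic LZ78 algorithm, where each output item is a dictionary index plus the next unmatched character. This is an assumed simplification: the dictionary here grows without the length limit mentioned above, index 0 is reserved for the empty prefix, and the function name `lz2_compress` is hypothetical.

```python
from typing import Dict, List, Tuple


def lz2_compress(data: str) -> List[Tuple[int, str]]:
    """Emit (dictionary_index, next_char) pairs; index 0 means 'empty prefix'."""
    dictionary: Dict[str, int] = {}
    output: List[Tuple[int, str]] = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            # Keep extending the current phrase while it is still in the dictionary.
            phrase += ch
        else:
            # Emit the index of the longest known prefix plus the new character,
            # then register the extended phrase under the next free index.
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Input ended inside a known phrase: emit its index with no extra character.
        output.append((dictionary[phrase], ""))
    return output


print(lz2_compress("abababab"))
```

Each lookup is a single hash-table probe on the current phrase rather than a scan of previously processed data, which is consistent with the speed advantage attributed to the LZ-2 method.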
Data decompression is the conversion of a stream of compressed data back to its original expanded form. Decompression is typically accomplished with a lookup table if the data was compressed using a statistical coding scheme such as a Huffman type code. If the data was compressed using a dictionary type data compression method, such as the LZ-1 method (as explained above with reference to FIG. 1), original data stream 10 is reconstructed by replacing each pointer (p, l) encountered in compressed data stream 20 with the data string of length l located at offset p in the history buffer. If the data was compressed with an LZ-2 data compression scheme (as explained above with reference to FIG. 2), the dictionary generated during data compression is used to retrieve the indexed data strings.
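Both dictionary-type decompression paths can be sketched briefly. The token formats below are assumptions made for this illustration: an LZ-1 stream is a list of `("L", count, raw_bytes)` literal runs and `("P", offset, length)` pointers whose offset counts backward from the end of the output so far, and an LZ-2 stream is a list of `(index, next_char)` pairs in the LZ78 style with index 0 denoting the empty prefix.

```python
from typing import List, Tuple


def lz1_decompress(tokens: List[tuple]) -> bytes:
    """Expand literal runs and (offset, length) pointers back into the original bytes."""
    out = bytearray()  # doubles as the history buffer
    for token in tokens:
        if token[0] == "L":
            _, count, raw = token
            out += raw[:count]  # literal data is copied through unexpanded
        else:
            _, offset, length = token
            start = len(out) - offset
            for i in range(length):
                out.append(out[start + i])  # copy from already reconstructed data
    return bytes(out)


def lz2_decompress(pairs: List[Tuple[int, str]]) -> str:
    """Rebuild the dictionary on the fly and concatenate the indexed phrases."""
    phrases = [""]  # index 0 is the empty prefix
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)


lz1_tokens = [("L", 3, b"abc"), ("P", 3, 3), ("P", 6, 3)]
lz2_pairs = [(0, "a"), (0, "b"), (1, "b"), (3, "a"), (2, "")]

print(lz1_decompress(lz1_tokens))
print(lz2_decompress(lz2_pairs))
```

In both cases the decompressor needs no side information beyond the compressed stream itself: the history buffer and the dictionary are rebuilt as the output is produced.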
FIG. 3 illustrates a typical prior art data compression system. Data compression system 40 receives an input uncompressed data stream 10 and processes data stream 10 through a first data compression phase 42 using a first predefined data compression technique. Optionally, prior art data compression system 40 may also provide a second data compression phase 44 using a second data compression technique that is likewise predefined by the design of data compression system 40. Prior art data compression systems thus use the same data compression techniques, fixed by the data compression system design, regardless of the data type encountered in the input data stream. Because each data compression technique typically provides a different compression ratio for different data types, prior art compression systems are unable to maximize the data compression ratio when encountering a variety of input data types in the input data stream. There is therefore a need for an efficient and flexible data compression system that maximizes the data compression ratio according to the input data type detected. Moreover, prior art data compression systems do not maximize the usage of the CPU, for example by providing a normal rate of data compression during the CPU's idle time and increasing the rate of data compression when the CPU is preparing to process another task. It is therefore also desirable to have a data compression system that provides controlling means to increase or decrease the system's rate of data compression.