1. Field of the Invention
This invention relates generally to data compression systems, and more specifically to an improved method and apparatus for coding and parsing compressed data for the purpose of avoiding system bottlenecks that prevent optimum throughput.
2. Discussion of the Prior Art
Data compression has become increasingly vital in today's computer systems due to the high demand for data transmission and storage capacity. In particular, main memory compression is now both feasible and desirable with the advent of parallel compression using a cooperative dictionary, as described in commonly-owned U.S. Pat. No. 5,729,228 to Franaszek et al. entitled PARALLEL COMPRESSION AND DECOMPRESSION USING A COOPERATIVE DICTIONARY, incorporated herein by reference. Parallel compression is a relatively new art in the field of compression. Its main concept is to divide a block of uncompressed data into multiple sectors and then assign them to individual engines for both compression and decompression with all engines sharing a cooperative dictionary such that the compression ratio is close to that of a single-engine design. This results in much better latency and throughput than the previous single-engine designs, thus making main memory compression feasible.
Nevertheless, significant improvements are still needed, particularly in the decompression process, in order to keep pace with the rapid acceleration in today's processor speeds. In particular, a processor cannot tolerate high latency or low throughput while accessing data from the main memory through the decompressor. In the past, main memory decompression has often been limited in throughput, primarily due to the critical timing paths within the decompressor's parser. The main function of the decompressor parser is to extract consecutive data phrases from the incoming compressed data stream. These phrases comprise certain predetermined combinations of raw characters and variable-length strings. They are eventually decoded into uncompressed data bytes in the later stages of the decompressor. The parser must be able to parse phrases quickly so as to sustain the decompression engine pipeline. Specifically, referring to FIG. 2, within each clock cycle, the parser utilizes an address pointer to extract a new phrase from the parser data input register, determines its type and bit length, and then calculates the address pointer for the next phrase. This process is quite cumbersome and usually results in critical paths running through multiple logic levels within the barrel shifters, adders, encoders and multiplexers. As a result, it limits the highest decompression clock rate achievable for a given technology and compression algorithm.
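By way of illustration only, the per-cycle parsing step described above may be sketched in software as follows. The phrase format used here (a one-bit type flag followed by either an 8-bit raw character or an 8-bit dictionary address and a 3-bit length) is a hypothetical encoding chosen for the example and is not the encoding of the referenced patent; the function names are likewise assumptions. The sketch shows why the step is serial: the pointer for the next phrase cannot be computed until the type and bit length of the current phrase are known.

```python
from typing import List, Tuple

# Hypothetical phrase format (an assumption for illustration only):
#   flag bit "0" -> raw character: 8 data bits follow (9 bits total)
#   flag bit "1" -> string reference: 8-bit dictionary address,
#                   then 3-bit length (12 bits total)
RAW_BITS = 8
ADDR_BITS = 8
LEN_BITS = 3


def parse_phrase(bits: str, ptr: int) -> Tuple[tuple, int]:
    """Extract one phrase starting at bit offset `ptr`.

    `bits` is a string of "0"/"1" characters standing in for the parser
    data input register; the input is assumed to be well formed.
    Returns the decoded phrase and the pointer to the next phrase.
    """
    if bits[ptr] == "0":
        # Raw character: the phrase length is fixed once the flag is read.
        value = int(bits[ptr + 1: ptr + 1 + RAW_BITS], 2)
        phrase = ("raw", value)
        length = 1 + RAW_BITS
    else:
        # String reference: dictionary address followed by a length field.
        addr_end = ptr + 1 + ADDR_BITS
        addr = int(bits[ptr + 1: addr_end], 2)
        run = int(bits[addr_end: addr_end + LEN_BITS], 2)
        phrase = ("string", addr, run)
        length = 1 + ADDR_BITS + LEN_BITS
    # The next pointer depends on the type and length just determined.
    return phrase, ptr + length


def parse_all(bits: str) -> List[tuple]:
    """Parse consecutive phrases until the input is exhausted."""
    ptr = 0
    phrases = []
    while ptr < len(bits):
        phrase, ptr = parse_phrase(bits, ptr)
        phrases.append(phrase)
    return phrases


if __name__ == "__main__":
    stream = "0" + "01000001" + "1" + "00000101" + "011" + "0" + "01000010"
    print(parse_all(stream))
```

In hardware, the slicing at an arbitrary bit offset corresponds to the barrel shifter, and the pointer update corresponds to the adder, which is why both lie on the critical path within a single clock cycle.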
It would thus be highly desirable to provide an enhanced method and apparatus which will improve the latency and throughput of the decompressor by simplifying the compression algorithm and its parsing mechanism, without sacrificing the overall compression ratio.
Moreover, the entire decompression process is controlled by a state machine having a certain number of states. These states transition from one to another in order to initiate or terminate various steps within the decompression pipeline. They keep all decompression engines in a parallel configuration running synchronously with one another. Once the decompression process is initiated, any stall originating from the decompressor's input interface, any particular internal engine, or its output interface will also stall the entire pipeline for all engines. Thus, any stall downstream of the parser will also immediately stop the parser from parsing. This degrades the decompressor's overall throughput. For example, if a cache controller is not ready to receive additional decompressed data, it will stop requesting data. This will in turn stall the decompressor's entire pipeline.
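The stall behavior described above may be illustrated with a minimal software model, given here as a sketch under stated assumptions rather than as a description of any actual decompressor. The three-stage depth, the function name `simulate_lockstep`, and the per-cycle `output_ready` flag (standing in for the cache controller's requests) are all hypothetical. Every stage advances only when the output interface is ready, so a downstream stall idles the parser as well.

```python
from typing import List, Tuple


def simulate_lockstep(
    phrases: List[str],
    output_ready: List[bool],
    depth: int = 3,  # hypothetical number of pipeline stages
) -> Tuple[List[str], List[bool]]:
    """Model a pipeline in which one global stall freezes every stage.

    Returns (delivered, parser_active), where parser_active[i] is True
    if the parser stage accepted a new phrase in cycle i.
    """
    stages = [None] * depth      # stages[0] is the parser stage
    pending = list(phrases)      # compressed phrases not yet parsed
    delivered = []               # items handed to the output interface
    parser_active = []

    for ready in output_ready:
        if not ready:
            # Output interface is not requesting data: nothing moves,
            # including the parser at the head of the pipeline.
            parser_active.append(False)
            continue
        # Hand the last stage's contents to the output interface.
        if stages[-1] is not None:
            delivered.append(stages[-1])
        # Shift every stage forward by one position.
        for i in range(depth - 1, 0, -1):
            stages[i] = stages[i - 1]
        # The parser accepts a new phrase if one is available.
        if pending:
            stages[0] = pending.pop(0)
            parser_active.append(True)
        else:
            stages[0] = None
            parser_active.append(False)
    return delivered, parser_active


if __name__ == "__main__":
    ready = [True, False, False, True, True, True, True, True]
    print(simulate_lockstep(["A", "B", "C"], ready))
```

In this model the parser sits idle during the second and third cycles even though phrases are waiting and the stages ahead of it are empty, which is the throughput loss noted above.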
It would thus be additionally desirable to provide a method and apparatus which will improve the latency and throughput of the decompressor by isolating the operation of the parser stage in such a manner that a subsequent downstream stall will not stall the parser operation.