1. Field of the Invention
The present invention generally relates to the field of compressed main memory architecture in computer systems, and more specifically to an improved method and apparatus for managing a compressed main memory or associated uncompressed cache.
2. Description of the Related Art
Computer main memory systems are now generally available that employ high speed compression/decompression hardware in the data flow paths between the main memory and the processors. Processor access to main memory within these systems is performed indirectly through the compressor and decompressor apparatuses, both of which add significantly to the processor access latency overhead, but facilitate significantly lower storage expense.
Large cache memories are implemented between the processor and the compressor and decompressor hardware to reduce the frequency of processor references to the compressed memory, mitigating the effects of the high compression/decompression latency. These caches contain uncompressed data and are generally partitioned into cache lines which are equal in size to the fixed data block size required by the compressor and decompressor. When a processor requests data that is not already located in the cache, the line which contains the requested data is located in the compressed memory, then read from the compressed memory, then decompressed and placed in the uncompressed cache. When no empty cache line is available, an existing cache line is selected for replacement, so that the existing cache line is removed from the cache, compressed and stored in the compressed memory, and replaced with the new cache line. Subsequent processor references in the locality of the initial reference and within the cache line are serviced directly from the uncompressed cache data, avoiding the latency associated with decompression. Three methods of uncompressed data caching are described below.
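The miss and replacement flow described above can be modeled in a few lines. This is a minimal sketch, not the hardware design. The class and attribute names are hypothetical. `zlib` stands in for the hardware compressor. Least-recently-used victim selection is an assumption, because the text only says that "an existing cache line is selected for replacement".

```python
from collections import OrderedDict
import zlib


class UncompressedCache:
    """Toy model of an uncompressed line cache in front of a compressed memory.

    Hypothetical names; zlib stands in for the hardware compressor, and LRU
    victim selection is an assumed replacement policy.
    """

    def __init__(self, capacity, compressed_memory):
        self.capacity = capacity
        self.lines = OrderedDict()        # line address -> uncompressed bytes, LRU first
        self.memory = compressed_memory   # line address -> compressed bytes

    def read(self, addr):
        if addr in self.lines:            # hit: served with no decompression latency
            self.lines.move_to_end(addr)
            return self.lines[addr]
        if len(self.lines) >= self.capacity:
            # No empty line: evict a victim, compress it and store it in compressed memory.
            victim, data = self.lines.popitem(last=False)
            self.memory[victim] = zlib.compress(data)
        # Miss: locate the line in compressed memory, read it, decompress it, install it.
        data = zlib.decompress(self.memory[addr])
        self.lines[addr] = data
        return data
```

A hit never touches the compressor. Each miss with a full cache costs one compression (the victim) plus one decompression (the requested line), and this is the latency the cache is meant to amortize.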
A conventional independent cache array and associated directory provides the greatest performance, but at the highest cost. Performance is maximized because the cache and directory hardware can be optimized for lowest-latency processor access, and main memory interface traffic is segregated from that of the cache interface. However, the cache array, directory, and associated hardware interfaces all add cost.
Hovis, et al. (U.S. Pat. No. 5,812,817, incorporated herein by reference) logically apportion an uncompressed cache memory region within the main memory. The cache controller and the memory controller share the same storage array via the same physical interface. Data is shuttled back and forth between the compressed main memory region and the uncompressed cache through the compression hardware during cache line replacement. The advantages of this scheme are that the uncompressed cache size can be readily optimized to specific system applications, and the costs associated with an independent cache memory, directory, and associated interfaces are eliminated. However, performance is particularly disadvantaged by contention from the latency-sensitive cache controller for the main memory physical interface.
Benveniste, et al. (U.S. Pat. No. 6,349,372 B1, incorporated herein by reference) describe a "virtual uncompressed cache" that consists of a predetermined number of uncompressed data blocks that are allowed to be stored in the uncompressed format within the compressed memory, in the same manner that an incompressible data block would be stored. No separate cache directory is needed, as all processor data references are located from the compressed memory directory. A FIFO list of uncompressed data blocks is maintained, and when a new data block is decompressed, it displaces a data block from the list. Data is shuttled out of and into the compressed main memory through the compression/decompression hardware during data block replacement in the uncompressed list. This scheme is very low in cost, as no special cache memory or directory exists. However, performance is disadvantaged by compressor/decompressor dataflow contention with processor data and directory references.
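The FIFO displacement in the virtual uncompressed cache scheme can be sketched as follows. This is an illustrative model under stated assumptions, not the cited patent's implementation. The names `VirtualUncompressedCache`, `directory` and `limit` are hypothetical. Each directory entry is modeled as a (compressed flag, payload) pair, and `zlib` stands in for the compressor.

```python
from collections import deque
import zlib


class VirtualUncompressedCache:
    """Sketch: at most `limit` blocks may sit uncompressed inside compressed memory.

    Hypothetical model: the single memory directory maps addr -> (compressed, payload);
    there is no separate cache directory. zlib stands in for the compressor.
    """

    def __init__(self, limit):
        self.limit = limit
        self.directory = {}   # addr -> (compressed_flag, payload)
        self.fifo = deque()   # addresses currently held uncompressed, oldest first

    def write(self, addr, data):
        if addr in self.fifo:             # block leaves the uncompressed set
            self.fifo.remove(addr)
        self.directory[addr] = (True, zlib.compress(data))

    def read(self, addr):
        compressed, payload = self.directory[addr]
        if not compressed:
            return payload                # already uncompressed: no decompression latency
        data = zlib.decompress(payload)
        if len(self.fifo) >= self.limit:
            # Displace the oldest uncompressed block by recompressing it in place.
            old = self.fifo.popleft()
            _, old_data = self.directory[old]
            self.directory[old] = (True, zlib.compress(old_data))
        self.directory[addr] = (False, data)   # stored uncompressed, like an incompressible block
        self.fifo.append(addr)
        return data
```

Every displacement triggers a compression, and every new entry triggers a decompression. That compressor/decompressor traffic is the source of the contention noted above.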
All of these schemes maintain all of the compressed memory content in the compressed format (when practical), while allowing a fixed quantity, or cache, of data in the uncompressed format. Data is decompressed only on demand, when a processor requests data that is not found in the uncompressed data set or cache. While this maximizes the available space in the main memory, that space often goes unused. None of these schemes provides a means to compress or decompress data before access to the data is requested. Therefore, compressor/decompressor data traffic is induced at the memory at the same time that the processor needs access to the memory, resulting in contention. Moreover, write traffic cannot be carried out until all the necessary data is available to the compressor and the compressor has processed it (a period of hundreds of cycles), so memory write stalls are common. All of these schemes also set aside large amounts of unused memory and rely on special software to mitigate a problem known as memory pressure. Memory pressure occurs when storing data in the compressed memory requires more memory than is currently available, due to a poor overall compression ratio and high real memory utilization.
It would be desirable to maximize the space in the main memory that is used to store data, as opposed to leaving it unused. It would also be desirable to store as much uncompressed data in the main memory as practical, to reduce reference latency by avoiding compression/decompression operations. It would also be desirable to retain in memory, for each data block, the degree of compressibility of that block after incurring the overhead of a compression operation, to avoid needless later compression activity on poorly compressible data blocks. Moreover, it would be desirable to perform decompression/compression activity before a processor needs the data, and while the main memory is otherwise idle, to mitigate contention with processor access to the memory. Lastly, it would be desirable for the compressed memory manager hardware to respond instantly to memory pressure conditions, thereby reducing the size of wasteful memory reserves.
It is an object of the invention to provide a data management method, within a compressed memory system, to maximize the amount of the compressed main memory that is utilized for storing data in the uncompressed format to mitigate conditions where data access must incur compression and expansion latency penalties.
It is a further object of the invention to provide a method and apparatus to regulate the overall compressed main memory compression ratio by detecting when the amount of available memory is outside predetermined thresholds and, in response, selecting data blocks for compression (to add to the available memory) or decompression (to use surplus available memory for uncompressed data) while the memory system is not busy, or as a priority independent of busy conditions, until the amount of available memory is within the predetermined thresholds.
The invention comprises a computer system having a memory having sectors of data blocks including compressed data blocks and uncompressed data blocks. A sector counter, operatively connected to the memory is adapted to maintain a used memory sector count of the memory. A compressed memory manager is operatively connected to the memory. The invention also has a compress memory threshold register operatively connected to the compressed memory manager that contains a compress memory threshold. A sector translation table is operatively connected to the memory and contains a touch bit indicating when the data block was last accessed. An expand memory threshold register is operatively connected to the compressed memory manager and contains an expand memory threshold. The compressed memory manager is adapted to compress data blocks in the memory when the used memory sector count is above the compress memory threshold. Less recently accessed data blocks are compressed before more recently accessed data blocks, based on the touch bit. The compressed memory manager is further adapted to decompress the data blocks when the used memory sector count is below the expand memory threshold.
The invention also has a memory controller operatively connected to the memory; the compressing and the decompressing are performed by the compressed memory manager only when the memory controller is not performing memory access requests. A priority compress memory threshold register is operatively connected to the compressed memory manager and contains a priority compress memory threshold. The compressed memory manager can be further adapted to compress the data blocks in the memory when the used memory sector count is above the priority compress memory threshold, irrespective of the memory access request activity of the memory controller. The sector translation table contains compression attributes of the data blocks, including a zero attribute indicating a data block of all zeros. The compressed memory manager is further adapted to store data blocks having the zero attribute as a zero entry in the memory, wherein the zero entry avoids using memory sectors. The compressed memory manager is further adapted to compress data blocks having a higher compressibility attribute before compressing data blocks having a lower compressibility attribute.
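The threshold behavior described above reduces to a small decision function. This is a hedged sketch: the function name, the `Action` values and the parameter names are hypothetical. It assumes that all thresholds are expressed in used sectors and that expand threshold < compress threshold <= priority compress threshold.

```python
from enum import Enum


class Action(Enum):
    IDLE = "idle"
    COMPRESS = "compress"
    EXPAND = "expand"


def manager_action(used_sectors, compress_th, expand_th, priority_th, controller_busy):
    """Decide what the compressed memory manager does next (hypothetical sketch).

    Assumes expand_th < compress_th <= priority_th, all measured in used sectors.
    """
    if used_sectors > priority_th:
        return Action.COMPRESS    # memory pressure: compress even while the controller is busy
    if controller_busy:
        return Action.IDLE        # otherwise defer to processor memory access requests
    if used_sectors > compress_th:
        return Action.COMPRESS    # release sectors while the memory is idle
    if used_sectors < expand_th:
        return Action.EXPAND      # spend surplus sectors on uncompressed data
    return Action.IDLE            # within the threshold band: nothing to do
```

With a compress threshold of 800, an expand threshold of 600 and a priority threshold of 900, a count of 850 leads to compression only when the controller is idle. A count of 950 leads to compression regardless of controller activity.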
The invention also provides a method of utilizing a sectored compressed memory in a computer system. The method comprises monitoring a used memory sector count of the memory, compressing data blocks in the memory when the used memory sector count is above a compress memory threshold, and decompressing the data blocks when the used memory sector count is below an expand memory threshold. Data blocks accessed less recently are compressed before data blocks accessed more recently. The compressing and the decompressing are performed when the memory controller in the computer system is not performing memory access requests. In addition, the method compresses the data blocks in the memory when the used memory sector count is above a priority compress memory threshold, irrespective of the memory access request activity of the memory controller. Data blocks consisting entirely of zeros are always stored as a zero entry in the memory, and the zero entry uses no memory sectors. Data blocks having a higher compressibility are compressed before data blocks having a lower compressibility. The decompressing is applied only to compressed data blocks, so uncompressed data blocks are never needlessly decompressed. Data blocks held in a cache of the computer system are not compressed.
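The ordering rules above (skip cached and already-compressed blocks, take less recently touched blocks first, then more compressible ones) can be sketched as a sort key. This is a sketch under stated assumptions. `Entry` is a hypothetical stand-in for a sector translation table entry. It assumes that a clear touch bit means "not accessed recently" (for example, because the manager's scan clears the bit) and that a higher DOC value means more compressible.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical stand-in for a sector translation table entry."""
    addr: int
    compressed: bool
    touched: bool        # touch bit; assumed cleared by the manager's scan
    doc: int             # degree of compressibility 0..7, higher = more compressible (assumed)
    cached: bool = False


def compression_candidates(entries):
    """Order uncompressed, uncached blocks: untouched before touched, then higher DOC first."""
    eligible = [e for e in entries if not e.compressed and not e.cached]
    # False sorts before True, so blocks with a clear touch bit come first.
    return sorted(eligible, key=lambda e: (e.touched, -e.doc))
```

Sorting on the touch bit first means that a recently touched block is compressed only after every untouched block, even one with a lower DOC.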
According to the invention, a means to detect zero data exists in the data path to the compressor write buffer, to indicate when the data in a given transfer cycle is all zero. If any transfer cycle for a given data block is nonzero, the data block is a nonzero data block. When sufficient free memory exists, all nonzero data blocks are stored in the uncompressed format, bypassing the compressor. Otherwise, data is compressed and stored in the compressed format when doing so saves space. A 3-bit Degree of Compressibility (DOC) attribute is derived and saved in the compressed memory directory entry for every data block. When the amount of available memory falls outside the threshold bounds, a compressed memory manager is enabled to scan the DOC fields of the compressed memory directory and select either the most compressible data blocks for compression, to release memory, or compressed data blocks for decompression, to expand into surplus memory. This process continues until the available memory returns to within the predetermined threshold bounds.
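Per-cycle zero detection and a 3-bit DOC value can be sketched as below. Several details here are assumptions. The block and transfer-cycle sizes are hypothetical. `zlib` stands in for the hardware compressor. The document does not specify how the DOC is derived, so the mapping used here (the compressed-size ratio quantized into eight levels, with 7 the most compressible) is purely illustrative.

```python
import zlib

BLOCK_BYTES = 1024   # hypothetical data block size
CYCLE_BYTES = 32     # hypothetical bytes moved per transfer cycle


def is_zero_block(block):
    """Zero-detect each transfer cycle; any nonzero cycle makes the block nonzero."""
    return all(not any(block[i:i + CYCLE_BYTES])
               for i in range(0, len(block), CYCLE_BYTES))


def degree_of_compressibility(block):
    """Hypothetical 3-bit DOC: 7 = most compressible, 0 = no spatial benefit.

    Quantizes the compressed/uncompressed size ratio into eight levels;
    zlib stands in for the hardware compressor.
    """
    clen = len(zlib.compress(block))
    return max(0, 7 - (8 * clen) // len(block))
```

In this sketch, an all-zero block would be recorded as a zero entry without being compressed. The DOC would be computed once and kept in the directory entry, so that a later scan can skip poorly compressible blocks without recompressing them.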
When the invention is applied to systems with an uncompressed cache, only data blocks that are not cached are candidates for recompression. The invention also removes stale data from the compressed main memory. When an uncompressed cache is not employed, a small buffer (typically 32, 64, or 128 entries) of the addresses of the data blocks most recently read from the main memory is maintained in hardware, and only data blocks not located in the buffer are candidates for recompression.
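The recent-read address buffer can be modeled as a small bounded set. This is a minimal sketch. The class and method names are hypothetical. The replacement policy (the oldest address is dropped, and a re-read refreshes an address's position) is an assumption, since the text gives only the typical buffer sizes.

```python
from collections import OrderedDict


class RecentReadBuffer:
    """Hypothetical model of a small hardware buffer of recently read block addresses.

    Assumed policy: when full, the oldest address is dropped; a re-read refreshes it.
    """

    def __init__(self, entries=64):
        self.entries = entries
        self.addrs = OrderedDict()        # insertion order = age, oldest first

    def record_read(self, addr):
        self.addrs.pop(addr, None)        # refresh position if already present
        self.addrs[addr] = None
        if len(self.addrs) > self.entries:
            self.addrs.popitem(last=False)   # drop the oldest address

    def may_recompress(self, addr):
        """Only blocks absent from the buffer are candidates for recompression."""
        return addr not in self.addrs
```

A 64-entry buffer of this kind needs only the block addresses, not the data. This is why it can substitute cheaply for an uncompressed cache when the manager filters recompression candidates.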