1. Field of the Invention
This invention relates generally to the field of compressed memory architecture in computer systems, and more specifically to an improved method and apparatus for managing compressed main memory.
2. Discussion of the Prior Art
Computer systems generally consist of one or more processors that execute program instructions stored within a memory medium. This medium is most often constructed of the lowest cost per bit, yet slowest storage technology. To increase the processor performance, a higher speed, yet smaller and more costly memory, known as a cache memory, is placed between the processor and final storage to provide temporary storage of recent and/or frequently referenced information. As the difference between processor speed and access time of the final storage increases, more levels of cache memory are provided, each level backing the previous level to form a storage hierarchy. Each level of the cache is managed to maintain the information most useful to the processor. Often more than one cache memory will be employed at the same hierarchy level, for example when an independent cache is employed for each processor. Cache memory systems in computing devices have evolved into quite varied and sophisticated structures, but they always address the tradeoff between speed and both cost and complexity, while functioning to make the most useful information available to a processor as efficiently as possible. Typically only large “mainframe” computers employ memory hierarchies greater than three levels. However, systems are now being created using commodity microprocessors that benefit greatly from a third level of cache in the memory hierarchy. This level is best suited between the processor bus and the main memory, and being shared by all processors and in some cases the I/O system too, it is called a shared cache. Each level of memory requires several times more storage than the level it backs to be performance effective, therefore the shared cache requires several tens of megabytes of memory. To remain cost effective, the shared cache is implemented using low cost Dynamic Random Access Memory (DRAM), organized as a separate array or a portion of the system main memory.
Recently, cost reduced computer system architectures have been developed that more than double the effective size of the main memory by employing high speed compression/decompression hardware based on common compression algorithms, in the path of information flow to and from the main memory. Processor access to main memory within these systems is performed indirectly through the compressor and decompressor apparatuses, both of which add significantly to the processor access latency costs.
Referring now to FIG. 1, a block diagram of a prior art computer system 100 is shown. The computer system includes one or more processors 101 connected to a common shared memory controller 102 that provides access to a system main memory 103 through a shared cache 114. The shared memory controller contains a compressor device 104 for compressing fixed size information blocks into as small a unit as possible for ultimate storage into the main memory 103, a decompressor device 105 for reversing the compression operation after the stored information is later retrieved from the main memory, and a cache controller 115 for managing a cache memory to contain uncompressed information. The cache controller 115 is connected to the memory controller 106 through at least a read request 119 and read request address 120 to signal the memory controller to read a quantity of information from the main memory for placement into the cache 114 via bus 117. Information may be transferred to the processor data bus 108 from the cache 114 through bus 117, or from the main memory 103, either through or around the decompressor 105 via a multiplexor 111. Similarly, information may be transferred to the cache from the main memory 103 or from the processor data bus 108. Information may be transferred to the main memory 103 from the processor data bus 108 or cache 114, either through or around the compressor 104 via a multiplexor 112. The processor data bus 108 is used for transporting uncompressed information between other processors and/or the shared memory controller 102, and the shared cache 114.
The main memory 103 is typically constructed of dynamic random access memory (DRAM) with access controlled by a memory controller 106. Addresses appearing on the processor address bus 107 and cache address bus 116 are known as Real Addresses, and are understood and known to the programming environment. Addresses appearing on the main memory address bus 109 are known as Physical Addresses, and are used and relevant only between the memory controller and main memory DRAM. Memory Management Unit (MMU) hardware within the memory controller 106 is used to translate the real processor addresses to the virtual physical address space. This translation provides a means to allocate the physical memory in small increments for the purpose of efficiently storing and retrieving compressed and hence, variable size information.
The compressor 104 operates on a fixed size block of information, say 1024 bytes, by locating and replacing repeated byte strings within the block with a pointer to the first instance of a given string, and encoding the result according to a protocol. This process occurs through a byte-wise compare over a fixed length and is paced by a sequence counter, resulting in a constant completion time. The post process output block ranges from just a few bytes to the original block size, when the compressor could not sufficiently reduce the starting block size to warrant compressing at all. The decompressor 105 functions by reversing the compressor operation, decoding the resultant compressor output block to reconstruct the original information block by inserting byte strings back into the block at the position indicated by the noted pointers. Even in the very best circumstances, the compressor is generally capable of only ¼-½ the data rate bandwidth of the surrounding system. The compression and decompression processes are naturally linear and serial too, implying quite lengthy memory access latencies through the hardware.
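The string-replacement principle described above may be illustrated by the following minimal software sketch. It is not the hardware compressor 104 or its encoding protocol: the token format, the `min_match` threshold, and the function names `compress` and `decompress` are assumptions made for illustration only, and the constant completion time of the hardware is not modeled.

```python
def compress(block: bytes, min_match: int = 3):
    """Replace repeated byte strings with (position, length) pointers.

    Tokens are either ("lit", byte) or ("copy", position, length), where
    position refers to the earliest prior occurrence of the longest match.
    This token format is hypothetical, not the protocol of the source.
    """
    tokens = []
    i = 0
    while i < len(block):
        best_len, best_pos = 0, -1
        # Byte-wise compare against every earlier start position.
        for j in range(i):
            k = 0
            while (i + k < len(block) and j + k < i
                   and block[j + k] == block[i + k]):
                k += 1
            if k > best_len:  # strict '>' keeps the first instance on ties
                best_len, best_pos = k, j
        if best_len >= min_match:
            tokens.append(("copy", best_pos, best_len))
            i += best_len
        else:
            tokens.append(("lit", block[i]))
            i += 1
    return tokens


def decompress(tokens) -> bytes:
    """Rebuild the block by re-inserting byte strings at pointer positions."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, pos, length = token
            out += out[pos:pos + length]
    return bytes(out)


if __name__ == "__main__":
    data = b"abcabcabcxyz"
    assert decompress(compress(data)) == data
```

Because each pointer refers only to bytes already emitted, the decompressor must process tokens strictly in order, which reflects the serial nature of the process noted above.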
FIG. 2 depicts a prior art main memory partitioning scheme 200.
The main memory 205 is a logical entity because it includes the processor(s) information as well as all the required data structures necessary to access the information. The logical main memory 205 is physically partitioned from the physical memory address space 206. In many cases the main memory partition 205 is smaller than the available physical memory to provide a separate region to serve as a cache with either an integral directory, or one that is implemented externally 212. It should be noted that when implemented, the cache storage may be implemented as a region 201 of the physical memory 206, a managed quantity of uncompressed sectors, or as a separate storage array 114. In any case, when implemented, the cache controller requests accesses to the main memory in a similar manner as a processor would if the cache were not present.
The logical main memory 205 is partitioned into the sector translation table 202, with the remaining memory being allocated to sector storage 203 which may contain compressed or uncompressed information, free sector pointers, or any other information as long as it is organized into sectors. The sector translation table region size varies in proportion to the real address space size which is defined by a programmable register within the system. Particularly, equation 1) governs the translation of the sector translation table region size as follows:

\[
\text{sector\_translation\_table\_size} = \frac{\text{real\_memory\_size}}{\text{compression\_block\_size}} \cdot \text{translation\_table\_entry\_size} \tag{1}
\]
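As a worked illustration of equation 1), the following sketch computes the table region size in bytes. The function name and the requirement that the real memory size be a whole multiple of the compression block size are assumptions made for this example, not details taken from the system described.

```python
def sector_translation_table_size(real_memory_size: int,
                                  compression_block_size: int,
                                  translation_table_entry_size: int) -> int:
    """Equation 1): one table entry per compression block of real memory."""
    # Assumption: the real address space divides evenly into blocks.
    if real_memory_size % compression_block_size != 0:
        raise ValueError("real memory size must be a multiple of the block size")
    entries = real_memory_size // compression_block_size
    return entries * translation_table_entry_size


if __name__ == "__main__":
    # 1 GiB of real memory, 1024 byte blocks, 16 byte entries.
    print(sector_translation_table_size(2**30, 1024, 16))  # 16777216 (16 MiB)
```

With these example values the table occupies 1/64 of the real address space size, since each 1024 byte block costs one 16 byte entry.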
Each entry is directly mapped to a fixed address range in the processor's real address space, the request address being governed in accordance with equation 2) as follows:

\[
\text{STT\_entry\_address} = \left( \frac{\text{real\_address}}{\text{compression\_block\_size}} \cdot \text{translation\_table\_entry\_size} \right) + \text{offset\_size} \tag{2}
\]
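Equation 2) may be sketched as follows. The function name is hypothetical, and the use of integer (floor) division is an assumption inferred from the direct mapping of every address within a compression block to a single table entry.

```python
def stt_entry_address(real_address: int,
                      compression_block_size: int,
                      translation_table_entry_size: int,
                      offset_size: int = 0) -> int:
    """Equation 2): locate the table entry covering a real address."""
    # Assumption: floor division selects the block containing the address.
    block_index = real_address // compression_block_size
    return block_index * translation_table_entry_size + offset_size


if __name__ == "__main__":
    # Real address 0x12345 lies in block 72 of 1024 byte blocks.
    print(stt_entry_address(0x12345, 1024, 16))  # 1152
```

Every real address from 0 through 1023 resolves to the same entry under this sketch, and address 1024 resolves to the next entry, 16 bytes further into the table.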
For example, a mapping may employ a 16 byte translation table entry to relocate a 1024 byte real addressed compression block, allocated as a quantity of 256 byte sectors, each located at the physical memory address indicated by a 25-bit pointer stored within the table entry. The entry also contains attribute bits 208 that indicate the number of sector pointers that are valid, size, and possibly other information. Every real address reference to the main memory causes the memory controller to reference the translation table entry 207 corresponding to the real address block containing the request address 210. For read requests, the MMU decodes the attribute bits 208, extracts the valid pointer(s) 209 and requests the memory controller to read the information located at the indicated sectors 204 from the main memory sectored region 203. Similarly, write requests result in the MMU and memory controller performing the same actions, except information is written to the main memory. However, if a write request requires more sectors than are already valid in the translation table entry, then additional sectors need to be assigned to the table entry before the write may commence. Sectors are generally allocated from a list of unused sectors that is dynamically maintained as a stack or linked list of pointers stored in unused sectors. There are many possible variations on this translation scheme, but all involve a region of main memory mapped as a sector translation table and a region of memory mapped as sectors. Storage of these data structures in the DRAM based main memory provides the highest performance at the lowest cost, as well as ease of reverting the memory system into a typical direct mapped memory without compression and translation.
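The sector assignment performed before a write may be sketched as follows, using the 1024 byte block and 256 byte sector sizes of the example above. The class and function names are hypothetical. The free list is modeled as an ordinary Python list used as a stack rather than as pointers stored within unused sectors, and the release of surplus sectors when a block shrinks is an assumption not stated in the description.

```python
import math

SECTOR_SIZE = 256   # bytes per sector, per the example above
BLOCK_SIZE = 1024   # bytes per compression block, per the example above
MAX_POINTERS = BLOCK_SIZE // SECTOR_SIZE  # at most 4 sector pointers per entry


class FreeSectorPool:
    """Unused sectors kept as a stack of physical sector addresses."""

    def __init__(self, sector_addresses):
        self._stack = list(sector_addresses)

    def allocate(self) -> int:
        if not self._stack:
            raise MemoryError("no free sectors (memory pressure)")
        return self._stack.pop()

    def release(self, address: int) -> None:
        self._stack.append(address)

    def __len__(self) -> int:
        return len(self._stack)


class TranslationEntry:
    """One table entry; len(pointers) stands in for the attribute bits."""

    def __init__(self):
        self.pointers = []


def sectors_needed(compressed_size: int) -> int:
    """Number of whole sectors required to hold a compressed block."""
    return math.ceil(compressed_size / SECTOR_SIZE)


def prepare_write(entry: TranslationEntry, pool: FreeSectorPool,
                  compressed_size: int) -> list:
    """Assign sectors so the entry can hold compressed_size bytes."""
    need = sectors_needed(compressed_size)
    if need > MAX_POINTERS:
        raise ValueError("compressed block exceeds the block size")
    while len(entry.pointers) < need:      # write needs more sectors
        entry.pointers.append(pool.allocate())
    while len(entry.pointers) > need:      # assumed: return surplus sectors
        pool.release(entry.pointers.pop())
    return entry.pointers


if __name__ == "__main__":
    pool = FreeSectorPool(range(0, 8 * SECTOR_SIZE, SECTOR_SIZE))
    entry = TranslationEntry()
    prepare_write(entry, pool, 300)        # 300 bytes occupy two sectors
    assert len(entry.pointers) == 2 and len(pool) == 6
```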
Referring back to FIG. 1, the large high speed cache memory 114 is generally employed between the processor and the compressor/decompressor hardware to reduce the frequency of processor references to the compressed memory, mitigating the effects of the high compression/decompression latency. The cache is partitioned into a number of cache lines, equal in size to the fixed information block size required by the compressor and decompressor. Each cache line contains an uncompressed copy of an equivalent block of information contained within the compressed main memory. Since the processor can only reference the cache contents, the duplicated information in the main memory represents a cost to the system, in terms of wasted space. As long as the contents of the cache line remain unmodified, this cost is balanced by the advantage of not having to copy the cache line contents back to the main memory, after a cache line is evicted from the cache. However, this advantage is lost when the cache line becomes modified with respect to the copy of information within the compressed main memory, as all modified cache lines must be written back to the main memory through the compressor. This implies that once a cache line becomes modified, the copy of information within the main memory becomes “stale” and wastes space.
It would thus be highly desirable to provide a data management technique for a compressed memory system that detects “stale” information in the main memory, and returns the space used to store such information to an unused sector pool to be used for storing other information.
It would further be highly desirable to provide a data management technique that improves the overall compression rate of the system, without significant cost or complexity, that reduces the likelihood of encountering a “memory pressure” situation where the system runs low on free sectors.
It is an object of the invention to provide a data management mechanism, within a compressed memory system operating with an uncompressed information cache, to maximize the compression efficiency and thus mitigate “memory pressure” situations where the system runs low on free sectors.
It is a further object of the invention to provide a method and apparatus to detect and recover the main memory space used to store “stale” information associated with cache lines in the “modified” state, and return the storage to an unused pool for use in storing other information.
According to the principles of the invention, there is provided a computer memory system implementing a processing device for enabling indirect storage and retrieval of compressed data in an available address space in a physical memory associated with the computer and issuing real memory addresses for accessing information from the physical memory, the system comprising:
a sectored storage region in said physical memory for exclusive storage of information in fixed length storage sectors;
a cache memory array device having a plurality of cache lines;
a cache directory device, associated with the cache memory array device, comprising entries for storing address tag information and modification state information associated with the data stored in the cache memory array;
a cache line replacement mechanism associated with the cache memory array, for performing a cache line fill operation by requesting and removing existing cache line data and replacing removed data with different data via a cache line replacement operation, and updating the directory device with new address tag information and modification state information;
a cache memory access system for enabling access to the data in said physical memory by said processing device, the cache memory access system including a cache control mechanism for asserting signals associated with the cache line modification state information when a cache line is to be modified; and,
a memory control device, responsive to the asserted modified state information signals for reallocating any sectors within the sectored storage region associated with the modified cache line as unused sectors available for subsequent data storage.
Particularly, the cache control mechanism is a device that has been modified to assert two independent control signals, 1) a “read-with-intent-to-modify” (RWITM) signal and 2) a “modify” signal, to the main memory control device, for the purpose of indicating that a line in the cache is being set to the modified state. Both control signals cause a memory controller device to set the main memory storage sector requirement, noted within a sector translation table entry selected by cache line read request address signals, to zero, and release all storage sectors to a free storage sector pool.
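The effect of the two control signals on the sector translation table may be sketched as follows. This is a simplified software model under stated assumptions: the names `Signal`, `MemoryController`, and `cache_request` are hypothetical, the table is modeled as a dictionary from block index to a list of sector pointers, and it is assumed that a RWITM request still returns the sectors read for the cache line fill while a modify request returns none.

```python
from enum import Enum


class Signal(Enum):
    READ = "read"
    RWITM = "read-with-intent-to-modify"
    MODIFY = "modify"


class MemoryController:
    """Toy model: frees main memory sectors once a cache line is modified."""

    def __init__(self, translation_table, free_pool):
        self.translation_table = translation_table  # block index -> pointers
        self.free_pool = free_pool                  # list used as a stack

    def cache_request(self, block_index: int, signal: Signal) -> list:
        """Return the sectors read for a line fill, releasing stale ones."""
        sectors = self.translation_table[block_index]
        # Assumption: MODIFY targets a line already held in the cache,
        # so no main memory read accompanies it.
        sectors_read = [] if signal is Signal.MODIFY else list(sectors)
        if signal in (Signal.RWITM, Signal.MODIFY):
            # The main memory copy is about to become stale: set the
            # sector requirement to zero and release all of its sectors.
            self.free_pool.extend(sectors)
            self.translation_table[block_index] = []
        return sectors_read


if __name__ == "__main__":
    controller = MemoryController({0: [256, 512], 1: [768]}, free_pool=[])
    controller.cache_request(0, Signal.RWITM)
    assert controller.translation_table[0] == []
    assert sorted(controller.free_pool) == [256, 512]
```

An ordinary read leaves the table entry untouched in this model, so the main memory copy remains valid for as long as the cache line is unmodified.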