1. Field of the Invention
The present invention generally relates to novel cache configurations for compressed memory systems.
2. Description of Related Art
A compressed memory system is a system in which main memory is divided into a number of logically fixed-size segments (e.g., units of compression or lines). Each such logical segment is preferably stored physically in compressed format; a segment that cannot be compressed is stored in uncompressed format. One way of implementing such systems is to place a cache between main memory and the higher-level caches, decompressing lines on cache misses and compressing modified cache lines when they are written back. FIG. 1 illustrates a high-level system architecture for a compressed memory system of this type: processors, together with level 1 (L1) and level 2 (L2) caches (110, 120), share a large L3 cache (130) in which data is stored in uncompressed format. On a cache writeback, the data is compressed by a compressor (140) before being stored in main memory (160); conversely, on a cache miss, data is decompressed by a decompressor (150) as it is read from main memory (160).
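The read and writeback paths of FIG. 1 can be modeled in a few lines. The following is a minimal sketch, not the hardware design: it assumes a cache line equal to one 1024-byte unit of compression and LRU replacement, and it uses zlib purely as a stand-in for the compressor (140) and decompressor (150). The class and method names are hypothetical.

```python
import zlib
from collections import OrderedDict

UNIT = 1024  # bytes per unit of compression (illustrative value)


class CompressedMemory:
    """Toy main memory (160): each unit is stored compressed, or raw if compression does not help."""

    def __init__(self) -> None:
        self.units: dict[int, tuple[bool, bytes]] = {}  # addr -> (is_compressed, payload)

    def store(self, addr: int, data: bytes) -> None:
        # Stand-in for the compressor (140): runs on writeback of a modified line.
        packed = zlib.compress(data)
        if len(packed) < len(data):
            self.units[addr] = (True, packed)
        else:
            self.units[addr] = (False, bytes(data))  # incompressible: keep uncompressed

    def load(self, addr: int) -> bytes:
        # Stand-in for the decompressor (150): runs on a cache miss. Unwritten units read as zeros.
        is_compressed, payload = self.units.get(addr, (False, bytes(UNIT)))
        return zlib.decompress(payload) if is_compressed else payload

    def stored_size(self, addr: int) -> int:
        return len(self.units[addr][1])


class UncompressedCache:
    """Toy L3 cache (130) holding uncompressed units, with LRU replacement."""

    def __init__(self, memory: CompressedMemory, capacity: int) -> None:
        self.memory = memory
        self.capacity = capacity
        self.lines: OrderedDict[int, list] = OrderedDict()  # addr -> [bytearray, dirty]

    def _fetch(self, addr: int) -> list:
        if addr in self.lines:
            self.lines.move_to_end(addr)  # hit: mark most recently used
        else:
            if len(self.lines) >= self.capacity:
                self._evict()
            self.lines[addr] = [bytearray(self.memory.load(addr)), False]  # miss: decompress
        return self.lines[addr]

    def _evict(self) -> None:
        victim, (data, dirty) = self.lines.popitem(last=False)  # least recently used
        if dirty:  # only modified lines are compressed and written back
            self.memory.store(victim, bytes(data))

    def read(self, addr: int) -> bytes:
        return bytes(self._fetch(addr)[0])

    def write(self, addr: int, offset: int, value: bytes) -> None:
        line = self._fetch(addr)  # write-allocate: bring the whole unit into the cache
        line[0][offset:offset + len(value)] = value
        line[1] = True
```

Only dirty lines pass through the compressor on eviction; clean lines are simply dropped, since main memory already holds their contents.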
An issue in such systems is that the compressibility of the data stored in the compressed memory system can vary dynamically. If the amount of free space available in the compressed memory becomes sufficiently low, a writeback of a modified cache line could fail. To prevent this, interrupts may be generated when the amount of free space falls below certain thresholds, with the interrupts invoking OS (operating system) intervention before free space is exhausted.
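The threshold mechanism can be illustrated with a small model. In this hypothetical sketch, `FreeSpaceMonitor`, its fields, and the threshold values are illustrative assumptions; a charge that exceeds the remaining free space models the failed writeback that the interrupts are meant to prevent.

```python
from dataclasses import dataclass, field


@dataclass
class FreeSpaceMonitor:
    """Hypothetical model of compressed-memory free space with low-space interrupt thresholds."""

    free_bytes: int
    thresholds: list[int]  # free-space levels at which an interrupt is raised
    interrupts: list[int] = field(default_factory=list)  # thresholds crossed so far

    def consume(self, nbytes: int) -> list[int]:
        """Charge nbytes of storage (e.g. a writeback that compressed poorly).

        Raises MemoryError if the writeback cannot be stored; otherwise returns
        the thresholds crossed on the way down, i.e. the interrupts raised.
        """
        if nbytes > self.free_bytes:
            raise MemoryError("writeback failed: insufficient free space")
        before = self.free_bytes
        self.free_bytes -= nbytes
        crossed = [t for t in self.thresholds if self.free_bytes < t <= before]
        self.interrupts.extend(crossed)
        return crossed

    def release(self, nbytes: int) -> None:
        """Return storage to the free pool (e.g. after the OS reclaims pages)."""
        self.free_bytes += nbytes
```

For example, starting from 10,000 free bytes with thresholds at 8,000 and 4,000, charging 1,000, 2,000, and then 4,000 bytes raises the 8,000-byte interrupt on the second charge and the 4,000-byte interrupt on the third.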
If the line size of the cache in a compressed memory system is smaller than the compressed memory line size (i.e., size of a unit of compression), the amount of free space in the compressed memory system required to guarantee that all modified cache lines can be written back could be unacceptably large. An example follows.
Suppose the unit of compression is 1024 bytes, that the cache line size is 64 bytes, and that the cache holds M lines. The worst-case loss of compression that could result from a store of a modified 64-byte line depends on details of the compression and storage allocation designs of the compressed memory system. An upper bound on this loss is that a compressed memory line could become incompressible and require the full 1024 bytes; thus, in general, it may be necessary to reserve 1024 bytes of free space in the compressed memory system for each modified 64-byte cache line. Furthermore, in general neither the number of modified cache lines nor the number of distinct compressed memory lines having one or more cache lines resident in the cache may be known; in the worst case, all cache lines may be modified and reside in distinct compressed memory lines. Since the ratio of compressed memory line size to cache line size is 16 (16×64=1024), handling this worst case means that an upper bound on the amount of free space that must be reserved in the compressed memory is 16×64×M=1024×M bytes, that is, 16 times the capacity of the cache itself. Such a requirement can significantly reduce the overall compression (that is, the compression taking into account the free space together with compressed memory system storage fragmentation and directory overheads). It is, therefore, an object of this invention to reduce the amount of free space required to guarantee that all modified cache lines can be written back to a compressed main memory system.
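The worst-case reserve in this example can be checked with a short calculation. This sketch simply encodes the assumptions stated above (every cache line modified, each in a distinct unit of compression, each such unit becoming incompressible); the function name is hypothetical.

```python
def worst_case_reserve(unit_bytes: int, line_bytes: int, cache_lines: int) -> int:
    """Upper bound on free space to reserve so every modified cache line can be written back.

    Assumes every cache line is modified, each lies in a distinct unit of
    compression, and each such unit becomes incompressible (needs unit_bytes).
    """
    if unit_bytes % line_bytes:
        raise ValueError("unit of compression must be a multiple of the cache line size")
    ratio = unit_bytes // line_bytes  # 1024 // 64 == 16 in the example
    return ratio * line_bytes * cache_lines  # == unit_bytes * cache_lines
```

With a hypothetical cache of M = 16,384 lines (1 MiB of 64-byte lines), `worst_case_reserve(1024, 64, 16384)` evaluates to 16,777,216 bytes (16 MiB), 16 times the cache capacity.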
There are related problems associated with the design of NUMA (non-uniform memory access) architectures. In such systems there are a number of nodes, each of which has processors, a cache hierarchy, and main memory. For convenience, only the cache immediately above main memory in each node is considered. A global real memory space is provided in such architectures by means of addressing schemes in which any node may address the real memory of any other node by means of inter-cache transfers. FIG. 2 illustrates a high-level system architecture of this type. The figure is adapted from FIG. 3-2, “Nonuniform Memory Access (NUMA) Architecture”, on page 91 of the book by Lenoski and Weber, Scalable Shared-Memory Multiprocessing, in which further descriptions of NUMA architectures and references to the extensive prior art can be found. As shown in FIG. 2, in a typical NUMA system there are a number of nodes, where each node consists of a processor (210 in the first node, 220 in the second node, and 230 in the last node), a cache (240 in the first node, 250 in the second node, and 260 in the last node), and a memory local to that node (270 in the first node, 280 in the second node, and 290 in the last node). Inter-cache transfers, which enable access from one node to the non-local memory of a different node, take place by means of an interconnection network (295). If the local memories in such systems are implemented using compressed memory architectures, the cache in a given node may contain not only uncompressed sections of compressed memory lines from the given node, but also uncompressed sections of compressed memory lines from any other node in the NUMA system. This significantly complicates the problem of guaranteeing forward progress since, by analogy with the above worst-case analysis, an additional factor equal to the number of nodes in the NUMA system must be taken into account in required free space calculations.
That is, if there are N nodes in the NUMA system, an upper bound on the amount of free space that must be reserved in the compressed memory is 16×64×M×N=1024×M×N bytes, N times the amount in the above example. Furthermore, OS handling of a compressed memory low free space condition on one node could cause writebacks of modified remote lines, which could in turn cause a low free space condition on a remote node; that is, a “chain reaction” of low free space conditions is possible. It is, therefore, desirable to decouple low free space condition handling on each node, in a manner that complements the cache operation constraints for those cases in which the OS is handling a compressed memory low free space condition.
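The chain reaction can be illustrated with a toy simulation. Everything here is a hypothetical assumption for illustration only: the `Node` fields, a single threshold per node, an OS handler that flushes every modified remote line held in a low node's cache, and a worst-case charge of one full 1024-byte unit per written-back line against that line's home node.

```python
from collections import deque
from dataclasses import dataclass, field

UNIT_BYTES = 1024  # worst-case cost of writing back one line (its unit becomes incompressible)


@dataclass
class Node:
    free_bytes: int
    threshold: int
    # Home-node names of modified remote lines currently held in this node's cache.
    modified_remote_homes: list[str] = field(default_factory=list)

    def is_low(self) -> bool:
        return self.free_bytes < self.threshold


def handle_low_free_space(nodes: dict[str, Node], start: str) -> list[str]:
    """Flush modified remote lines from each low node's cache (hypothetical OS policy).

    Each written-back line is charged UNIT_BYTES against its home node. Returns the
    order in which nodes entered the low free space condition, starting with `start`.
    """
    order = [start]
    pending = deque([start])
    while pending:
        node = nodes[pending.popleft()]
        homes, node.modified_remote_homes = node.modified_remote_homes, []
        for home in homes:
            target = nodes[home]
            was_low = target.is_low()
            target.free_bytes -= UNIT_BYTES
            if not was_low and target.is_low():  # writeback pushed the home node below threshold
                order.append(home)
                pending.append(home)
    return order
```

For instance, if node A is already low and its cache holds two modified lines homed on node B (2,500 free bytes, threshold 1,000), the flush drives B below its threshold; if B's cache in turn holds a modified line homed on node C (1,500 free bytes, threshold 1,000), C goes low as well, giving the order A, B, C.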