The present invention relates to the field of cache design for high performance processor integrated circuits. In particular, the invention relates to apparatus for compressing data in a large, upper level, on-chip, cache.
Cache memories are high speed memory systems that store a partial copy of the contents of a larger, slower, memory system. The partial copy stored in a cache normally contains those portions of the contents of the larger memory system that have been recently accessed by a processor. Cache memory offers an advantage because many programs access the same or nearby code and data locations repeatedly; execution of instructions is statistically more likely to access recently accessed locations, or locations near recently accessed locations, than other locations in memory.
Many modern computer systems implement a hierarchy of cache memory systems for caching data held in main memory. Main memory of these systems typically consists of Dynamic Random Access Memory (DRAM). Many common processors, including Intel Pentium-II and Pentium-III circuits, have two levels of cache. There also exist computing systems with three and four levels of cache.
In addition to storage, cache memory systems also have apparatus for identifying those portions of the larger, slower, memory system held in cache; this often takes the form of a cache tag memory.
Cache systems typically have cache tag memory subsystems and cache data memory subsystems. Each cache data memory typically operates on units of data of a predetermined size, known as a cache line. The size of a cache line can be different for each level in a multilevel cache. Cache lines are typically larger than the word or byte size used by the processor and may therefore contain data near recently used locations as well as recently used locations.
In typical cache memory systems, when a memory location at a particular main-memory address is to be read, a cache-line address is derived from part of the main-memory address. A portion of the cache-line address is typically presented to the cache tag memory and to the cache data memory, and a read operation is performed on both memories.
Cache tag memory typically contains one or more address tag fields. Multiple address tag fields can be, and often are, provided to support multiple "ways" of associativity in the cache. Each address tag field is compared to the remaining bits of the cache-line address to determine whether any part of data read from the cache data memory corresponds to data at the desired main-memory address. If the tag indicates that the desired data is in the cache data memory, that data is presented to the processor and next lower-level cache; if not, then the read operation is passed up to the next higher-level cache. If there is no higher-level cache, the read operation is passed to main memory. N-way, set-associative, caches perform N such comparisons of address tag fields to portions of desired data address simultaneously.
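The address handling described above can be sketched in software as follows. This is a minimal illustration only: the line size, number of sets, number of ways, and the names `split_address`, `lookup`, and `TagEntry` are assumptions chosen for the example, and the loop stands in for comparisons that hardware would perform simultaneously.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

LINE_BYTES = 64   # assumed cache line size
NUM_SETS = 256    # assumed number of lines in tag memory
NUM_WAYS = 4      # assumed number of ways of associativity


@dataclass
class TagEntry:
    tag: int = 0
    valid: bool = False


def split_address(addr: int) -> Tuple[int, int, int]:
    """Split a main-memory address into (tag, set index, byte offset)."""
    offset = addr % LINE_BYTES
    line_addr = addr // LINE_BYTES   # cache-line address
    index = line_addr % NUM_SETS     # portion presented to tag and data memories
    tag = line_addr // NUM_SETS      # remaining bits, compared with the tag fields
    return tag, index, offset


def lookup(tag_memory: List[List[TagEntry]], addr: int) -> Optional[int]:
    """Return the way that hits, or None on a miss."""
    tag, index, _ = split_address(addr)
    # Hardware performs these N comparisons at once; a loop models them here.
    for way, entry in enumerate(tag_memory[index]):
        if entry.valid and entry.tag == tag:
            return way
    return None


tag_memory = [[TagEntry() for _ in range(NUM_WAYS)] for _ in range(NUM_SETS)]
tag, index, offset = split_address(0x12345)
tag_memory[index][2] = TagEntry(tag=tag, valid=True)
```

With these assumed sizes, any address within the same sixty-four byte line as 0x12345 hits in way 2, while an address with the same index but a different tag misses.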
Typically, a tag memory contains status information as well as data information. This status information may include "written" flags that indicate whether information in the cache has been written to but not yet updated in higher-level memory, and "valid" flags indicating that information in the cache is valid.
A cache "hit" occurs whenever a memory access to the cache occurs and the cache system finds, through inspecting its tag memory, that the requested data is present and valid in the cache. A cache "miss" occurs whenever a memory access to the cache occurs and the cache system finds, through inspecting its tag memory, that the requested data is not present and valid in the cache.
When a cache "miss" occurs in a low level cache of a typical multilevel cache system, the main-memory address is passed up to the next level of cache, where it is checked in the higher-level cache tag memory in order to determine if there is a "hit" or a "miss" at that higher level. When a cache "miss" occurs at the top level cache, the reference is typically passed to main memory.
Typically, the number of "ways" of associativity in a set-associative cache tag subsystem is the number of sets of address tags in each line of tag memory, and corresponding sets of comparators. The number of ways of storage is the number of cache lines, or superlines, that can be stored and independently referenced through a single line of cache tag memory. In most caches, the number of ways of associativity is the same as the number of ways of storage. Cache superlines are combinations of multiple cache lines that can be referenced through a single address tag in a line of tag memory.
Writethrough caches are those in which a write operation to data stored in the cache results in an immediate update of data in a higher level of cache or in main memory. Writeback caches are those in which a write operation to data stored in the cache writes data in the cache, but update of data in higher levels of cache or in main memory is delayed. Operation of cache in writeback and writethrough modes is known in the art.
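The difference between the two write policies can be illustrated with a small software model. This is a sketch under assumptions: the class `TinyCache`, its methods, and the use of a dictionary to stand in for the higher level of memory are all hypothetical and chosen only for the example.

```python
class TinyCache:
    """Single cache level in front of a dict that stands in for higher-level memory."""

    def __init__(self, backing: dict, write_back: bool):
        self.backing = backing
        self.write_back = write_back
        self.lines = {}        # address -> value held in the cache
        self.written = set()   # addresses written but not yet updated at the higher level

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_back:
            self.written.add(addr)       # writeback: the higher-level update is delayed
        else:
            self.backing[addr] = value   # writethrough: the higher level is updated at once

    def evict(self, addr):
        # A written line must be copied to the higher level before its space is reused.
        if addr in self.written:
            self.backing[addr] = self.lines[addr]
            self.written.discard(addr)
        self.lines.pop(addr, None)
```

In this model the `written` set plays the role of the "written" flags held in tag memory: a writeback cache consults it at eviction time to decide whether the higher level must be updated.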
Whenever a cache "miss" occurs at any level of the cache, data fetched from a higher level of cache or main memory is typically stored in the cache's data memory and tag memory is updated to reflect that data is now present. Typically also, other data may have to be evicted to make room for the newly fetched data.
A cache "hit rate" is the ratio of memory references that "hit" in cache to total memory references in the system. It is known that the effective performance of cache-equipped processors can vary dramatically with the cache "hit rate." It is also known that hit rate varies with program characteristics, the size of cache, occurrence of interrupting events, and other factors. In particular, it is known that large effective cache sizes can often offer significantly better hit rates than small cache sizes.
It is therefore advantageous to have a large effective cache size.
Many computer systems embody multiple processors, each having its own cache system for caching main memory references. Typically, processors of such systems may access shared memory. Coherency is required in cache memory of such computer systems. Cache coherency means that each cache in the system "sees" the same memory values. Therefore, if a cache wants to change the contents of a memory location, every other cache in the system holding a copy of that memory location must either update or invalidate its copy.
Many ways of compressing data are known in the art. These include run-length algorithms, repeat-based algorithms, and dictionary-based algorithms. Run-length algorithms are commonly used in facsimile transmission. Software tools for compressing and decompressing data for disk storage and modem data transmission are common in the industry. Software tools for compressing and decompressing disk data when that disk data is cached in main memory are known. Software utilities for compressing and decompressing main memory pages are also known. Most of these utilities and tools make use of a processor of the system to perform both compression and decompression operations; they can consume sufficient processor time to adversely affect system performance.
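As an illustration of the run-length approach mentioned above, a minimal byte-oriented encoder and decoder might look like the following. The function names, the (count, value) pair representation, and the cap of 255 on run length are assumptions made for this sketch; it is not the scheme of any particular facsimile standard or software tool.

```python
from typing import List, Tuple


def rle_encode(data: bytes) -> List[Tuple[int, int]]:
    """Encode bytes as (run length, byte value) pairs, with runs capped at 255."""
    runs: List[Tuple[int, int]] = []
    for b in data:
        if runs and runs[-1][1] == b and runs[-1][0] < 255:
            runs[-1] = (runs[-1][0] + 1, b)   # extend the current run
        else:
            runs.append((1, b))               # start a new run
    return runs


def rle_decode(runs: List[Tuple[int, int]]) -> bytes:
    """Expand (run length, byte value) pairs back into the original bytes."""
    return b"".join(bytes([b]) * n for n, b in runs)
```

Data with long runs of identical bytes, such as a zero-filled buffer, shrinks to a few pairs, while data with no repeated neighbors grows, which is why a practical scheme needs a way to leave incompressible data uncompressed.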
Many systems provide for caching of disk data in main memory, or other memory of speed similar to that of main memory. For purposes of this patent, a cache for caching disk data in main memory or memory of speed similar to that of main memory is a disk cache; and a cache for caching main memory references as information is fetched by a processor is a processor cache. An on-chip cache is a processor cache located on the same integrated circuit as the processor.
Data stored in cache memory is typically not stored in compressed form. It would be advantageous to do so to attain higher effective cache size, and thus higher hit rates.
Typically, data compression requires more time than decompression because of the time required to count run lengths, detect repeats, and build dictionaries.
U.S. Pat. No. 5,826,054 discloses a processor cache storing a stream of compressed instructions. U.S. Pat. No. 6,216,213 also discloses a processor cache storing a stream of compressed instructions. FIG. 8 of the '054 patent discusses compressing an instruction stream at the time a loadable module is created; it is apparent that a software compression utility executing on a processor is contemplated. The compressed instruction streams of the '054 and '213 patents seem to be decompressed "on the fly" by dedicated hardware as instructions are read from cache into the processor.
It would be advantageous for a processor cache to store at least some data and instructions in compressed form, thereby providing larger effective cache size than available without compression. It would also be advantageous to perform compression and decompression transparently to remaining system components, and to write the cache without delays associated with compressing information.
A processor cache subsystem for storing instructions and data is capable of storing instruction and data information in both compressed and uncompressed form. The cache subsystem is written first with uncompressed information. A compression engine is provided to scan the cache and replace uncompressed information with compressed information, while releasing any freed space for reuse.
In a particular embodiment, the cache subsystem forms the second level of cache in a system; however, it is anticipated that the invention is applicable to third or even fourth level cache subsystems.
Processor references are passed to the cache subsystem upon a miss in first level cache. A portion of each reference address is used to address a tag memory. The tag memory contains at least one, and preferably more, tag address fields, flags for cache management, and data line pointers for each way indicating locations of each cache line of each superline in a cache data memory. In a particular embodiment, the cache is organized as lines of sixty-four bytes in superlines of four lines. The cache data memory is organized as an array of sublines, where two or more sublines form each cache line; in a particular embodiment the cache has sublines of thirty-two bytes. "Compressed" flags are associated with each subline in the cache data memory.
Cache hits are determined by comparing tag address fields of the tag memory with additional bits of the reference address. Upon a cache hit, data line pointers from the tag memory are used to locate information in the cache data memory. Compressed information is decompressed prior to transmittal to the first level cache and processor.
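The read path just described can be modeled in software roughly as shown below. The line, superline, and subline sizes follow the particular embodiment; everything else is an assumption made for illustration, including the number of sets, the field and function names, the placement of the line-select bits directly above the byte offset, and the use of `zlib` as a stand-in for the hardware decompressor, whose algorithm is not specified here.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional

LINE_BYTES = 64            # line size of the particular embodiment
SUBLINE_BYTES = 32         # subline size of the particular embodiment
LINES_PER_SUPERLINE = 4    # superline size of the particular embodiment
NUM_SETS = 128             # assumed number of lines in tag memory


@dataclass
class Way:
    tag: int = -1
    valid: bool = False
    # One entry per line of the superline: subline indices in data memory, or None.
    line_ptrs: List[Optional[List[int]]] = field(
        default_factory=lambda: [None] * LINES_PER_SUPERLINE)


class DataMemory:
    def __init__(self, n_sublines: int):
        self.sublines = [b""] * n_sublines
        self.compressed = [False] * n_sublines   # per-subline "compressed" flag


def split(addr: int):
    """Return (tag, set index, line within superline, byte offset); layout is assumed."""
    offset = addr % LINE_BYTES
    line = (addr // LINE_BYTES) % LINES_PER_SUPERLINE
    index = (addr // (LINE_BYTES * LINES_PER_SUPERLINE)) % NUM_SETS
    tag = addr // (LINE_BYTES * LINES_PER_SUPERLINE * NUM_SETS)
    return tag, index, line, offset


def read_line(tags: List[List[Way]], data: DataMemory, addr: int) -> Optional[bytes]:
    """Return the uncompressed line containing addr, or None on a miss."""
    tag, index, line, _ = split(addr)
    for way in tags[index]:
        ptrs = way.line_ptrs[line]
        if way.valid and way.tag == tag and ptrs is not None:
            raw = b"".join(data.sublines[p] for p in ptrs)
            # Compressed information is decompressed before being passed onward.
            return zlib.decompress(raw) if data.compressed[ptrs[0]] else raw
    return None
```

In this sketch an uncompressed line is reached through two subline pointers and a compressed line through one, so a compressed line leaves a thirty-two byte subline available for other use.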
In the interest of speed, information retrieved from higher level cache or main memory upon cache misses is written to the cache data memory in uncompressed form, with tag information and pointers updated appropriately. This information is written at a location indicated on an empty-space list, and remains uncompressed until located and compressed by the compression engine.
Cached information is evicted when tags are recycled, or when the space remaining on the empty-space list drops below a predetermined threshold.
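The interplay of uncompressed fills, the empty-space list, the compression engine, and eviction can be sketched as follows. This is a simplified software model under stated assumptions: the class and method names, the threshold value, the oldest-first eviction choice, and the use of `zlib` in place of the hardware compression engine are all hypothetical, and tag recycling is not modeled.

```python
import zlib
from collections import deque

SUBLINE_BYTES = 32
SUBLINES_PER_LINE = 2
LOW_WATER = 2   # assumed threshold, in sublines, for the empty-space list


class CompressedStore:
    def __init__(self, n_sublines: int):
        self.sublines = [b""] * n_sublines
        self.free = deque(range(n_sublines))   # empty-space list
        self.lines = {}   # line address -> (subline pointers, compressed flag)

    def fill(self, line_addr: int, data: bytes) -> None:
        """Write a fetched line uncompressed, evicting first if free space is short."""
        while len(self.free) < max(LOW_WATER, SUBLINES_PER_LINE):
            self.evict(next(iter(self.lines)))   # oldest line first (assumed policy)
        ptrs = [self.free.popleft() for _ in range(SUBLINES_PER_LINE)]
        for i, p in enumerate(ptrs):
            self.sublines[p] = data[i * SUBLINE_BYTES:(i + 1) * SUBLINE_BYTES]
        self.lines[line_addr] = (ptrs, False)

    def evict(self, line_addr: int) -> None:
        ptrs, _ = self.lines.pop(line_addr)
        self.free.extend(ptrs)   # return the sublines to the empty-space list

    def compress_pass(self) -> None:
        """Scan for uncompressed lines; compress those that fit in one subline."""
        for addr, (ptrs, compressed) in list(self.lines.items()):
            if compressed:
                continue
            packed = zlib.compress(b"".join(self.sublines[p] for p in ptrs))
            if len(packed) <= SUBLINE_BYTES:
                self.sublines[ptrs[0]] = packed
                self.free.extend(ptrs[1:])   # release the freed subline for reuse
                self.lines[addr] = (ptrs[:1], True)

    def read(self, line_addr: int) -> bytes:
        ptrs, compressed = self.lines[line_addr]
        raw = b"".join(self.sublines[p] for p in ptrs)
        return zlib.decompress(raw) if compressed else raw
```

Because `fill` never compresses, the cost of compression stays off the miss path; space is recovered later when `compress_pass` runs, which mirrors the division of labor described above.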
In another embodiment, cache is addressed through a cache tag memory having sufficient address tags for sixteen-way associativity. Each address tag is associated with a way indicator and flags. Among the flags for each address tag is a compressed flag and a width indicator.
The associated cache data memory has sufficient memory to store twelve ways of uncompressed information. At a location in cache data memory corresponding to each tag memory line is stored a cache line group. The address tags provided in excess of the number required to point to uncompressed data in each cache line group are herein referred to as excess address tags.
The way indicator associated with each address tag indicates where in the cache line group there may be stored a cache line associated with the address tag.
The compression engine periodically reads the cache line group, rewrites it in compressed form if data in the cache line group is compressible, and updates the compressed flags and way indicators of each associated tag line to indicate where in the cache line group each line of data is stored.
In this embodiment, if some or all lines of the cache line group are stored in compressed form, the remaining space in the line group is usable by cache data associated with the excess address tags.
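The relationship between sixteen address tags and twelve ways of storage can be illustrated with a small accounting model. The tag and storage counts come from the embodiment above; the half-line allocation unit, the class and method names, and the simple in-order packing used to derive way indicators are assumptions made for this sketch.

```python
NUM_TAGS = 16        # address tags per tag memory line
STORAGE_WAYS = 12    # uncompressed lines that fit in one cache line group
UNITS_PER_LINE = 2   # assumed granularity: an uncompressed line takes two units
GROUP_UNITS = STORAGE_WAYS * UNITS_PER_LINE


class LineGroup:
    def __init__(self):
        self.widths = {}   # address tag -> width of its line, in units

    def used_units(self) -> int:
        return sum(self.widths.values())

    def insert(self, tag: int, width: int = UNITS_PER_LINE) -> bool:
        """Add a line only if both a free address tag and enough space exist."""
        if len(self.widths) >= NUM_TAGS:
            return False
        if self.used_units() + width > GROUP_UNITS:
            return False
        self.widths[tag] = width
        return True

    def compress(self, tag: int, new_width: int) -> None:
        """Record that the compression engine rewrote a line in fewer units."""
        self.widths[tag] = min(self.widths[tag], new_width)

    def way_indicators(self) -> dict:
        """Starting unit of each line after the group is rewritten in packed form."""
        start, result = 0, {}
        for tag, width in self.widths.items():
            result[tag] = start
            start += width
        return result
```

In this model twelve uncompressed lines fill the group, so the four excess address tags become usable only after enough lines have been compressed; for example, compressing eight lines to half width frees room for four more uncompressed lines.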
The present invention is expected to be applicable to other associativities and ways of storage. For example, a cache having thirty-two sets of address tags in each line of cache tag memory, and twenty-four ways of storage, could be implemented in a similar manner.