1. Field of the Invention
The present invention relates to the field of computer systems. More specifically, the present invention relates to a memory cache within a computer system.
2. Art Background
A cache is a special memory subsystem in which frequently used data values from main memory are duplicated for quick access. Although main memory is typically implemented using dynamic random access memory (DRAM), a cache is typically implemented using static random access memory (SRAM). Because SRAM can be accessed faster than the less expensive DRAM, a cache can be accessed faster than can main memory. Furthermore, a data cache typically has a dedicated high speed bus (wires) coupling the cache to the processor. Main memory, on the other hand, typically is coupled to the processor by a slower bus that frequently must be shared with other devices.
A memory cache stores the contents of frequently accessed random access memory (RAM) locations and the addresses where these data items are stored. When a processor references an address in main memory, a check is made to determine whether or not the cache holds a copy of the contents stored at the desired address. If the cache does hold a valid copy of the contents stored at the desired address, the data is quickly returned to the processor from the cache. On the other hand, if the cache does not hold a valid copy of the contents stored at the desired address, a regular main memory access occurs. A cache is useful when RAM accesses to main memory are slow compared with the microprocessor speed, because an access to cache memory is typically faster than an access to main memory.
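To illustrate the hit/miss check described above, the following Python sketch models a tiny direct-mapped cache that keeps, for each line, a valid bit, the address tag and a copy of the data. It is a minimal sketch under assumptions not stated in the text: one data word per cache line, a plain dictionary standing in for main memory, and the hypothetical class name `DirectMappedCache`. Real caches typically store multi-word lines and may use other organizations.

```python
class DirectMappedCache:
    """Hypothetical direct-mapped cache model: one word per line, dict as main memory."""

    def __init__(self, num_lines, main_memory):
        self.num_lines = num_lines
        self.main_memory = main_memory        # address -> value
        self.valid = [False] * num_lines      # valid bit per line
        self.tags = [0] * num_lines           # identifies which address a line holds
        self.data = [0] * num_lines           # cached copy of the contents
        self.hits = 0
        self.misses = 0

    def read(self, address):
        index = address % self.num_lines      # the only line this address can occupy
        tag = address // self.num_lines       # distinguishes addresses sharing that line
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1                    # valid copy present: fast return
            return self.data[index]
        self.misses += 1                      # no valid copy: regular main memory access
        value = self.main_memory[address]
        self.valid[index] = True              # keep a copy for later references
        self.tags[index] = tag
        self.data[index] = value
        return value


memory = {a: a * 10 for a in range(64)}
cache = DirectMappedCache(8, memory)
cache.read(5)    # miss: fetched from main memory
cache.read(5)    # hit
cache.read(13)   # miss: maps to the same line as 5 and replaces it
cache.read(5)    # miss again
print(cache.hits, cache.misses)  # 1 3
```

Addresses 5 and 13 map to the same line in this 8-line model, so they evict each other; this is the kind of limited capacity that makes smaller caches miss more often.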
Frequently, multiple levels of caching are provided. For example, a processor will typically have a small primary cache located on the same integrated circuit chip as the processor, and a secondary cache located separately from that chip. The smaller a cache is, the less data it is able to hold, and therefore, the more likely it is that requested data will not be available within the cache. Generally, a primary (level one) cache will be several orders of magnitude smaller than a secondary (level two) cache, and the secondary cache, in turn, will be smaller than main memory. Thus, for example, a level one cache may be approximately one kilobyte (KB) in size, a level two cache may be approximately one megabyte (MB) in size, and main memory may be several megabytes in size.
A cache "hit" is said to have occurred when data requested from a cache is found in the cache. On the other hand, if data requested from a cache is not found within the cache, then a cache "miss" is said to have occurred. Typically, a processor will attempt to access the primary cache first. If a primary cache miss occurs, the processor will then attempt the secondary cache. If a miss occurs in the secondary cache, then the processor will try the next level of cache. When all levels of cache have been exhausted and a cache hit has not occurred, the processor will finally request the data from main memory.
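The search order across cache levels can be sketched as follows. The function name `lookup` and the use of one dictionary per level are hypothetical; capacity limits and eviction are not modeled, and copying the data into the levels that missed is an assumption about a common fill policy rather than something the text specifies.

```python
def lookup(address, levels, main_memory):
    """Search cache levels in order; fall back to main memory if every level misses.

    levels: list of (name, contents) pairs, primary cache first.
    Returns (value, name of the level that supplied it).
    """
    missed = []
    for name, contents in levels:
        if address in contents:               # cache hit at this level
            value, source = contents[address], name
            break
        missed.append(contents)               # cache miss: try the next level
    else:
        value, source = main_memory[address], "main memory"
    for contents in missed:                   # assumed policy: fill the levels that missed
        contents[address] = value
    return value, source


l1, l2 = {}, {0x40: 7}
memory = {0x40: 7, 0x80: 9}
levels = [("L1", l1), ("L2", l2)]
print(lookup(0x40, levels, memory))  # (7, 'L2')  primary miss, secondary hit
print(lookup(0x40, levels, memory))  # (7, 'L1')  now present in the primary cache
print(lookup(0x80, levels, memory))  # (9, 'main memory')  all levels missed
```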
If a processor is a single scalar processor, it executes only one instruction at a time. In one categorization scheme, the instructions executed by the processor can be categorized as load instructions, store instructions or general instructions. Data from main memory is stored in a register of the register file by the execution of a load instruction. Data residing in a register of the register file is stored in main memory by executing a store instruction. A typical general instruction will cause data stored in one or more registers of the register file to be used to produce a result. The result is then returned to a register of the register file, where it is stored.
Thus, for example, it might take the execution of four instructions to add a first number, stored at a first address of main memory, to a second number, stored at a second address of main memory, and store the obtained sum at a third address of main memory. A first load instruction would cause the data stored at the first address of main memory to be stored in the first register of the register file. Then, a second load instruction would cause the data stored at the second address of main memory to be stored in the second register of the register file. Next, a general type add instruction would be executed. The add instruction would take the numbers stored in the first and second registers, calculate a resulting sum, and then store the resulting sum in a third register of the register file. Finally, a fourth instruction would be executed. The fourth instruction would be a store instruction that would cause the sum stored in the third register of the register file to be stored at the third address of main memory.
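The four-instruction sequence above can be traced with a toy register machine. This is an illustrative sketch only: the opcode tuples, the register names `r1` to `r3`, the addresses and the function name `run` are all hypothetical, and a dictionary stands in for main memory.

```python
def run(program, memory):
    """Execute a tiny load/add/store program against a dict used as main memory."""
    registers = {}                          # register file: name -> value
    for op, *operands in program:
        if op == "load":                    # main memory -> register
            dest, address = operands
            registers[dest] = memory[address]
        elif op == "add":                   # general instruction: registers -> register
            dest, src1, src2 = operands
            registers[dest] = registers[src1] + registers[src2]
        elif op == "store":                 # register -> main memory
            src, address = operands
            memory[address] = registers[src]
        else:
            raise ValueError(f"unknown opcode: {op}")
    return registers


memory = {0x100: 2, 0x104: 3}               # first and second numbers
program = [
    ("load", "r1", 0x100),                  # first load instruction
    ("load", "r2", 0x104),                  # second load instruction
    ("add", "r3", "r1", "r2"),              # general (add) instruction
    ("store", "r3", 0x108),                 # store the sum at the third address
]
registers = run(program, memory)
print(memory[0x108])  # 5
```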
Before the example add instruction can be executed, the proper data must be loaded into the first and second registers. If a valid copy of the data to be loaded is not stored in the cache, then a cache miss occurs. When a single scalar processor has a cache miss, execution stalls while higher levels of cache are checked and, if necessary, the data required to execute the instruction is retrieved from main memory. A cache using this scheme is called a blocking cache, because, if there is a cache miss, execution of instructions is blocked until the requested data causing the miss has been supplied. Because the processor stays idle while waiting for the requested data to be provided, a high miss rate can cause serious degradation in the performance of the processor.
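The cost of blocking can be estimated with a simple cycle count. This sketch assumes, purely for illustration, that every instruction takes one cycle and that each miss adds a fixed penalty (20 cycles here) during which nothing else executes; the function name `blocking_cycles` and the instruction kinds are hypothetical.

```python
def blocking_cycles(trace, miss_penalty):
    """Count cycles for a single scalar processor with a blocking cache.

    trace: sequence of instruction kinds: "alu", "load_hit" or "load_miss".
    Assumes one cycle per instruction plus miss_penalty idle cycles per miss.
    """
    cycles = 0
    for kind in trace:
        cycles += 1                   # execute the instruction
        if kind == "load_miss":
            cycles += miss_penalty    # processor idles until the data is supplied
    return cycles


trace = ["load_miss", "load_hit", "alu", "alu", "load_miss", "alu"]
print(blocking_cycles(trace, miss_penalty=20))  # 46
```

Under these assumptions, 40 of the 46 cycles are spent idle on just two misses, which shows why a high miss rate hurts a blocking design.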
In contrast to a single scalar processor, a superscalar processor is capable of executing more than one instruction at a time. Out-of-order processors also exist which, even if they can execute only one instruction at a time, are able to change the order in which they execute instructions. In both out-of-order and superscalar processors, a performance benefit can be achieved by implementing the caches in a non-blocking manner.
In a non-blocking cache, if an instruction causes a cache miss, execution of the stalled instruction is deferred while the data that caused the cache miss is retrieved. The processor, however, is able to continue executing instructions of the instruction stream that follow the stalled instruction. Thus, a subsequent instruction can be executed if it depends neither upon the data being retrieved for the stalled instruction nor upon a result derived from that data. Of course, a subsequent instruction that also causes a cache miss will likewise have its execution deferred while the retrieval of the data for the second miss is pending. Thus, the processor does not stall when a cache miss occurs. Instead, the processor continues to execute instructions for which data is available until the requested missing data is returned. Then, the processor executes the instruction that was delayed while the retrieval of the data was pending. In a superscalar or out-of-order processor, a significant performance increase can result from the addition of a non-blocking cache.
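The deferral of dependent instructions can be sketched with a small scheduling model. The assumptions here are illustrative and not from the text: one instruction issues per cycle (oldest ready instruction first), every destination register name is written only once, any number of misses may be outstanding, and a miss costs a fixed 20-cycle penalty. The function name `nonblocking_cycles` and the program encoding are hypothetical.

```python
def nonblocking_cycles(program, miss_penalty):
    """Cycles to finish program when a miss defers only the instructions that need its data.

    program: list of (dest, sources, kind), kind in {"alu", "load_hit", "load_miss"}.
    """
    produced = {dest for dest, _, _ in program}
    ready_at = {}                       # register -> cycle its value becomes available
    pending = list(range(len(program)))
    cycle = finish = 0
    while pending:
        for i in pending:               # oldest ready instruction issues first
            dest, sources, kind = program[i]
            if all(src not in produced or ready_at.get(src, float("inf")) <= cycle
                   for src in sources):
                latency = 1 + (miss_penalty if kind == "load_miss" else 0)
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                pending.remove(i)
                break                   # at most one issue per cycle
        cycle += 1
    return finish


PENALTY = 20
program = [
    ("r1", [], "load_miss"),            # misses in the cache
    ("r2", [], "load_hit"),
    ("r3", ["r1", "r2"], "alu"),        # needs the missing data: deferred
    ("r4", [], "load_hit"),             # independent: runs while the miss is pending
    ("r5", ["r4"], "alu"),              # independent: runs while the miss is pending
]
blocking = sum(1 + (PENALTY if kind == "load_miss" else 0) for _, _, kind in program)
print(blocking, nonblocking_cycles(program, PENALTY))  # 25 22
```

In this model the instructions writing `r2`, `r4` and `r5` execute while the miss is outstanding, so only the dependent add waits; longer runs of independent instructions would hide more of the miss penalty.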
In a typical non-blocking cache, extra registers, sometimes referred to as miss information/status holding registers (MSHRs), are added to the cache to keep track of outstanding data requests caused by cache misses. The MSHRs are typically implemented using content addressable memory (CAM) and can handle on the order of four outstanding misses for the cache. Special conflict resolution logic is also added to the cache to ensure proper operation when the processor issues a request for a data address that corresponds to an outstanding cache miss.
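A behavioral sketch of a four-entry MSHR file follows. A dictionary lookup stands in for the associative (CAM) match, and the conflict resolution shown, merging a second request for an address that already has an outstanding miss into the existing entry, is one common approach assumed here for illustration. The class and method names (`MSHRFile`, `on_miss`, `on_fill`) are hypothetical.

```python
class MSHRFile:
    """Hypothetical model of miss status holding registers for a non-blocking cache."""

    def __init__(self, num_entries=4):
        self.num_entries = num_entries
        self.entries = {}                     # block address -> waiting request ids

    def on_miss(self, block_addr, request_id):
        """Record a miss; return 'merged', 'allocated' or 'stall'."""
        if block_addr in self.entries:        # associative match on an outstanding miss
            self.entries[block_addr].append(request_id)
            return "merged"                   # no second request to the next level
        if len(self.entries) >= self.num_entries:
            return "stall"                    # every MSHR is busy
        self.entries[block_addr] = [request_id]
        return "allocated"                    # new request sent to the next level

    def on_fill(self, block_addr):
        """Data has returned: free the entry and return the requests to wake up."""
        return self.entries.pop(block_addr)


m = MSHRFile(4)
print(m.on_miss(0x100, "ld1"))   # allocated
print(m.on_miss(0x100, "ld2"))   # merged: same address as an outstanding miss
for n, addr in enumerate((0x200, 0x300, 0x400), start=3):
    m.on_miss(addr, f"ld{n}")    # fills the remaining three entries
print(m.on_miss(0x500, "ld6"))   # stall: four misses already outstanding
print(m.on_fill(0x100))          # ['ld1', 'ld2']
print(m.on_miss(0x500, "ld6"))   # allocated: an entry was freed
```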
The addition of the MSHRs and conflict resolution logic to a small cache can greatly increase the amount of die area required for the cache. In some cases, simply increasing the size of a blocking cache by the amount of die area that would otherwise be spent on the non-blocking feature will yield the same performance increase as adding that feature. This is because a larger cache holds more data, which increases the probability that the desired data will be in the cache and therefore increases the hit rate. In turn, a higher hit rate decreases the number of times that the processor stalls because of a miss.
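The effect of a higher hit rate on a blocking cache can be shown with simple arithmetic: stall cycles equal the number of misses times the miss penalty. The figures below (1000 accesses, a 20-cycle penalty, and hit rates of 95% versus 97% for a smaller and a larger cache) are hypothetical, chosen only to illustrate the relationship; the function name `stall_cycles` is likewise illustrative.

```python
def stall_cycles(accesses, hit_rate, miss_penalty):
    """Cycles a blocking cache spends stalled: one miss_penalty per miss."""
    misses = round(accesses * (1 - hit_rate))
    return misses * miss_penalty


# Hypothetical figures: enlarging the cache raises the hit rate from 95% to 97%.
print(stall_cycles(1000, 0.95, 20))  # 1000
print(stall_cycles(1000, 0.97, 20))  # 600
```

Whether such a reduction matches the gain from a non-blocking design depends on the workload and on how many independent instructions are available to overlap with each miss.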