The invention relates generally to computer memory systems and more particularly, but not by way of limitation, to a caching technique to improve host processor memory access operations.
In a typical computer system, program instructions and data are read from and written to system memory at random addresses. To combat this random nature of memory access operations, level-1 (L1) and level-2 (L2) cache memories have been used to decrease the time, or number of clock cycles, a given processor must spend communicating with system memory during memory read and write operations.
Cache memories rely on the principle of access locality to improve the efficiency of processor-to-memory operations and, therefore, overall computer system performance. In particular, when a processor accesses system memory for program instructions and/or data, the information retrieved includes not only the targeted instructions and/or data, but also additional bytes of information that surround the targeted memory location. The sum of the information retrieved and stored in the cache is known as a "cache line." (A typical cache line may comprise 32 bytes.) The principle of access locality predicts that the processor will very probably use the additional retrieved bytes subsequent to the use of the originally targeted program instructions. During such operations as the execution of program loops, for example, information in a single cache line may be used multiple times. Each processor-initiated memory access that can be satisfied by information already in a cache (referred to as a "hit") eliminates the need to access system memory and, therefore, improves the operational speed of the computer system. In contrast, if a processor-initiated memory access cannot be satisfied by information already in a cache (referred to as a "miss"), the processor must access system memory, causing a new cache line to be brought into the cache and, perhaps, the removal of an existing cache line.
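The effect of access locality on hits and misses can be illustrated with a short simulation. The following is only a sketch: the direct-mapped organization, the four-line capacity, and the names `DirectMappedCache` and `access` are assumptions made for illustration and are not taken from the description above; only the 32-byte line size comes from the text.

```python
LINE_SIZE = 32  # bytes per cache line, the typical figure given in the text


class DirectMappedCache:
    """Hypothetical direct-mapped cache that only tracks which lines are present."""

    def __init__(self, num_lines):
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # line number held in each slot
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Record a hit or a miss for a byte address; return True on a hit."""
        line_number = address // LINE_SIZE    # which 32-byte line holds the byte
        index = line_number % self.num_lines  # slot that line maps to
        if self.tags[index] == line_number:
            self.hits += 1
            return True
        # Miss: the whole line is brought in, replacing whatever was there.
        self.tags[index] = line_number
        self.misses += 1
        return False


cache = DirectMappedCache(num_lines=4)

# A loop that reads the same 64 bytes three times, as a program loop might.
for _ in range(3):
    for address in range(64):
        cache.access(address)

print(cache.hits, cache.misses)
```

In this sketch only the first touch of each of the two lines misses; every later access, including all accesses in the second and third passes of the loop, is satisfied from the cache.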
Referring to FIG. 1, many modern computer systems 100 utilize processor units 102 that incorporate small L1 cache memory 104 (e.g., 32 kilobytes, KB) while also providing larger external L2 cache memory 106 (e.g., 256 KB to 612 KB). As shown, processor unit 102, L1 cache 104 and L2 cache 106 are coupled to system memory 108 via processor bus 110 and system controller 112. As part of processor unit 102 itself, L1 cache 104 provides the fastest possible access to stored cache line information. Because of its relatively small size, however, cache miss operations may occur frequently. When an L1 cache miss occurs, L2 cache 106 is searched for the targeted program data and/or program instructions (hereinafter collectively referred to as data). If L2 cache 106 contains the targeted data, the appropriate cache line is transferred to L1 cache 104. If L2 cache 106 does not contain the targeted data, an access operation to system memory 108 (typically mediated by system controller 112) is initiated. The time between processor unit 102 initiating a search for target data and the time that data is acquired or received by the processor unit (from L1 cache 104, L2 cache 106 or memory 108) is known as read latency. A key function of caches 104 and 106 is to reduce the read latency of processor unit 102.
If L1 cache 104 is full when a new cache line is brought in for storage, a selected cache line is removed (often referred to as flushed). If the selected cache line has not been modified since being loaded into L1 cache 104 (i.e., the selected cache line is "clean"), it may be replaced immediately by the new cache line. If the selected cache line has been modified since being placed into L1 cache 104 (i.e., the selected cache line is "dirty"), it may be flushed to L2 cache 106. If L2 cache 106 is full when an L1 cache line is brought in for storage, one of its cache lines is selected for replacement. As with L1 cache 104, if the selected cache line is clean it may be replaced immediately. If the selected cache line is dirty, however, it may be flushed to posted write buffer 114 in system controller 112. The purpose of posted write buffer 114 is to provide short-term storage of dirty cache lines that are in the process of being written to system memory 108. (Posted write buffers 114 are typically only large enough to store a few, e.g., 8, cache lines.)
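The replacement flow described above can be sketched as follows. This is a simplified model under stated assumptions: the oldest-first victim selection, the tiny capacities, and the names `Level`, `insert` and `store_in_l1` are hypothetical and chosen only for illustration; the text does not specify how a cache line is selected for replacement.

```python
from collections import OrderedDict


class Level:
    """Hypothetical cache level: maps line address -> dirty flag, oldest first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()

    def insert(self, line, dirty):
        """Store a line; return the evicted (line, dirty) pair, or None."""
        victim = None
        if line not in self.lines and len(self.lines) >= self.capacity:
            # Assumption: the oldest resident line is the one selected.
            victim = self.lines.popitem(last=False)
        self.lines[line] = dirty or self.lines.get(line, False)
        return victim


def store_in_l1(l1, l2, posted_writes, line, dirty=False):
    """Bring a line into L1, cascading flushes as described in the text."""
    victim = l1.insert(line, dirty)
    if victim is None:
        return
    victim_line, victim_dirty = victim
    if not victim_dirty:
        return  # clean L1 victim: simply replaced
    # Dirty L1 victim is flushed to L2, which may need a victim of its own.
    l2_victim = l2.insert(victim_line, True)
    if l2_victim is not None and l2_victim[1]:
        # Dirty L2 victim goes to the posted write buffer on its way to memory.
        posted_writes.append(l2_victim[0])


l1, l2, posted_writes = Level(capacity=2), Level(capacity=1), []
store_in_l1(l1, l2, posted_writes, 0xA0, dirty=True)
store_in_l1(l1, l2, posted_writes, 0xB0)
store_in_l1(l1, l2, posted_writes, 0xC0, dirty=True)  # flushes dirty 0xA0 to L2
store_in_l1(l1, l2, posted_writes, 0xD0)              # drops clean 0xB0
store_in_l1(l1, l2, posted_writes, 0xE0)              # 0xC0 to L2, 0xA0 posted
```

After this sequence, the only line that has reached the posted write buffer is the dirty line that was pushed out of both cache levels.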
While reasonably large by historical standards, both L1 cache 104 and L2 cache 106 are small relative to the amounts of data accessed by modern software applications. Because of this, computer systems employing conventional L1 and L2 caches (especially those designed for multitasking operations) may exhibit unacceptably high cache miss rates. One effect of high cache miss rates is to increase the latency of processor unit read operations. Thus, it would be beneficial to provide a mechanism to reduce the memory latency experienced by host processor units.
In one embodiment the invention provides a computer system comprising a processor, a level-1 cache (operatively coupled to the processor), a level-2 cache (operatively coupled to the processor), a system memory, and a system controller (operatively coupled to the processor, level-1 cache, level-2 cache and system memory), wherein the system controller has a memory buffer adapted to store cache lines flushed (cast out) from one or more processor caches. The memory buffer, referred to herein as a cast-out cache, may be configured as a set associative or fully associative memory and may comprise dynamic or static random access memory integrated into the system controller.
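As a rough illustration of such a buffer, the sketch below models a fully associative cast-out cache as a small store keyed by cache line address. The class name `CastOutCache`, its methods, the capacity, and the oldest-first replacement choice are all assumptions made for this example; the embodiment above does not prescribe a replacement policy or an interface.

```python
from collections import OrderedDict


class CastOutCache:
    """Hypothetical fully associative buffer for lines cast out of processor caches."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # line address -> line data, oldest first

    def cast_out(self, line, data):
        """Accept a flushed line; return an evicted (line, data) pair, or None."""
        self.lines.pop(line, None)  # a newer copy supersedes an older one
        evicted = None
        if len(self.lines) >= self.capacity:
            # Assumption: the oldest entry makes room for the new one.
            evicted = self.lines.popitem(last=False)
        self.lines[line] = data
        return evicted

    def lookup(self, line):
        """Return the stored data for a line, or None if it is not held."""
        return self.lines.get(line)


buffer = CastOutCache(capacity=2)
first = buffer.cast_out(0x100, "line A")
second = buffer.cast_out(0x120, "line B")
third = buffer.cast_out(0x140, "line C")  # buffer is full, so 0x100 is evicted
```

Because any line may occupy any entry, a lookup in this sketch is a single search keyed on the line address, which is the defining property of a fully associative organization; a set associative variant would first select a set from address bits and search only within it.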
In another embodiment, the invention provides a method to control memory access transactions. The method includes receiving a memory access request signal from a device, identifying the device, selecting a cache structure based on the identified device, and using the selected cache structure to satisfy the memory access request. The acts of selecting a cache structure and using the selected cache structure may comprise selecting a cache structure if the identified device is a processor unit, otherwise accessing a system memory to satisfy the memory request. Methods in accordance with the invention may be stored in any medium that is readable and executable by a computer system.
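The method can be sketched in a few lines. The function name `handle_request`, the device labels, and the use of dictionaries to stand in for the cache structure and system memory are hypothetical; falling back to system memory when the selected cache structure does not hold the requested line is also an assumption, since the summary above does not describe that case.

```python
def handle_request(device, line, cache_structure, system_memory):
    """Satisfy a read request; return (data, source) for illustration."""
    # Identify the device and select a cache structure only for a processor unit.
    if device == "processor":
        data = cache_structure.get(line)
        if data is not None:
            return data, "cache structure"
        # Assumption: a miss in the selected structure falls through to memory.
    # Any other device is served directly from system memory.
    return system_memory[line], "system memory"


cache_structure = {0x40: "c40"}
system_memory = {0x40: "m40", 0x80: "m80"}

from_cache = handle_request("processor", 0x40, cache_structure, system_memory)
from_memory = handle_request("processor", 0x80, cache_structure, system_memory)
other_device = handle_request("io-device", 0x40, cache_structure, system_memory)
```

In this sketch a request from a non-processor device never consults the cache structure, mirroring the "otherwise accessing a system memory" branch of the method.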