The present invention relates to computer systems, and more particularly, but not by way of limitation, to methods and apparatus for improving computer system performance with a shared cache memory.
A cache memory is a high-speed memory unit interposed in the memory hierarchy of a computer system between the relatively slower main memory and the relatively faster processors to improve effective memory transfer rates, thereby improving system performance. The name refers to the fact that the small cache memory unit is essentially hidden and appears transparent to the user, who is aware only of the larger main memory. The cache memory is usually implemented by semiconductor memory devices having speeds that are comparable to the speed of the processor, while the main memory utilizes a less costly, lower-speed technology. The cache memory concept anticipates the likely reuse by a processor of selected data in main memory by storing a copy of the selected data in the cache memory, where a processor request for that data may be serviced significantly more quickly.
A cache memory typically includes a plurality of memory sections, wherein each memory section stores a block or a "line" of two or more words of data. For systems based on the particularly popular model 80486 microprocessor, a line consists of four "doublewords" (wherein each doubleword comprises four 8-bit bytes). Each line has associated with it an address tag that uniquely identifies which line of main memory it is a copy of.
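By way of illustration only, the relationship between a byte address, its line, and its address tag may be sketched as follows. The sketch assumes the 16-byte line of the 80486 example above (four doublewords of four bytes each) and, for simplicity, treats the entire line-aligned address as the tag; the function name split_address is hypothetical and is not taken from any particular implementation.

```python
LINE_BYTES = 16    # four doublewords x four bytes, per the 80486 example
DWORD_BYTES = 4    # one doubleword is four 8-bit bytes


def split_address(addr):
    """Return (tag, doubleword_index, byte_in_doubleword) for a byte address.

    Assumption: the tag is the full line number in main memory, as it would
    be in a fully associative cache.
    """
    tag = addr // LINE_BYTES       # which line of main memory
    offset = addr % LINE_BYTES     # byte position within that line
    return tag, offset // DWORD_BYTES, offset % DWORD_BYTES
```

Under these assumptions, two byte addresses that yield the same tag fall within the same line and would be satisfied by the same cached copy.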
In many computer systems, there may be several levels of cache memory. For example, each processor of a computer system may have one or more internal cache memories dedicated to that processor (these cache memories may be referred to as local cache memories). These dedicated cache memories may operate in a hierarchical fashion; i.e., first, the lowest level of cache memory is interrogated to determine whether it has the requested line of main memory and, if it is not there, the second lowest level of cache memory is then interrogated, and so forth. One or more processors, in turn, may share a level of cache memory, and it is conceivable that one or more shared cache memories may themselves share another level of cache memory. At the highest level of memory is the main memory, which is inclusive of all of the layers of cache memory. (Note that main memory may also be referred to as system memory.)
By way of illustration, consider the operation of a simple system having one processor, one level of cache memory, and main memory. When a read request originates in the processor for a line of data, an address tag comparison is made to determine whether a copy of the requested word resides in a line of the cache memory. If present, the data is used directly from the cache memory. This event is referred to as a cache read "hit." If the data is not present, a line containing the requested word is retrieved from main memory and stored in the cache memory. The requested word is simultaneously provided to the processor. This event is referred to as a cache read "miss."
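The read hit and read miss behavior described above may be sketched, purely as an illustration, in the following manner. The class name SimpleCache, the four-word line size, and the use of a dictionary keyed by tag are all assumptions made for the sketch; an actual cache performs the tag comparison in hardware.

```python
LINE_WORDS = 4  # words per line (assumed for this sketch)


class SimpleCache:
    """One level of cache in front of a word-addressed main memory."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # list of words
        self.lines = {}                 # tag -> list of LINE_WORDS words
        self.hits = 0
        self.misses = 0

    def read(self, word_addr):
        tag, offset = divmod(word_addr, LINE_WORDS)
        if tag in self.lines:
            self.hits += 1              # cache read hit
        else:
            self.misses += 1            # cache read miss: fetch the whole line
            base = tag * LINE_WORDS
            self.lines[tag] = self.main_memory[base:base + LINE_WORDS]
        return self.lines[tag][offset]
```

In this sketch, a first read of any word in a line counts as a miss and brings in the entire line, so that a subsequent read of a neighboring word in the same line counts as a hit.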
In addition to using a cache memory to retrieve data, the processor may also write data directly to the cache memory instead of to the main memory. When the processor desires to write data to memory, an address tag comparison is made to determine whether the line into which data is to be written resides in the cache memory. If the line is present in the cache memory, the data is written directly into the line. This event is referred to as a cache write "hit." In many systems a data "dirty bit" for the line is then set. The dirty bit indicates that data stored within the cache memory line is dirty or modified and is, therefore, the most up-to-date copy of the data. Thus, before the line is deleted from the cache memory or overwritten, the modified data must be written into main memory. This latter principle may be referred to as preserving cache coherency.
If the line into which data is to be written does not exist in the cache memory, the line is either fetched into the cache memory from main memory to allow the data to be written into the cache memory, or the data is written directly into the main memory. This event is referred to as a cache write "miss."
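The write hit, dirty bit, and two write miss alternatives described above may be sketched as follows. This is a minimal illustration only; the class name WriteBackCache, the allocate_on_write_miss parameter, and the four-word line size are hypothetical names and values chosen for the sketch.

```python
LINE_WORDS = 4  # words per line (assumed for this sketch)


class WriteBackCache:
    def __init__(self, main_memory, allocate_on_write_miss=True):
        self.main_memory = main_memory
        self.allocate = allocate_on_write_miss
        self.lines = {}  # tag -> {"data": list of words, "dirty": bool}

    def write(self, word_addr, value):
        tag, offset = divmod(word_addr, LINE_WORDS)
        line = self.lines.get(tag)
        if line is None:
            # Cache write miss: either bypass the cache or fetch the line.
            if not self.allocate:
                self.main_memory[word_addr] = value
                return "miss"
            base = tag * LINE_WORDS
            line = {"data": self.main_memory[base:base + LINE_WORDS],
                    "dirty": False}
            self.lines[tag] = line
            result = "miss"
        else:
            result = "hit"
        line["data"][offset] = value
        line["dirty"] = True   # cached copy is now the most up-to-date copy
        return result
```

Note that, in the allocating case, main memory is left unchanged by the write; the dirty bit records that the line must be written back before it is deleted or overwritten.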
In some cases, a cache memory may need to "castout" a line of data because of the limited amount of storage space inherent in cache memories. This castout data may be dirty or modified, in which case it should not be discarded by the computer system. Thus, castout data is normally provided to the next higher level of cache memory (which may actually be the main memory), usually during a special set of bus cycles. This too preserves cache coherency.
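A castout of the kind just described may be sketched as follows. The sketch assumes a first-in, first-out choice of the line to be cast out, which is only one of several possible replacement policies, and models the next higher level of memory as a simple dictionary; the names TinyCache and install are hypothetical.

```python
from collections import OrderedDict


class TinyCache:
    def __init__(self, capacity, next_level):
        self.capacity = capacity      # maximum number of lines held
        self.next_level = next_level  # dict: tag -> data (next higher level)
        self.lines = OrderedDict()    # tag -> (data, dirty), oldest first

    def install(self, tag, data, dirty=False):
        if tag not in self.lines and len(self.lines) >= self.capacity:
            # No room: cast out the oldest line (FIFO is an assumption).
            victim_tag, (victim_data, victim_dirty) = self.lines.popitem(last=False)
            if victim_dirty:
                # Modified data must not be discarded; pass it upward.
                self.next_level[victim_tag] = victim_data
        self.lines[tag] = (data, dirty)
```

In this sketch a clean victim is simply dropped, since the next higher level already holds an identical copy, whereas a dirty victim is written upward before its storage is reused.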
Cache memories may operate under a variety of protocols, including the popular MESI (Modified, Exclusive, Shared, Invalid) protocol, where a line in a particular cache memory may be marked as dirty or modified, exclusive to the particular cache memory and main memory, shared between two or more cache memories, or invalid (which will result in a cache miss). More information regarding caching principles and techniques, including the MESI protocol, may be found in the various versions and volumes of the Intel P6 Family of Processors, Hardware Developer's Manual, all of which are hereby incorporated by reference.
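The four MESI states may be represented, by way of a minimal sketch, as shown below. The helper names are hypothetical, and the after_remote_read function reflects the commonly described MESI behavior in which a valid line becomes shared once another cache memory reads it; particular processors may differ in detail, and that transition should be read as an assumption rather than as a statement of any specific implementation.

```python
from enum import Enum


class Mesi(Enum):
    MODIFIED = "M"   # dirty: this cache holds the only up-to-date copy
    EXCLUSIVE = "E"  # clean: held only by this cache and main memory
    SHARED = "S"     # clean: may also be held by other caches
    INVALID = "I"    # not usable; an access results in a cache miss


def is_hit(state):
    """Any state other than INVALID can satisfy a read."""
    return state is not Mesi.INVALID


def needs_write_back(state):
    """Only a MODIFIED line differs from main memory."""
    return state is Mesi.MODIFIED


def after_remote_read(state):
    """Assumed state of a line after another cache reads the same line."""
    if state is Mesi.INVALID:
        return Mesi.INVALID
    return Mesi.SHARED
```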
Turning now to FIG. 1, there is shown a computer system 10 operating according to these conventional caching principles and techniques. In computer system 10, processors 20A-D each have a dedicated cache memory 30A-D, respectively. Additionally, processors 20A-B are operably connected to and share a shared cache memory 50A through bus 40A, while processors 20C-D are operably connected to and share a shared cache memory 50B through bus 40B. Processors 20A-B are symmetric agents on bus 40A and shared cache memory 50A is a priority agent on bus 40A. Processors 20C-D and shared cache memory 50B operate in a similar fashion on bus 40B. The shared cache memories 50A and 50B, in turn, act as symmetric agents on bus 60 and a memory subsystem 70 (comprising a memory controller 80 and main memory 90) acts as a priority agent.
In operation, processor 20A may, for example, issue a read or write request for a line of data located in main memory. Processor 20A will first determine whether its dedicated cache memory 30A contains the requested line. If so, the line is provided to the processor 20A from its dedicated cache memory 30A. If, however, the requested line is not present in dedicated cache memory 30A, a "snoop" phase is initiated on bus 40A to determine whether the requested line is located in dedicated cache memory 30B (belonging to processor 20B) or in shared cache memory 50A. During a snoop phase, the other cache memories on bus 40A may issue signals indicating whether they have a copy of the requested line (e.g., by asserting a HIT# signal) and what the condition of the line is (e.g., whether the line is dirty or modified, exclusive to that cache memory and main memory, or shared by that cache memory and one or more other cache memories). If the line is located in a cache memory on bus 40A, the line will be provided to dedicated cache memory 30A, where it may be cached. However, if the requested line is not located in any of the cache memories on bus 40A (including the shared cache memory 50A), the shared cache memory 50A must then initiate the read or write transaction on bus 60 (in effect "re-initiating" the original transaction) to access the requested line from main memory 90. (In some cases, of course, the shared cache memory 50A will need to initiate a snoop phase on bus 60 to determine whether the requested line is in shared cache memory 50B or some other cache memory on bus 60.) Main memory 90 will then respond to the line request and place the requested line of data on bus 60. After several bus cycles, the requested line of data eventually makes its way to the dedicated cache memory 30A of the requesting processor 20A, where it is cached according to system protocol.
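The order in which the system 10 of FIG. 1 searches for a requested line may be sketched as follows. The sketch is a simplification made under stated assumptions: each cache memory is modeled as a set of tags, the function name locate_line is hypothetical, and the second value returned merely counts the bus transactions (one on bus 40A, and a second, re-initiated transaction on bus 60) without modeling arbitration or timing.

```python
def locate_line(tag, dedicated_30a, bus_40a_caches, bus_60_caches):
    """Return (source, bus_transactions) for a request from processor 20A.

    dedicated_30a: set of tags held by dedicated cache memory 30A.
    bus_40a_caches / bus_60_caches: dict mapping a cache name to its tags.
    """
    if tag in dedicated_30a:
        return "dedicated cache 30A", 0
    # Snoop phase on bus 40A: peers report whether they hold the line.
    for name, tags in bus_40a_caches.items():
        if tag in tags:
            return name, 1
    # Miss on bus 40A: shared cache 50A re-initiates the transaction on bus 60.
    for name, tags in bus_60_caches.items():
        if tag in tags:
            return name, 2
    return "main memory 90", 2
```

As the sketch suggests, a line found only in main memory 90 costs a second bus transaction on bus 60 in addition to the original transaction on bus 40A.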
One drawback of the system 10 of FIG. 1 is the delay associated with accessing a requested line from main memory 90 in the manner described above. In addition to any delays inherent in the internal design of shared cache memory 50A, the transaction initiated on bus 60 requires another bus arbitration cycle and its attendant overhead. Similarly, the return path from main memory 90 to the cache memory 30A of the requesting processor 20A requires further bus arbitrations and incurs other delays. Thus, there exists a need in the art for cache memory systems having a reduced delay in servicing dedicated cache memory misses. There also exists a need in the art for cache memory systems that have lower memory request latencies and that cause relatively less bus traffic.
A computer system comprises a plurality of processors each having dedicated cache memories, another level of cache memory shared by the plurality of cache memories, and a main memory. The processors and the shared cache memory act as peers on a bus located between the processors and main memory. All data placed upon the bus by the main memory as a result of a read transaction are written into the shared cache memory. The shared cache memory does not initiate any transactions.
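The behavior summarized above may be sketched, in a highly simplified and non-limiting way, as follows. The names SharedCache, snoop, observe_read_data, and read_on_bus are hypothetical; main memory is modeled as a dictionary; and the sketch assumes that the requesting processor's transaction is visible to both the shared cache memory and main memory on the same bus.

```python
class SharedCache:
    """A peer on the bus that only responds and observes."""

    def __init__(self):
        self.lines = {}  # tag -> data

    def snoop(self, tag):
        # Respond to a processor's transaction; returns None on a miss.
        return self.lines.get(tag)

    def observe_read_data(self, tag, data):
        # Capture data that main memory places on the bus for a read.
        self.lines[tag] = data


def read_on_bus(tag, shared_cache, main_memory):
    """One read transaction issued by a processor on the common bus."""
    data = shared_cache.snoop(tag)
    if data is not None:
        return data, "shared cache"
    data = main_memory[tag]                    # main memory answers the same transaction
    shared_cache.observe_read_data(tag, data)  # shared cache writes in the data
    return data, "main memory"
```

In this sketch the shared cache never issues a transaction of its own: on a miss, the processor's original transaction is answered by main memory, and the shared cache simply captures the returned line for later requests.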