The present invention is directed to computer systems, and more particularly, to data caching methods and apparatus for improved performance in such systems.
A multiprocessor computer system, by definition, comprises a plurality of instruction processors. Each instruction processor typically has access to the main memory of the computer system. In many multiprocessor computer systems, each instruction processor has its own local cache memory for storing frequently accessed data from the main memory. When a given processor accesses data from an address in main memory, a copy of the retrieved data is stored locally in its cache memory. On subsequent requests for data at the same address, the data can be read out of the local cache memory, rather than having to access the main memory. Accessing data from a local cache memory is much faster than accessing data from main memory, and thus the use of cache memories in a multiprocessor computer system typically improves the performance of the system.
Some multiprocessor computer systems allow copies of data from the main memory to be stored in the local cache of a given processor in either a read-only form (i.e., the processor is not permitted to modify the data in its cache) or a writeable form (i.e., the processor is permitted to modify the data in its cache). A writeable copy of data stored in a local cache is referred to herein as "original" data. A read-only copy of data stored in a local cache is referred to as a "copy". In other nomenclature, a processor that holds data in its cache in a writeable form (i.e., "original" data) is sometimes referred to as holding that data in an "exclusive" state (or as having "exclusive" access rights to that data), and a processor that holds data in its cache in a read-only form (i.e., a "copy") is sometimes referred to as holding that data in a "shared" state (or as having "shared" access rights to that data).
When a processor attempts to fetch data from an address in main memory and that data is currently stored as "original" data in the local cache of another processor, that other processor must "return" the original data to main memory (or pass it directly to the requesting processor) so that it can be accessed by the requesting processor. This is commonly referred to as a "return" operation. As the number of processors in a multiprocessor computer system increases, the inventors have discovered that the overhead associated with the movement of original data between processors (e.g., as a result of numerous "return" operations) results in a larger-than-expected performance degradation. The present invention addresses this problem.
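To make the return operation concrete, the following is a minimal Python sketch of a toy model in which each local cache line is held either as original data or as a copy. Everything in it is hypothetical and chosen for illustration: the `System` class, its `fetch` and `write` methods, the `return_count` counter, and the choices that a returning processor gives up its line and that read-only copies elsewhere are invalidated when a processor obtains original data, none of which the text specifies. It models only the path in which original data goes back through main memory, not the direct processor-to-processor transfer.

```python
from enum import Enum


class State(Enum):
    ORIGINAL = "original"  # writeable copy ("exclusive" access rights)
    COPY = "copy"          # read-only copy ("shared" access rights)


class System:
    """Hypothetical toy model: one main memory, one local cache per processor."""

    def __init__(self, n_processors):
        self.memory = {}  # address -> value
        self.caches = [{} for _ in range(n_processors)]  # address -> (State, value)
        self.return_count = 0  # number of "return" operations performed

    def fetch(self, proc, addr, writeable=False):
        own = self.caches[proc].get(addr)
        # Cache hit: original data satisfies any request; a copy satisfies a read.
        if own is not None and (own[0] is State.ORIGINAL or not writeable):
            return own[1]
        for other, cache in enumerate(self.caches):
            if other == proc or addr not in cache:
                continue
            state, value = cache[addr]
            if state is State.ORIGINAL:
                # "Return" operation: the holder writes its original data back
                # to main memory and (assumption) gives up its line.
                self.memory[addr] = value
                del cache[addr]
                self.return_count += 1
            elif writeable:
                # Assumption: read-only copies held elsewhere are invalidated
                # when a processor obtains original (writeable) data.
                del cache[addr]
        value = self.memory.get(addr, 0)
        state = State.ORIGINAL if writeable else State.COPY
        self.caches[proc][addr] = (state, value)
        return value

    def write(self, proc, addr, value):
        state, _ = self.caches[proc][addr]
        if state is not State.ORIGINAL:
            raise PermissionError("a read-only copy cannot be modified")
        self.caches[proc][addr] = (State.ORIGINAL, value)
```

For example, if processor 0 fetches address `0x40` as original data and modifies it, a later read of `0x40` by processor 1 forces one return operation before processor 1 receives the modified value. Each such return is the kind of inter-processor overhead that grows with the number of processors.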
The present invention is directed to a method and apparatus for use in a computer system having a main memory, a processor that issues addresses to the main memory to retrieve data stored at those addresses, and a cache in which copies of data retrieved by the processor are temporarily stored. According to the method of the present invention, each time original data (i.e., a modifiable copy of data) is fetched from the main memory at a particular address and a copy of the data is stored in the cache, the address of the copy of the data is stored in a queue having a predetermined depth. Once the depth of the queue is reached, the storage of each new address in the queue causes a previously stored address to be output from the queue. For each address output from the queue, the cache returns the corresponding data to the main memory. Use of the queue in accordance with this method effectively places a limit on the amount of original data stored in the cache. Preferably, the queue comprises a first-in, first-out queue. Also, the depth of the queue (i.e., the number of individual address entries in the queue) preferably is programmable, providing flexibility in establishing the limit on the amount of original data in the cache.
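The method just described can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the names `MainMemory`, `OriginalDataCache`, `fetch_original`, `write`, `set_depth`, and `returns` are hypothetical, main memory is modeled as a dictionary, and draining excess entries immediately when the depth is lowered is an assumption, since the text only says the depth is programmable. Each address fetched as original data is pushed onto a first-in, first-out queue; once the queue holds `depth` entries, pushing a new address pops the oldest one, and the cache returns that line to main memory.

```python
from collections import deque


class MainMemory:
    """Hypothetical main memory: a dict from address to value."""

    def __init__(self):
        self.data = {}

    def read(self, addr):
        return self.data.get(addr, 0)

    def write(self, addr, value):
        self.data[addr] = value


class OriginalDataCache:
    """Hypothetical cache that limits how much original (writeable) data it holds."""

    def __init__(self, memory, depth):
        self.memory = memory
        self.lines = {}      # address -> value, original data only
        self.fifo = deque()  # addresses in the order they were fetched
        self.returns = 0     # lines returned because they left the queue
        self.set_depth(depth)

    def set_depth(self, depth):
        # Programmable depth. Assumption: lowering it drains the excess
        # entries at once, oldest first.
        if depth < 1:
            raise ValueError("queue depth must be at least 1")
        self.depth = depth
        while len(self.fifo) > self.depth:
            self._return(self.fifo.popleft())

    def fetch_original(self, addr):
        if addr in self.lines:  # already held as original data
            return self.lines[addr]
        value = self.memory.read(addr)
        self.lines[addr] = value
        self.fifo.append(addr)
        if len(self.fifo) > self.depth:  # depth reached: the oldest address leaves
            self._return(self.fifo.popleft())
        return value

    def write(self, addr, value):
        if addr not in self.lines:
            raise KeyError(f"no original data held for address {addr:#x}")
        self.lines[addr] = value

    def _return(self, addr):
        # Write the (possibly modified) line back to main memory and drop it.
        self.memory.write(addr, self.lines.pop(addr))
        self.returns += 1
```

With a depth of 2, fetching original data at three distinct addresses returns the first line (including any modification made to it) to main memory, so the cache never holds more than two lines of original data. Because the queue is first-in, first-out, lines leave in the order they were fetched regardless of how recently they were used; an address fetched again after being returned simply receives a new queue entry.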
Apparatus according to the present invention, for use in a computer system having a main memory, a processor that issues addresses to the main memory to retrieve data stored at those addresses, and a cache in which copies of data retrieved by the processor are temporarily stored, comprises a queue that stores the address of each copy of data stored in the cache. The queue has a depth whereby, once the depth is reached, the addresses are successively read out of the queue. The cache then returns to the main memory, for each address read out of the queue, the corresponding data stored in the cache. As mentioned above, the queue preferably comprises a first-in, first-out queue, and the depth of the queue is preferably programmable.
While the present invention can be used in single-processor computer systems, the invention is particularly useful in larger multiprocessor computer systems, where the overhead associated with "return" requests, which result, for example, when a task running on one processor requests access to original data held in the cache of another processor, can lead to significant performance degradation in the system. In such systems, the present invention can be used to limit the amount of original data held in the cache of each processor, thereby reducing the overall number of "return" requests issued to the processor caches. This mitigates the performance degradation discovered by the inventors.
Additional features and advantages of the present invention will become evident hereinafter.