1. Field of the Invention
The present invention generally relates to methods and apparatus for controlling access to a level two cache memory by multiple users and more particularly relates to queuing of multiple cache request misses.
2. Description of the Prior Art
It is known in the prior art to develop computer systems having cache memories built into the basic architecture. The two fundamental characteristics of any memory unit are capacity (i.e., number of storage cells) and speed. The cost of a memory unit is, of course, increased with increased capacity and/or increased speed. Because of the time delays necessitated by increased size, memory systems which are both very large in capacity and very fast tend to be cost prohibitive.
Therefore, for virtually all general purpose computers, cost requirements dictate that the main storage subsystem will operate more slowly than the processor(s) which it serves. As a result, there tends to be a constant mismatch between the rate at which data is to be accessed from the main storage subsystem and the rate at which that data is processed. Thus, a constant performance issue in computer design is the reduction of the latency between a processor request for memory access and the time when that request is actually honored by the main storage subsystem.
A common technique for matching a relatively high speed processor to a relatively low speed main storage subsystem is to interpose a cache memory in the interface. The cache memory is much faster but of much smaller capacity than the main storage subsystem. Data requested by the processor is stored temporarily in the cache memory. To the extent that the same data remains within the cache memory to be utilized more than once by the processor, substantial access time is saved by supplying the data from the cache memory rather than from the main storage subsystem. Further savings are realized by loading the cache memory with blocks of data located near the requested data under the assumption that other data will be soon needed from the loaded block.
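The block-loading behavior described above can be illustrated with a minimal sketch. The class name `BlockCache`, the block size of four words, and the use of a Python list as a stand-in for the main storage subsystem are all assumptions made for illustration; they are not taken from any particular prior art system.

```python
class BlockCache:
    """Toy cache that loads whole blocks from a slower backing store."""

    def __init__(self, backing, block_size=4):
        self.backing = backing          # list standing in for main storage (assumption)
        self.block_size = block_size    # words per block (hypothetical value)
        self.blocks = {}                # block number -> list of words in that block
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        block_no, offset = divmod(addr, self.block_size)
        if block_no in self.blocks:
            self.hits += 1              # data already resident in the cache
        else:
            self.misses += 1            # fetch the whole surrounding block
            start = block_no * self.block_size
            self.blocks[block_no] = self.backing[start:start + self.block_size]
        return self.blocks[block_no][offset]


main_storage = list(range(100, 116))         # 16 words of example data
cache = BlockCache(main_storage)
values = [cache.read(a) for a in range(8)]   # sequential reads of addresses 0..7
```

In this sketch, eight sequential reads touch only two blocks, so only the first access to each block goes to the backing store; the remaining accesses are supplied from the cache.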
There are additional issues to be considered with regard to cache memory design. Program instruction data, for example, tends to be quite sequential and involves only read accesses. However, operand data may involve both read and write accesses. Therefore, it is helpful to optimize cache memory design by dividing instruction processor cache memories into program instruction and operand portions.
Furthermore, if a computer system contains multiple processing units, provision must be made to ensure that data locations accessed by a first processing unit are provided as potentially modified by write operations from a second processing unit. This data coherency problem is usually solved via the use of store-through (i.e., operand writes cause immediate transfer to main storage) or store-in (i.e., only the cache memory contains the updated data, and flags are needed to show that the main storage location contains obsolete data).
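The difference between the two write policies can be sketched as follows. The class names, the dictionary used as main storage, and the explicit `flush` method are hypothetical and serve only to contrast the policies; a real store-in cache would also write back on replacement or on a request from another processor.

```python
class StoreThroughCache:
    """Every write is transferred to main storage immediately."""

    def __init__(self, main):
        self.main = main            # dict standing in for main storage (assumption)
        self.lines = {}

    def write(self, addr, value):
        self.lines[addr] = value
        self.main[addr] = value     # immediate transfer to main storage


class StoreInCache:
    """Writes stay in the cache; a flag marks main storage as obsolete."""

    def __init__(self, main):
        self.main = main
        self.lines = {}
        self.dirty = set()          # addresses whose main storage copy is stale

    def write(self, addr, value):
        self.lines[addr] = value
        self.dirty.add(addr)        # main storage now holds obsolete data

    def flush(self):
        # Hypothetical write-back of all modified locations.
        for addr in sorted(self.dirty):
            self.main[addr] = self.lines[addr]
        self.dirty.clear()


main_a = {0: 0}
store_through = StoreThroughCache(main_a)
store_through.write(0, 7)           # main_a[0] is updated at once

main_b = {0: 0}
store_in = StoreInCache(main_b)
store_in.write(0, 7)                # main_b[0] stays stale until flush()
stale_before_flush = main_b[0]
store_in.flush()
```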
As the use of cache memory has become more common, it is now known to utilize multiple levels of cache memory within a single system. U.S. Pat. No. 5,603,005, issued to Bauman et al. on Feb. 11, 1997, incorporated herein by reference, contains a description of a system with three levels of cache memory. In the multiprocessor Bauman et al. system, each instruction processor has dedicated instruction (i.e., read-only) and operand (i.e., write-through) cache memories. This corresponds to level one cache memory.
A level two cache memory is located within each system controller. The level two cache memory of Bauman et al. is a store-in cache memory which is shared by all of the processors coupled to the corresponding system controller. The system of Bauman et al. contains a level three cache which is coupled between each of the system controllers and a corresponding main memory unit.
As can be readily appreciated, if all of the processors coupled to a single system controller experience cache misses in their respective first level cache memories, each will make a near simultaneous request of the second level cache memory within the system controller. If all (or near all) of the near simultaneous requests to the second level cache memory are also misses, it is necessary to sequence the order in which these requests will be serviced from the third level cache memory and/or the main storage subsystem. The total latency is particularly long if misses are also experienced at the third level cache memory.
It has been common in the past to treat this condition using a single queue for all instruction processor requests. The queue is generally implemented as a simple FIFO. The simple FIFO has sometimes been modified to place read requests behind write requests in the queue to overcome potential data latency problems.
The present invention overcomes the disadvantages associated with the prior art by providing a way to queue two requests from the same IP in a more efficient manner when both requests experience SLC (second level cache) misses in the same time frame. In accordance with the present invention, the single FIFO (first-in-first-out) queue is replaced with two separate queues (a read queue and a write queue).
Each instruction processor can have one outstanding read request and one outstanding write request during the same time frame. If both requests to the SLC are misses, the second request can be sent over the same bus lines to the memory without waiting for the initial request miss to be completed. This reduces the latency of the second request as seen by the instruction processor and thereby improves the performance of the instruction processor.
In a fully populated system there are 16 instruction processors, each with its own SLC. With the prior method, up to 16 request misses could be held up waiting for the previous 16 miss requests to complete. With the current method, all 32 requests (16 read requests and 16 write requests) would have been sent out through their respective busses to memory. Thus, the latency for the 16 previously held requests is reduced, and overall system performance is improved.
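The request counts in the fully populated case can be checked with a small sketch. It assumes that each instruction processor issues one read and one write that both miss in the SLC, and that under the prior method the read happens to be queued first; the function names and the tuple representation of a request are hypothetical.

```python
from collections import deque

NUM_IPS = 16  # instruction processors in a fully populated system, per the text


def dispatch_single_queue(num_ips):
    """Prior method: the second miss from an IP waits for the first to complete."""
    sent, held = [], []
    for ip in range(num_ips):
        pending = deque([(ip, "read"), (ip, "write")])  # ordering is an assumption
        sent.append(pending.popleft())   # first miss goes out to memory
        held.extend(pending)             # second miss is held behind it
    return sent, held


def dispatch_dual_queue(num_ips):
    """Separate read and write queues: both misses go out without waiting."""
    read_q = deque((ip, "read") for ip in range(num_ips))
    write_q = deque((ip, "write") for ip in range(num_ips))
    sent = list(read_q) + list(write_q)
    return sent, []


single_sent, single_held = dispatch_single_queue(NUM_IPS)
dual_sent, dual_held = dispatch_dual_queue(NUM_IPS)
```

Under these assumptions the single queue sends 16 requests and holds 16, while the separate read and write queues send all 32 with none held.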