1. Field of the Invention
The present invention relates to the field of cache memory structures for multiprocessor computer systems. More particularly, the present invention relates to a pending write-back cache controller in a cache control system for a multiprocessor computer system using a packet switched bus.
2. Art Background
In a typical computer system, the processing unit operates at a substantially faster speed than the main memory. When the processing unit executes instructions faster than memory can supply them, the processing unit must remain idle while it waits for the memory to retrieve the next instruction. Processing unit idle time adversely affects system performance. To avoid unnecessary processing unit idle time while awaiting data or instructions from the main memory, a cache memory capable of operating at a higher speed than a main memory is often used to buffer the data and the instructions between the main memory and the processing unit. The cache memory is typically much smaller than the main memory.
The data and instructions from the main memory are mapped into the cache memory in uniform units referred to as cache lines. Each cache line represents an aligned continuous segment of main memory. Since the cache memory is usually much smaller than the main memory, it can store only a limited subset of the main memory. Therefore the cache memory needs to store a portion of the data's main memory address. This portion of the address is called the address tag, and there is one address tag per cache line. Each cache line may be further subdivided into smaller uniform increments referred to as subblocks. Access to a cache line in the cache memory is typically made using a cache directory which stores the address tags and a set of status bits associated with the cache line.
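The relationship between a main memory address, its cache line, and its address tag can be sketched as follows. This is a minimal illustration only: the direct-mapped organization, the 32-byte line size, the 256-line capacity, and the names `split_address` and `line_base` are all assumptions made for the example and are not taken from any particular system.

```python
# Hypothetical geometry, chosen only for illustration.
LINE_SIZE = 32    # bytes per cache line (a power of two)
NUM_LINES = 256   # lines in an assumed direct-mapped cache (a power of two)

OFFSET_BITS = LINE_SIZE.bit_length() - 1   # bits selecting a byte within a line
INDEX_BITS = NUM_LINES.bit_length() - 1    # bits selecting a cache line


def split_address(addr):
    """Split a byte address into (address tag, line index, byte offset)."""
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (NUM_LINES - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def line_base(addr):
    """Return the first address of the aligned memory segment holding addr."""
    return addr & ~(LINE_SIZE - 1)
```

Under these assumptions only the tag needs to be stored in the cache directory, because the line index is implied by the position of the cache line and the offset selects a byte within it.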
Recently, computer systems having multiple processors have become common as a means of increasing processing speed. In a multiprocessor system, each of the processor subsystems may have its own individual cache memory. In order for a multiprocessor system with individual cache memories to operate properly, the system must maintain proper correspondence of the data stored in the cache memories since each processor may alter the data stored in its local cache memory. Correspondence of the data in the various caches is termed "cache consistency". A cache system is deemed "consistent" when the value returned from a "load from memory" operation is always the same as the value of the latest "store to memory" operation to the same memory address.
To maintain cache consistency, several status bits are usually maintained in the cache directory which reflect the current state of the information in each cache line. Common status bits maintained include a "valid" bit, a "shared" bit, and an "owned" bit. A "valid" bit reflects whether the information stored in the cache line is currently valid. A "shared" status bit indicates whether the information in the cache line is shared by other cache memories. If a cache line is "shared", it cannot be modified without first invalidating the cache line in the other cache memories or updating the cache line in the other cache memories. An "owned" status bit indicates that the information in the cache line has been modified without being written back to the main memory. A line of memory can be "owned" by only one processor subsystem at a time. If a processor needs to modify the contents of one of its cache lines, the processor must first change the status of the cache line to make it "owned". Owned cache lines must be written back to main memory before they are replaced with new information.
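The rules attached to the three status bits can be sketched in code. The bit values, the `Status` type, and the two helper functions are hypothetical names introduced for this illustration; whether the other caches are invalidated or updated is left to the caller, since either choice is possible.

```python
from enum import IntFlag


class Status(IntFlag):
    """Illustrative encoding of the per-line status bits."""
    VALID = 1    # the line holds valid information
    SHARED = 2   # the line is also held by another cache memory
    OWNED = 4    # the line was modified and not yet written back


def prepare_for_write(status):
    """Return (new_status, must_notify_others) for a local write to a line.

    The line must become "owned" before it is modified. If it is "shared",
    the other caches must first be invalidated or updated.
    """
    if not status & Status.VALID:
        raise ValueError("line is invalid and must be retrieved first")
    must_notify_others = bool(status & Status.SHARED)
    return status | Status.OWNED, must_notify_others


def needs_write_back(status):
    """An owned, valid line must be written back before it is replaced."""
    return bool(status & Status.VALID) and bool(status & Status.OWNED)
```

For example, a line that is valid and shared becomes owned on a write, and the returned flag signals that the other cache memories must be notified before the write proceeds.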
An example of a multiprocessor system maintaining cache consistency is illustrated in FIGS. 1a through 1d. In FIG. 1a, the main memory unit has an address A that contains a value of 1. Processors 2 and 3 perform load A operations to obtain the value of A. During each processor's load operation, the value of A is stored in the processor's local cache memory. Processors 2 and 3 now "share" memory location A and both caches have "valid" data. In FIG. 1b, processor 1 has written a value of 2 to location A. This is permitted since neither processor 2 nor processor 3 "owned" memory location A. In order to change the contents of memory location A, processor 1 broadcasts a message across the memory bus informing other memory devices that the contents of memory location A have changed. This message causes the cache memories of processors 2 and 3 to change the status of memory location A to "invalid". The main memory unit does not maintain a set of status bits for each memory line. Instead, the main memory monitors a control line on the memory bus that is asserted whenever a request is made for a memory line that is "owned" by a processor subsystem. When the "owned" control line is asserted, the main memory learns that the line is owned by some processor subsystem and therefore does not respond to the request. Cache memory 1 now "owns" location A since it modified the contents of memory location A without updating the main memory. In FIG. 1c, processor 1 has changed the contents of memory location A to the value of 3. Since processor 1 does not share memory location A with any other processor, processor 1 does not need to send a message across the memory bus. However, in FIG. 1d, processor 3 requires the value of memory location A for a load operation. Processor 3 must therefore send a request across the bus requesting the value of memory location A. Since processor 1 "owns" memory location A, it must respond to the request with a reply containing the contents of memory location A.
Memory location A is now represented in the cache memories of processors 1 and 3. Although memory location A is still "owned" by processor 1, processor 1 must now "share" memory location A with processor 3. Therefore, any further changes to memory location A by processor 1 must be forwarded to processor 3. Processor 1 must eventually write back the changed contents of memory location A to main memory.
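The sequence of FIGS. 1a through 1d can be traced with a small simulation. This is a simplified sketch under stated assumptions: the `System` class and its methods are hypothetical, a missing cache entry stands for an "invalid" line, a write to a line that is not exclusively owned is modeled as an invalidating broadcast, and the bus is reduced to a message counter.

```python
class System:
    """Main memory plus per-processor caches on one shared bus (illustrative)."""

    def __init__(self, memory, processors):
        self.memory = dict(memory)
        # caches[p][addr] = {"value": v, "shared": bool, "owned": bool}
        # A missing entry stands for an invalid line.
        self.caches = {p: {} for p in processors}
        self.bus_messages = 0

    def _other_holders(self, p, addr):
        return [q for q in self.caches if q != p and addr in self.caches[q]]

    def load(self, p, addr):
        line = self.caches[p].get(addr)
        if line is not None:
            return line["value"]              # hit: no bus traffic
        self.bus_messages += 1                # miss: request goes on the bus
        others = self._other_holders(p, addr)
        owner = next((q for q in others if self.caches[q][addr]["owned"]), None)
        if owner is not None:
            value = self.caches[owner][addr]["value"]   # the owner replies
        else:
            value = self.memory[addr]                   # main memory replies
        for q in others:
            self.caches[q][addr]["shared"] = True
        self.caches[p][addr] = {"value": value, "shared": bool(others),
                                "owned": False}
        return value

    def store(self, p, addr, value):
        line = self.caches[p].get(addr)
        exclusive = line is not None and line["owned"] and not line["shared"]
        if not exclusive:
            self.bus_messages += 1            # broadcast: others invalidate
            for q in self._other_holders(p, addr):
                del self.caches[q][addr]
        # Main memory is deliberately not updated: the line is now owned.
        self.caches[p][addr] = {"value": value, "shared": False, "owned": True}


system = System({"A": 1}, processors=[1, 2, 3])
system.load(2, "A")             # FIG. 1a: processors 2 and 3 load A
system.load(3, "A")
system.store(1, "A", 2)         # FIG. 1b: processor 1 writes 2, others invalidate
system.store(1, "A", 3)         # FIG. 1c: owned and unshared, no bus message
value = system.load(3, "A")     # FIG. 1d: processor 1 replies with the value
```

At the end of the trace, processor 3 obtains the modified value from processor 1 while main memory still holds the original value, which is why processor 1 must eventually write the line back.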
In computer systems implementing a cache memory system, the cache memory is first searched when a processor requests data from a memory address. A cache controller examines the address tags in the cache directory for the requested memory address. If an address tag in the cache directory matches the memory address needed and the cache line is valid, there is a cache "hit" and the data is transferred from the cache memory to the processor. If the processor subsequently modifies the data stored in a cache line, the cache line becomes an "owned" cache line. As illustrated above, the modified or "owned" cache line must eventually be written back to the main memory. If the cache controller always updates the main memory immediately after a cache line is modified, the system is referred to as a "write-through" cache. It is called a "write-through" cache since the cache system always writes through the cache memory and into the main memory.
On the other hand, when a processor makes a read request for data from a memory address and none of the address tags in the cache directory match the requested memory address or an address match occurs but the cache line is invalid, a cache "miss" occurs. The cache controller must therefore retrieve the data from the main memory or from another processor's cache memory which owns the data. During the retrieval of the memory line, the processing unit usually must remain idle until the retrieval is completed.
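The hit and miss conditions described above can be sketched with a small cache directory. The direct-mapped layout, the sizes, and the `Directory` class are assumptions made for illustration only.

```python
LINE_SIZE = 32   # hypothetical bytes per cache line
NUM_LINES = 4    # hypothetical number of lines in a direct-mapped cache


class Directory:
    """Illustrative cache directory: one address tag and valid bit per line."""

    def __init__(self):
        self.tags = [None] * NUM_LINES
        self.valid = [False] * NUM_LINES

    @staticmethod
    def _locate(addr):
        line_number = addr // LINE_SIZE
        return line_number % NUM_LINES, line_number // NUM_LINES  # index, tag

    def lookup(self, addr):
        """Return "hit" only if the tag matches and the line is valid."""
        index, tag = self._locate(addr)
        if self.tags[index] == tag and self.valid[index]:
            return "hit"
        return "miss"

    def fill(self, addr):
        """Record that the line containing addr has been retrieved."""
        index, tag = self._locate(addr)
        self.tags[index] = tag
        self.valid[index] = True

    def invalidate(self, addr):
        """Clear the valid bit if the line containing addr is present."""
        index, tag = self._locate(addr)
        if self.tags[index] == tag:
            self.valid[index] = False
```

Both kinds of miss appear in this sketch: a lookup whose tag does not match, and a lookup whose tag matches a line that has since been invalidated.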
When a cache controller retrieves a line of data from the main memory or from another processor's cache memory for the local processor, the line is placed into the local cache memory. If no empty cache line is available, the cache must replace one of the currently used cache lines. The cache line chosen to be replaced is typically referred to as the displaced or victim line. If the cache system is a "write-through" cache system, the cache controller replaces the victim line immediately. The victim line in a "write-through" cache system can be immediately replaced since the main memory already has the same contents as the victim line. However, if the processor modified the contents of the cache line (an "owned" cache line), the cache controller must first write back the contents of the cache line to main memory before the cache line can be replaced. Cache systems which only write back the contents of an owned cache line when the cache line needs to be replaced are referred to as "write-back" caches. "Write-back" cache systems update main memory less frequently than "write-through" systems since consecutive writes by the processor to the same owned cache line will not result in multiple writes to main memory. Since "write-back" cache systems update the main memory less frequently, they are more efficient than "write-through" cache systems.
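The difference in main memory traffic between the two policies can be illustrated by counting writes for a single cache line. The `OneLineCache` class is a hypothetical model that tracks only one line and ignores everything except the number of writes reaching main memory.

```python
class OneLineCache:
    """Illustrative single-line cache that counts writes to main memory."""

    def __init__(self, policy):
        if policy not in ("write-through", "write-back"):
            raise ValueError("unknown policy: " + policy)
        self.policy = policy
        self.owned = False
        self.memory_writes = 0

    def store(self):
        """The processor writes to the cached line."""
        if self.policy == "write-through":
            self.memory_writes += 1   # every store is passed to main memory
        else:
            self.owned = True         # the write to main memory is deferred

    def replace(self):
        """The line is chosen as the victim line and replaced."""
        if self.owned:
            self.memory_writes += 1   # one write-back covers all prior stores
            self.owned = False


def writes_for(policy, stores):
    """Count main memory writes for `stores` stores followed by a replacement."""
    cache = OneLineCache(policy)
    for _ in range(stores):
        cache.store()
    cache.replace()
    return cache.memory_writes
```

In this model, five consecutive stores to the same line cost five main memory writes under the write-through policy and a single write under the write-back policy.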
FIG. 2 illustrates a prior art multiprocessor system with individual write-back cache memories for each processor subsystem. The multiprocessor system of FIG. 2 maintains cache consistency by using a set of cache directories 28 located in each cache controller 29. The cache directories 28 contain the address tags for each cache line and the status bits which specify if a cache line is valid (contains valid data), owned (modified and not written back to main memory yet), and/or shared (represented in another processor's cache memory).
When a processor in the multiprocessor system of FIG. 2 needs to read information not currently stored in the local cache memory, it must often replace a currently used cache line. If the cache line to be replaced is "owned", the contents of the cache line must be written back to main memory 23. In a typical write-back cache memory system, the cache controller 29 first writes back the "owned" cache line to main memory 23 and, after the write-back is completed, requests the new line of data from main memory 23. Although requesting the new cache line only after writing back the owned cache line results in a simple design, this method creates a long latency period while the owned cache line is written back and the new cache line is obtained. During this latency period, the processor 21 usually remains idle while it waits for the needed data. Consequently, this long latency period required for cache line replacement degrades the efficiency of the multiprocessor computer system. This is especially true in large cache memories 37 where cache lines tend to be long and several owned subblocks may need to be written to memory before the new desired cache line data is requested.
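The latency of this sequential replacement can be estimated with a simple model. The function name and the cycle counts used with it are hypothetical and chosen only to show how the idle time grows with the number of owned subblocks; they are not measurements of any system.

```python
def sequential_replacement_latency(owned_subblocks, write_back_cycles,
                                   fetch_cycles):
    """Idle cycles when the new line is requested only after every owned
    subblock of the victim line has been written back to main memory."""
    if owned_subblocks < 0:
        raise ValueError("owned_subblocks must not be negative")
    return owned_subblocks * write_back_cycles + fetch_cycles
```

With an assumed 10 cycles per subblock write-back and 20 cycles to obtain the new line, a victim line with four owned subblocks keeps the processor waiting 60 cycles, compared with 20 cycles when the victim line is not owned.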