1. Field of the Invention
This invention relates to the field of multiprocessor computer systems and, more particularly, to starvation avoidance in computer systems.
2. Description of the Related Art
Multiprocessing computer systems include two or more processors that may be employed to perform computing tasks. A particular computing task may be performed upon one processor while other processors perform unrelated computing tasks. Alternatively, components of a particular computing task may be distributed among multiple processors to decrease the time required to perform the computing task as a whole.
A popular architecture in commercial multiprocessing computer systems is a shared memory architecture in which multiple processors share a common memory. In shared memory multiprocessing systems, a cache hierarchy is typically implemented between the processors and the shared memory. In order to maintain the shared memory model, in which a particular address stores exactly one data value at any given time, shared memory multiprocessing systems employ cache coherency. Generally speaking, an operation is coherent if the effects of the operation upon data stored at a particular memory address are reflected in each copy of the data within the cache hierarchy. For example, when data stored at a particular memory address is updated, the update may be supplied to the caches that are storing copies of the previous data. Alternatively, the copies of the previous data may be invalidated in the caches such that a subsequent access to the particular memory address causes the updated copy to be transferred from main memory.
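By way of illustration only, the two alternatives described above (supplying the update to other caches, or invalidating their copies) may be sketched as follows. The names `Cache`, `write`, and `read`, and the simplification that every write also updates main memory immediately, are assumptions made for this sketch and are not taken from any particular system.

```python
class Cache:
    """Hypothetical private cache: maps an address to its cached value."""
    def __init__(self):
        self.lines = {}


def write(memory, caches, writer, addr, value, policy):
    """Perform a coherent write using an 'update' or 'invalidate' policy."""
    if policy not in ("update", "invalidate"):
        raise ValueError("policy must be 'update' or 'invalidate'")
    memory[addr] = value          # assumption: write-through to main memory
    writer.lines[addr] = value
    for cache in caches:
        if cache is writer or addr not in cache.lines:
            continue
        if policy == "update":
            cache.lines[addr] = value   # supply the update to the other copy
        else:
            del cache.lines[addr]       # invalidate the stale copy


def read(memory, cache, addr):
    """Return the value at addr, refilling from memory on a miss."""
    if addr not in cache.lines:
        cache.lines[addr] = memory[addr]
    return cache.lines[addr]
```

Under either policy, a subsequent read by any cache returns the most recently written value, which is the property the shared memory model requires.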
Shared memory multiprocessing systems generally employ either a snooping cache coherency protocol or a directory-based cache coherency protocol. In a system employing a snooping protocol, coherence requests are broadcast to all processors (or cache subsystems) and memory through a totally ordered address network. Each processor “snoops” the requests from other processors and responds accordingly by updating its cache tags and/or providing the data to another processor. For example, when a subsystem having a shared copy of data observes a coherence request for exclusive access to the block, its copy is typically invalidated. Likewise, when a subsystem that currently owns a block of data observes a coherence request to that block, the owning subsystem typically responds by providing the data to the requestor and invalidating its copy, if necessary. By delivering coherence requests in a total order, correct coherence protocol behavior is maintained since all processors and memories observe requests in the same order.
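The snooping behavior described above may be illustrated, for a single cache block, by the following minimal sketch. The three-state model (`INVALID`, `SHARED`, `MODIFIED`), the request names `read_shared` and `read_exclusive`, and the `Node` and `broadcast` helpers are hypothetical and chosen only for this illustration; data transfer is not modeled.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    SHARED = "S"
    MODIFIED = "M"


class Node:
    """Hypothetical processor tracking its access right to one cache block."""
    def __init__(self, name):
        self.name = name
        self.state = State.INVALID
        self.observed = []   # requests in the order this node saw them

    def snoop(self, requester, kind):
        self.observed.append((requester, kind))
        if requester == self.name:
            # A node gains its access right when it observes its own request.
            if kind == "read_shared":
                self.state = State.SHARED
            else:
                self.state = State.MODIFIED
        elif kind == "read_exclusive":
            # A foreign request for exclusive access invalidates the local copy.
            self.state = State.INVALID
        elif kind == "read_shared" and self.state is State.MODIFIED:
            # The owner would supply the data and keep only a shared copy.
            self.state = State.SHARED


def broadcast(nodes, requests):
    """Deliver every request to every node in one total order."""
    for requester, kind in requests:
        for node in nodes:
            node.snoop(requester, kind)
```

Because every node observes the identical sequence of requests, all nodes agree on which processor holds which access right after each request.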
In a standard snooping protocol, requests arrive at all devices in the same order, and the access rights of the processors are modified in the order in which requests are received. Data transfers occur between caches and memories using a data network, which may be a point-to-point switched network separate from the address network, a broadcast network separate from the address network, or a logical broadcast network which shares the same hardware with the address network. Typically, changes in ownership of a given cache block occur concurrently with changes in access rights to the block.
A potential problem associated with shared memory multiprocessing systems is referred to as starvation. For example, in a computer system employing a standard snooping protocol, a processor P1 which is attempting to perform a load to a cache block that it does not hold must issue a request for shared access to the cache block. This processor P1 will gain the right to read from this block when it receives its own request, and it can lose that right as soon as it receives a later request from a different processor P2 for exclusive access to the same block. In particular, P1 may lose its access rights due to a request from P2 which it receives before it receives the data. As a result, P1 would not be able to perform the load and would have to issue another request for shared access to the block. The access rights granted by this request could also be lost (e.g., due to a request from yet another processor P3) before P1 received the data. In this manner, the processor could fail to perform the load an unbounded number of times and thus could starve.
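The starvation scenario described above may be illustrated by the following sketch, in which each attempt by P1 is represented as the sequence of events P1 observes. The event names `own_request`, `foreign_exclusive`, and `data`, and the helper functions, are hypothetical labels introduced for this illustration.

```python
def attempt_load(observed):
    """Return True if P1 holds the read right at the moment its data arrives."""
    has_right = False
    for event in observed:
        if event == "own_request":
            has_right = True      # right gained on receipt of own request
        elif event == "foreign_exclusive":
            has_right = False     # right lost to another processor's request
        elif event == "data":
            return has_right      # the load succeeds only if the right survived
    return False


def attempts_until_success(attempts):
    """Return the 1-based index of the first successful attempt, or None."""
    for number, observed in enumerate(attempts, start=1):
        if attempt_load(observed):
            return number
    return None
```

If every attempt is interrupted by a foreign request for exclusive access before the data arrives, `attempts_until_success` returns `None` regardless of how many attempts are supplied, which corresponds to the unbounded retry behavior described above.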
The starvation problem is typically solved in systems that utilize standard snooping protocols by placing incoming address packets in a queue. The packets are removed from the queue in the order in which they were placed in it, and they affect the processor's access rights only when they are removed from the queue. When a processor P1's own request is at the front of its queue, but it has not yet received the data for this request, it waits until the data is received before removing its request from the queue. As a result, starvation on loads is avoided because the data must be received before a later, invalidating request from another processor is processed.
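The queue-based approach described above may be sketched as follows for a single cache block and a single outstanding request. The class name `QueuedSnooper`, its attributes, and the request name `read_exclusive` are assumptions made for this illustration.

```python
from collections import deque


class QueuedSnooper:
    """Hypothetical processor that applies address packets in FIFO order."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.has_right = False
        self.data_arrived = False
        self.load_done = False

    def receive_address(self, requester, kind):
        self.queue.append((requester, kind))
        self.drain()

    def receive_data(self):
        self.data_arrived = True
        self.drain()

    def drain(self):
        """Remove packets from the front of the queue while permitted."""
        while self.queue:
            requester, kind = self.queue[0]
            if requester == self.name:
                if not self.data_arrived:
                    return                 # stall: own request waits for data
                self.has_right = True
                self.load_done = True      # load performed with right and data
                self.data_arrived = False
            elif kind == "read_exclusive":
                self.has_right = False     # applied only on removal
            self.queue.popleft()
```

In this sketch, a later request from another processor for exclusive access remains queued behind the processor's own request, so it cannot take effect until after the load has been performed.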
Unfortunately, the standard snooping protocol suffers from a significant performance drawback. In particular, the requirement that access rights of processors change in the order in which snoops are received may limit performance. For example, a processor may have issued requests for cache blocks A and B, in that order, and it may receive the data for cache block B (or already have it) before receiving the data for cache block A. In this case the processor must typically wait until it receives the data for cache block A before using the data for cache block B, thus increasing latency. The impact associated with this requirement is particularly high in processors that support out-of-order execution, prefetching, multiple cores per processor, and/or multi-threading, since such processors are likely to be able to use data in the order it is received, even if it differs from the order in which it was requested.
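The latency effect described above may be quantified with a small worked example. The function names and the arrival times (in arbitrary time units) are hypothetical; the sketch assumes only that, under in-order processing, data for a request cannot be used before the data for every earlier request has arrived.

```python
from itertools import accumulate


def usable_times_in_order(arrivals):
    """Time at which each block is usable when requests are honored in order.

    arrivals lists the data arrival times in request order.
    """
    return list(accumulate(arrivals, max))


def added_latency(arrivals):
    """Extra wait per block compared with using data as soon as it arrives."""
    in_order = usable_times_in_order(arrivals)
    return [usable - arrived for usable, arrived in zip(in_order, arrivals)]


# Requests issued for A then B; data for B arrives first.
arrivals = [30, 10]
print(usable_times_in_order(arrivals))   # [30, 30]
print(added_latency(arrivals))           # [0, 20]
```

In this example the data for cache block B is held for 20 additional time units solely because the data for cache block A has not yet arrived.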
In systems that implement a directory-based protocol rather than a snooping protocol, both the address network and the data network are typically point-to-point, switched networks. When a processor requests a cache block, the request is sent to a directory which maintains information regarding the processors that have copies of the cache block and their access rights. The directory then forwards the request to those processors which must change their access rights and/or provide data for the request (or if needed, the directory will access the copy of the cache block in memory and provide the data to the requestor). Since there is no way of knowing when the request arrives at each processor to which it is sent, all processors that receive the request must typically acknowledge reception by providing data or sending an acknowledge (ACK) message to either the requestor or the directory, depending on the protocol.
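The directory behavior described above may be sketched as follows. The `Directory` class, its two request methods, and the convention that an empty return value means the data is supplied from memory are assumptions made for this illustration; message transport and the acknowledgments themselves are not modeled.

```python
class Directory:
    """Hypothetical directory tracking holders of each cache block."""
    def __init__(self):
        self.sharers = {}   # block -> set of processors with shared copies
        self.owner = {}     # block -> processor with exclusive access

    def request_shared(self, block, requester):
        """Return the processors to which the request is forwarded."""
        owner = self.owner.pop(block, None)
        holders = self.sharers.setdefault(block, set())
        forwarded = []
        if owner is not None:
            holders.add(owner)             # owner keeps a shared copy
            if owner != requester:
                forwarded.append(owner)    # owner supplies the data
        holders.add(requester)
        return forwarded                   # empty: memory supplies the data

    def request_exclusive(self, block, requester):
        """Return the processors that must give up their copies and respond."""
        targets = self.sharers.pop(block, set())
        previous_owner = self.owner.get(block)
        if previous_owner is not None:
            targets.add(previous_owner)
        targets.discard(requester)
        self.owner[block] = requester
        return sorted(targets)
```

The length of the list returned by `request_exclusive` corresponds to the number of responses (data or ACK messages) that must be collected before the request completes, and it grows with the number of processors sharing the block.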
Typical systems that implement a directory-based protocol may be associated with various drawbacks. For example, systems that employ directory-based protocols may suffer from starvation in a manner similar to that discussed previously. In addition, such systems may suffer from high latency due to the requirement that requests go first to a directory and then to the relevant processors, and/or from the need to wait for acknowledgment messages. Still further, when a large number of processors must receive the request (such as when a cache block transitions from a widely shared state to an exclusive state), all of the processors must typically send ACKs to the same destination, thus causing congestion in the network near the destination of the ACKs and requiring complex logic to handle reception of the ACKs. Finally, the directory itself may add cost and complexity to the system.
It may accordingly be desirable to provide an efficient cache consistency protocol that avoids starvation without requiring that access rights of processors change in the order in which requests are received.