Multiprocessor computers by definition contain multiple processors that can execute multiple parts of a computer program, or multiple programs, simultaneously. In general, this parallel computing executes computer programs faster than conventional single-processor computers, such as personal computers (PCs), which execute the parts of a program sequentially. The actual performance advantage depends on a number of factors, including the degree to which parts of a program can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
Multiprocessor computers may be classified by how they share information among the processors. Shared-memory multiprocessor computers offer a common memory address space that all processors can access. Processes within a program communicate through shared variables in memory, which allow them to read from or write to the same memory location in the computer. Message-passing multiprocessor computers, on the other hand, have a separate memory space for each processor. Processes communicate by sending messages to each other.
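The two communication styles described above can be illustrated with a minimal Python sketch. It is only an analogy: threads stand in for processors, and the function names (`shared_memory_sum`, `message_passing_sum`, `_run_workers`) are hypothetical and do not come from any particular system.

```python
import queue
import threading


def _run_workers(work, values, workers):
    # Start one thread per worker on an interleaved slice, then wait for all.
    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_sum(values, workers=4):
    """Workers communicate by writing to one shared memory location."""
    total = {"value": 0}       # shared variable visible to every worker
    lock = threading.Lock()    # serializes updates to the shared location

    def work(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    _run_workers(work, values, workers)
    return total["value"]


def message_passing_sum(values, workers=4):
    """Workers keep private state and communicate only through messages."""
    mailbox = queue.Queue()

    def work(chunk):
        mailbox.put(sum(chunk))    # the message carries the partial result

    _run_workers(work, values, workers)
    return sum(mailbox.get() for _ in range(workers))
```

In the first function every worker touches the same location and needs a lock; in the second, no state is shared and the only coordination is the exchange of messages.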
Multiprocessor computers may also be classified by how the memory is physically organized. In distributed memory computers, the memory is divided into modules physically placed near each processor. This placement provides each processor with faster access time to its local memory. By contrast, in centralized memory computers, the memory is physically in just one location, generally equidistant in time and space from each of the processors. Both forms of memory organization use high-speed cache memory in conjunction with main memory to reduce execution time.
Multiprocessor computers with distributed shared memory are often organized into nodes with one or more processors per node. A node also includes local memory for the processors, a remote cache for caching data obtained from memory on other nodes, and a remote cache interconnect. The remote cache interconnect interfaces with other nodes in the computer through a network by using a cache coherency protocol, such as the protocol described in the Scalable Coherent Interface (SCI) standard (IEEE 1596).
A processor on a node communicates directly with the local memory and communicates indirectly with memory on other nodes by using the remote cache interconnect. For example, if the desired data is in local memory, a processor can obtain the data by accessing the local memory directly over a node bus. If, however, the desired data is located in a memory on another node, the processor has no direct access to the other nodes. Instead, the processor must make a request to the remote cache interconnect. The remote cache interconnect then obtains the requested data from another node on the network and delivers the data to the requesting processor.
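The access path just described can be sketched as a small Python model. This is an illustrative simplification under stated assumptions, not the SCI protocol: the `Node`, `Network`, and `read` names are hypothetical, memory is modeled as a dictionary, and coherency actions (invalidation, ownership) are omitted entirely.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    local_memory: dict = field(default_factory=dict)  # address -> value
    remote_cache: dict = field(default_factory=dict)  # copies of remote data


class Network:
    """Hypothetical stand-in for the network that links the nodes."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def fetch(self, address):
        # Locate the node whose local memory holds the address.
        for node in self.nodes:
            if address in node.local_memory:
                return node.local_memory[address]
        raise KeyError(address)


def read(node, network, address):
    """Return (value, path), where path names how the data was obtained."""
    if address in node.local_memory:
        # Direct access over the node bus.
        return node.local_memory[address], "local memory"
    if address in node.remote_cache:
        # Remote data already cached on this node.
        return node.remote_cache[address], "remote cache"
    # Indirect access: the remote cache interconnect goes to the network.
    value = network.fetch(address)
    node.remote_cache[address] = value
    return value, "network"
```

For example, with two nodes where node 0 holds address `0x10` and node 1 holds `0x20`, a read of `0x20` from node 0 goes to the network the first time and is served from node 0's remote cache afterwards.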
The processor communicates with the remote cache interconnect through either a deferred request or a retry request. In a deferred request, the remote cache interconnect passes the data to the processor over the node bus as soon as the remote cache interconnect receives the data from the network. In a retry request, the remote cache interconnect holds the data received from the network and waits for the processor to again request the data (a retry). When the processor retries its request, the remote cache interconnect passes the data over the node bus to the processor.
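The difference between the two request types can be sketched as follows. This is a toy model under assumed names: `RemoteCacheInterconnect`, `request`, and `on_network_data` are hypothetical, the node bus is represented by a simple list of deliveries, and at most one outstanding request per address is assumed.

```python
class RemoteCacheInterconnect:
    """Toy model of deferred and retry requests (names are illustrative)."""

    def __init__(self):
        self.pending = {}     # address -> (processor, mode) awaiting the network
        self.held = {}        # address -> data held until the processor retries
        self.delivered = []   # (processor, address, data) sent over the node bus

    def request(self, processor, address, mode):
        """mode is 'deferred' or 'retry'; returns data only on a successful retry."""
        if mode == "retry" and address in self.held:
            data = self.held.pop(address)
            self.delivered.append((processor, address, data))
            return data
        # First request for this address: remember who asked and how.
        self.pending[address] = (processor, mode)
        return None

    def on_network_data(self, address, data):
        """Called when the requested data arrives from the network."""
        processor, mode = self.pending.pop(address)
        if mode == "deferred":
            # Deferred: pass the data to the processor as soon as it arrives.
            self.delivered.append((processor, address, data))
        else:
            # Retry: hold the data and wait for the processor to ask again.
            self.held[address] = data
```

In the deferred case the delivery happens inside `on_network_data`; in the retry case nothing reaches the processor until it calls `request` a second time, which is the window the following paragraphs are concerned with.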
A problem occurs with distributed memory systems when multiple processors try to obtain the same data simultaneously using retry-type requests. In this circumstance, there is no guarantee that each processor will make forward progress: a processor may request data but never receive it. When several processors in different nodes try to obtain control of the same data, an endless cycle can occur in which the nodes steal the data from one another.
The problem is best understood by example. Assume a first processor on a first node requests ownership of a data line (i.e., a block of memory) located on a remote node. The first node's remote cache interconnect requests the data line from the network. When the data line is received, the remote cache interconnect waits for the first processor to retry its request for the data line. During this interim period, a second processor on a second node requests ownership of the same data line. Under current cache coherency protocols, the first node responds by passing control of the data line to the second node before the first processor retries its request for the data line. A retry request made by the first processor is then rejected. The second node, in essence, has stolen the data line from the first node before the first processor received control of the data line. The remote cache interconnect on the second node now waits for the second processor to issue its retry request. During this interim period, the first processor again requests ownership of the same data line. By doing so, the first processor steals the data line back from the second node before the second processor receives control of the data line. This cycle can continue indefinitely, with neither processor making forward progress.
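The interleaving in this example can be replayed with a small Python simulation. It is a deliberately simplified model, not an implementation of any real cache protocol: the function names `stealing_schedule` and `run` are hypothetical, a single data line is tracked, and a new ownership request is assumed always to take the line from whichever node currently holds it.

```python
def stealing_schedule(rounds):
    """Build the event order from the example: each retry arrives too late."""
    events = [("request", 1)]            # first node asks for the line
    for i in range(rounds):
        thief = 2 if i % 2 == 0 else 1   # the node that requests in the interim
        victim = 3 - thief               # the node still waiting for its retry
        events.append(("request", thief))
        events.append(("retry", victim))
    return events


def run(events):
    """Return how many retries succeeded for each node."""
    holder = None                  # node whose interconnect holds the line
    successes = {1: 0, 2: 0}
    for kind, node in events:
        if kind == "request":
            holder = node          # line passes to the requesting node
        elif kind == "retry":
            if holder == node:
                successes[node] += 1   # processor finally receives the line
                holder = None
            # Otherwise the retry is rejected: the line was already passed on.
    return successes
```

Under the adversarial order, `run(stealing_schedule(1000))` reports zero successful retries for both nodes, whereas an undisturbed sequence such as `run([("request", 1), ("retry", 1)])` lets the first node succeed. The lack of progress comes from the timing of the competing requests, not from any shortage of attempts.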
An objective of the invention, therefore, is to provide a multiprocessor computer system that guarantees forward progress of processor requests. A further objective of the invention is to provide such a system that conforms to existing cache coherency protocols, such as the SCI protocol.