Multiprocessor computers by definition contain multiple processors that can execute multiple parts of a computer program or multiple programs simultaneously. In general this parallel computing executes computer programs faster than conventional single processor computers, such as personal computers (PCs), that execute the parts of a program sequentially. The actual performance advantage is a function of a number of factors, including the degree to which parts of a program can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
These computers may be classified by how they share information among the processors. Shared-memory multiprocessor computers offer a common memory address space that all processors can access. Processes within a program communicate through shared variables in memory, which allow them to read or write to the same memory location in the computer. Message-passing multiprocessor computers, on the other hand, have a separate memory space for each processor. Processes communicate by sending messages to each other.
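The two communication styles described above can be contrasted with a minimal sketch. It is illustrative only: Python threads always share one address space, so the message-passing variant merely follows the convention that workers exchange data solely through a queue. The function names `shared_memory_sum` and `message_passing_sum` are hypothetical and are not taken from any particular system.

```python
import queue
import threading


def shared_memory_sum(values):
    """Shared-memory style: every worker updates the same memory location."""
    shared = {"total": 0}      # stands in for one shared variable
    lock = threading.Lock()    # keeps each read-modify-write atomic

    def worker(value):
        with lock:
            shared["total"] += value

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]


def message_passing_sum(values):
    """Message-passing style: workers send results as messages, nothing else."""
    mailbox = queue.Queue()    # stands in for the message channel

    def worker(value):
        mailbox.put(value)     # the only way data leaves the worker

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # The receiver collects exactly one message per worker.
    return sum(mailbox.get() for _ in values)
```

In the first function, correctness depends on coordinating access to the shared location; in the second, it depends only on each message being delivered.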
These computers may also be classified by how the memory is physically organized. In distributed memory computers, the memory is divided into modules physically placed near each processor. This placement provides each processor with faster access time to its local memory. By contrast, in centralized memory computers, the memory is physically located in just one location, generally equally distant in time and space from each of the processors. Both forms of memory organization use high-speed cache memory in conjunction with main memory to reduce execution time.
Multiprocessor computers with distributed shared memory are often organized into nodes with one or more processors per node. Also included in the node are local memory for the processors, a remote cache for caching data obtained from memory in other nodes, and logic for linking the node with other nodes in the computer. A processor in a node communicates directly with the local memory and communicates indirectly with memory on other nodes through the remote cache. For example, if the desired data is in local memory, a processor obtains the data directly from local memory. But if the desired data is stored in memory in another node, the processor must access its remote cache to obtain the data. A cache hit occurs if the data has been obtained recently and is presently stored in the cache. Otherwise a cache miss occurs, and the cache must obtain the desired data from the local memory in another node through the linking logic.
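The local and remote access paths just described can be modeled with a short sketch. It assumes, purely for illustration, that addresses are assigned to nodes in equal contiguous blocks and that the remote cache never evicts or invalidates entries; coherency is not modeled. The names `Node`, `build_system`, and `words_per_node` are hypothetical.

```python
class Node:
    """One node: local memory plus a remote cache for data from other nodes."""

    def __init__(self, node_id, words_per_node):
        self.node_id = node_id
        self.words_per_node = words_per_node
        base = node_id * words_per_node
        # Assumed layout: each node owns one contiguous block of addresses.
        self.local_memory = {base + i: 0 for i in range(words_per_node)}
        self.remote_cache = {}   # address -> value fetched from another node
        self.peers = {}          # stands in for the linking logic
        self.hits = 0
        self.misses = 0

    def home_of(self, address):
        """Return the id of the node whose local memory holds the address."""
        return address // self.words_per_node

    def read(self, address):
        home = self.home_of(address)
        if home == self.node_id:
            # Local data is read directly, without touching the remote cache.
            return self.local_memory[address]
        if address in self.remote_cache:
            self.hits += 1       # cache hit: data was obtained earlier
            return self.remote_cache[address]
        self.misses += 1         # cache miss: fetch from the home node
        value = self.peers[home].local_memory[address]
        self.remote_cache[address] = value
        return value


def build_system(num_nodes, words_per_node):
    """Create the nodes and connect each one to all of the others."""
    nodes = [Node(i, words_per_node) for i in range(num_nodes)]
    for node in nodes:
        node.peers = {p.node_id: p for p in nodes if p is not node}
    return nodes
```

With two nodes of four words each, a first read of address 5 by node 0 is a miss served by node 1, and a second read of the same address is a hit in node 0's remote cache.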
The linking logic between nodes is not bus-based, but takes the form of a network that transmits data packets between the nodes through multiple data paths. Data coherency is maintained among the nodes through an interconnect mechanism such as the Scalable Coherent Interface (SCI) interconnection mechanism (IEEE 1596) or equivalent.
Within each node, however, the processors, I/O devices, memory, and other agents communicate with each other through one or more node buses. One function of a node bus is to distribute interrupts from one agent to another. A separate interrupt bus may be provided for this purpose (although the function could be handled with other functions in one common bus). Whatever the implementation, each of the processors typically includes an interrupt mechanism (known as an interrupt interface) for transmitting interrupts to and receiving interrupts from the others. The interrupt mechanism allows one processor to interrupt a second processor to perform a task of higher priority than the task currently being performed by the second processor. Interrupt schemes such as this are well developed for bus-based multiprocessor systems, such as the scheme described in U.S. Pat. No. 5,613,128.
Bus-based interrupt schemes, however, cannot communicate interrupts across the network of a multinode multiprocessor system because the nodes are not connected by a bus. (The difference between a bus and a network is well defined. See, for example, "Interconnection Networks," Computer Architecture: A Quantitative Approach, 2nd Ed. (1996).) Instead, a second interrupt mechanism must be added to handle interrupts sent via the network from a processor on one node to a processor on another node. The obvious solution is to treat an interrupt like data and provide an interrupt register with a memory address in each node. A requesting processor in one node then interrupts a processor in a second node by writing an interrupt request to the address of the interrupt register in the second node. The request is then sent by way of the network to the second node. Hardware in the second node reads the interrupt register and places the interrupt request on the second node's bus for the second processor to read.
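The register-based approach described above can be sketched as follows. This is a simplified model under stated assumptions, not a description of any actual hardware: the register offset `INTERRUPT_REGISTER_OFFSET`, the class names `IrqNode` and `Network`, and the tuple used as the interrupt request are all hypothetical.

```python
from collections import deque

# Hypothetical address of the interrupt register within each node.
INTERRUPT_REGISTER_OFFSET = 0xFF0


class IrqNode:
    """A node with a memory-mapped interrupt register and a node bus."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.interrupt_register = None   # holds one pending request
        self.bus_log = []                # requests placed on the node bus

    def drain_interrupt_register(self):
        """Model the node hardware: read the register, forward to the bus."""
        if self.interrupt_register is not None:
            self.bus_log.append(self.interrupt_register)
            self.interrupt_register = None


class Network:
    """Carries writes between nodes as packets (no bus between nodes)."""

    def __init__(self, nodes):
        self.nodes = {n.node_id: n for n in nodes}
        self.packets = deque()

    def write(self, dst_node, offset, value):
        """A requesting processor treats the interrupt like a data write."""
        self.packets.append((dst_node, offset, value))

    def deliver_all(self):
        """Deliver queued packets to their destination nodes in order."""
        while self.packets:
            dst_node, offset, value = self.packets.popleft()
            if offset == INTERRUPT_REGISTER_OFFSET:
                node = self.nodes[dst_node]
                node.interrupt_register = value
                node.drain_interrupt_register()
```

A processor on node 0 would interrupt a processor on node 1 by calling `write(1, INTERRUPT_REGISTER_OFFSET, request)`; the request reaches node 1's bus only after the network delivers the packet and node 1's hardware reads the register.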
The major drawback of this approach is the complexity that results because two interrupt mechanisms are required. In addition to the extra hardware needed to maintain two mechanisms, the processors must be programmed to distinguish between interrupt destinations. If the requesting processor is generating an interrupt for a processor or for an I/O device on the local node, the processor must know to utilize the interrupt controller and associated interrupt bus. But if the requesting processor is generating an interrupt for a processor on a remote node, then the processor must recognize this as a different type of interrupt and write to the address of the remote node's interrupt register. A second drawback is the extra steps required for an I/O device on a local node to interrupt a processor on a remote node. The I/O device cannot write directly to the remote node's interrupt register, so it must first send the interrupt to a local processor. The local processor must then write the interrupt request to the desired interrupt register by way of the network and network protocol.
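The branching that this two-mechanism scheme forces on software can be made concrete with a small sketch. It only enumerates the steps named in the paragraph above; the function name `interrupt_path` and the step labels are hypothetical.

```python
def interrupt_path(source, src_node, dst_node):
    """List the steps one interrupt takes under the two-mechanism scheme.

    source is "processor" or "io_device"; nodes are identified by number.
    """
    if source not in ("processor", "io_device"):
        raise ValueError(f"unknown interrupt source: {source}")

    if src_node == dst_node:
        # Local destination: the bus-based mechanism is sufficient.
        return ["local interrupt bus"]

    steps = []
    if source == "io_device":
        # An I/O device cannot write to a remote interrupt register,
        # so a local processor has to relay the interrupt.
        steps.append("local interrupt bus to relay processor")
    # Remote destination: the second, register-based mechanism is used.
    steps.append("network write to remote interrupt register")
    steps.append("remote node bus")
    return steps
```

Under these assumptions a local interrupt takes one step, a processor-to-remote-processor interrupt takes two, and an I/O-device-to-remote-processor interrupt takes three, which reflects the second drawback noted above.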
An objective of the invention, therefore, is to provide a single interrupt mechanism to communicate interrupts within and between nodes of a multinode multiprocessor system. This mechanism virtually extends the interrupt bus in each node to the other nodes in a manner that is transparent to the processors and I/O devices in the system. System performance is thereby increased without having to add to or modify existing bus-based interrupt mechanisms.