This invention relates generally to multiprocessor computer systems that are composed of a number of separate but interconnected processor nodes. More particularly, this invention relates to a method for efficiently routing I/O data between a processor and an I/O device within the system where multiple paths exist between the processor and the I/O device.
Multiprocessor computers by definition contain multiple processors that can execute multiple parts of a computer program or multiple distinct programs simultaneously, in a manner known as parallel computing. In general, multiprocessor computers execute multithreaded programs or multiple single-threaded programs faster than conventional single-processor computers, such as personal computers (PCs), that must execute programs sequentially. The actual performance advantage is a function of a number of factors, including the degree to which parts of a multithreaded program and/or multiple distinct programs can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
Multiprocessor computers may be classified by how they share information among the processors. Shared memory multiprocessor computers offer a common physical memory address space that all processors can access. Multiple processes or multiple threads within the same process can communicate through shared variables in memory that allow them to read or write to the same memory location in the computer. Message passing multiprocessor computers, in contrast, have processor nodes with a separate memory space for each processor, requiring processes in such a system to communicate through explicit messages to each other.
Shared memory multiprocessor computers may further be classified by how the memory is physically organized. In distributed shared memory (DSM) machines, the memory is divided into modules physically placed near each processor. Although all of the memory modules are globally accessible, a processor can access memory placed nearby faster than memory placed remotely. Because the memory access time differs based on memory location, distributed shared memory systems are also called non-uniform memory access (NUMA) machines. In centralized shared memory computers, on the other hand, the memory is physically in one location. Centralized shared memory computers are called uniform memory access (UMA) machines because the memory is equidistant in time from each of the processors. Both forms of memory organization typically use high-speed cache memory in conjunction with main memory to reduce execution time.
Multiprocessor systems with distributed shared memory are organized into processor nodes with one or more processors per node. Also included in the node are local memory for the processors, a remote cache for caching data obtained from other nodes, logic for linking the node with other nodes in the system, possibly logic for communicating with an I/O subsystem, and a node bus connecting these elements together. The processor nodes are linked with a system interconnect that provides high bandwidth and low latency. An example of such an interconnect is a switch-based network that uses the Scalable Coherent Interface (SCI) protocol. Other interconnection schemes are, of course, possible. Further information on multiprocessor computer systems in general and NUMA machines in particular can be found in a number of works including Computer Architecture: A Quantitative Approach (2nd Ed. 1996), by D. Patterson and J. Hennessy.
The conventional I/O subsystem of a multiprocessor system typically includes multiple I/O devices such as disk drives and tapes that are connected to one or more of the processor nodes. Dual I/O paths between nodes and the I/O devices are sometimes built into the I/O subsystem to make the system fault tolerant by ensuring that I/O requests are successfully handled in the event a primary signal path is blocked. This blocking might occur, for example, if a switch or other device within the subsystem fails or if a node or other device is taken off-line for replacement or maintenance. In such an event, a secondary path is used until the primary path is again available.
The problem with providing multiple paths in this manner is that it is system specific, requiring paths to be set for each contemplated system configuration. If a user wished to change a system configuration by adding an additional I/O device, for example, the operating system would have to be modified to recognize new primary and secondary paths to the new I/O device. Such modifications make it costly to configure multiprocessor systems to meet a user's unique requirements.
An objective of the invention, therefore, is to provide a method for dynamically establishing an I/O path between a processor node and an I/O device in a computer system. Another objective of the invention is to establish an optimal I/O path between a processor node and an I/O device within the system, where multiple paths exist.
The invention includes a method for dynamically establishing an I/O path between a processor node and an I/O device in a multiprocessor system. The method starts with providing a configuration graph. The graph has objects associated with elements (devices) of the system and links connecting the objects. The intended node is identified, and links are then followed in the graph from an object associated with the I/O device to an object associated with a node. If multiple I/O paths exist and an optimal path is desired, the method includes providing in the links routing information that lists the nodes that can be directly reached via each link. Where possible, the links followed are those whose routing information contains the identified node. If no link having such routing information exists at an object along the path, then another link is chosen whose routing information contains another node. This other link may be chosen in a round-robin manner if there are multiple links to choose from. If multiple links having routing information that contains the identified node exist at an object along the path, then one of those links is chosen in a round-robin manner.
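The link-following procedure summarized above can be illustrated with a minimal sketch. It is not the patented implementation: the names `Link`, `GraphObject`, `establish_path`, and `build_example`, the per-object round-robin counter, and the disk/controller topology are all assumptions made for illustration, and the sketch assumes every link leads one step closer to a node so that the walk terminates.

```python
from dataclasses import dataclass, field


@dataclass
class Link:
    target: "GraphObject"
    reachable_nodes: frozenset  # routing information carried by the link


@dataclass
class GraphObject:
    name: str
    is_node: bool = False
    links: list = field(default_factory=list)
    rr_counter: int = 0  # per-object round-robin state (an assumption)

    def pick(self, candidates):
        """Choose among candidate links in round-robin order."""
        choice = candidates[self.rr_counter % len(candidates)]
        self.rr_counter += 1
        return choice


def establish_path(device, wanted_node):
    """Follow links from the I/O device object up to a node object."""
    path = [device.name]
    current = device
    while not current.is_node:
        if not current.links:
            raise LookupError(f"no link leads from {current.name} to a node")
        # Prefer links whose routing information contains the identified node;
        # otherwise fall back to any link, which leads to another node.
        preferred = [l for l in current.links if wanted_node in l.reachable_nodes]
        current = current.pick(preferred or current.links).target
        path.append(current.name)
    return path


def build_example():
    """Hypothetical topology: one disk, three controllers, two nodes."""
    node0 = GraphObject("node0", is_node=True)
    node1 = GraphObject("node1", is_node=True)
    to0, to1 = frozenset({"node0"}), frozenset({"node1"})
    ctrl_a = GraphObject("ctrl_a", links=[Link(node0, to0)])
    ctrl_b = GraphObject("ctrl_b", links=[Link(node0, to0)])
    ctrl_c = GraphObject("ctrl_c", links=[Link(node1, to1)])
    return GraphObject(
        "disk", links=[Link(ctrl_a, to0), Link(ctrl_b, to0), Link(ctrl_c, to1)]
    )


disk = build_example()
print(establish_path(disk, "node0"))  # ['disk', 'ctrl_a', 'node0']
print(establish_path(disk, "node0"))  # ['disk', 'ctrl_b', 'node0']
print(establish_path(disk, "node1"))  # ['disk', 'ctrl_c', 'node1']
```

In this sketch, successive requests for `node0` alternate between the two controllers that reach it, while a request for a node that no link advertises would fall back to whichever link is next in round-robin order.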
Other aspects of the invention include constructing the configuration graph and adding routing information to the configuration graph.
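One way routing information might be added to a configuration graph is sketched below. This is an illustrative assumption rather than the method described in the embodiment: the graph is modeled as a plain dictionary mapping each object name to the names of the objects one link closer to a node, the function name `add_routing_info` is hypothetical, the routing information of a link is taken to be every node reachable through that link, and the graph is assumed to be acyclic.

```python
def add_routing_info(graph, node_names):
    """Return {(source, target): frozenset of nodes reachable via that link}."""
    memo = {}

    def reachable(name):
        # A node object reaches only itself.
        if name in node_names:
            return frozenset({name})
        if name not in memo:
            found = set()
            for nxt in graph.get(name, []):
                found |= reachable(nxt)
            memo[name] = frozenset(found)
        return memo[name]

    return {
        (src, dst): reachable(dst)
        for src, targets in graph.items()
        for dst in targets
    }


# Hypothetical topology: a disk behind two bridges, one of them dual-ported.
graph = {
    "disk": ["bridge_a", "bridge_b"],
    "bridge_a": ["node0"],
    "bridge_b": ["node0", "node1"],
}
routing = add_routing_info(graph, {"node0", "node1"})
print(sorted(routing[("disk", "bridge_b")]))  # ['node0', 'node1']
```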
The ability to establish an optimal path each time an I/O request is made provides fault tolerance to the computer system. If a subsystem device along one path between an I/O device and an intended node fails, the system can be quickly reconfigured and another path established that does not require the failed device.
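The fault-tolerance benefit can be pictured with a small sketch in which a path is simply re-established while skipping any failed device. The dictionary-based graph, the `failed` set, and the function name `find_path` are assumptions made for illustration; the sketch uses a plain depth-first search and assumes an acyclic graph, and it does not model how the system itself detects failures or reconfigures.

```python
def find_path(graph, start, node_names, failed=frozenset()):
    """Return a path from start to any node that avoids failed objects, or None."""
    if start in failed:
        return None
    if start in node_names:
        return [start]
    for nxt in graph.get(start, []):
        rest = find_path(graph, nxt, node_names, failed)
        if rest is not None:
            return [start] + rest
    return None


# Hypothetical topology: a disk reachable through either of two switches.
graph = {"disk": ["switch_a", "switch_b"], "switch_a": ["node0"], "switch_b": ["node0"]}
nodes = {"node0"}

print(find_path(graph, "disk", nodes))                       # ['disk', 'switch_a', 'node0']
print(find_path(graph, "disk", nodes, failed={"switch_a"}))  # ['disk', 'switch_b', 'node0']
```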
Although the invention may have most value in multiprocessor systems with a NUMA architecture, it is not restricted to such systems. The method may also be used in uniprocessor systems, such as a system with a single one-processor node, simply to establish multiple paths that provide fault tolerance.
These and other aspects of the invention are more fully described below with reference to an illustrative embodiment.