A computer system can be broken into three basic blocks: a central processing unit (CPU), memory, and input/output (I/O) units. These blocks are interconnected by means of a bus. An input device such as a keyboard, mouse, disk drive, analog-to-digital converter, etc., is used to input instructions and data to the computer system via the I/O unit. These instructions and data can be stored in memory. The CPU retrieves the data stored in the memory and processes the data as directed by the stored instructions. The results can be stored back into memory or outputted via the I/O unit to an output device such as a printer, cathode-ray tube (CRT) display, digital-to-analog converter, LCD, etc.
In some computer systems, multiple processors are utilized. Multiprocessor computers by definition contain multiple processors that can execute multiple parts of a computer program or multiple programs simultaneously. In general, this parallel computing executes computer programs faster than conventional single-processor computers, such as personal computers (PCs), which execute the parts of a program sequentially. The actual performance advantage is a function of a number of factors, including the degree to which parts of a program can be executed in parallel and the architecture of the particular multiprocessor computer at hand.
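The dependence of the performance advantage on the parallelizable portion of a program can be illustrated with Amdahl's law, a standard model that is not part of the system described here. The following is a minimal sketch; the function name `amdahl_speedup` and its parameters are hypothetical, and the model ignores architectural factors such as memory and interconnect contention.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Idealized speedup over a single processor under Amdahl's law.

    parallel_fraction: share of the run time that can execute in parallel (0..1).
    processors: number of processors available (>= 1).
    Both names are illustrative assumptions, not terms from the text.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; the parallel part is divided evenly.
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    for n in (1, 2, 8, 64):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

Under this model, a program that is 90% parallelizable gains a speedup of roughly 4.7 on eight processors, and no number of processors can push it past 10.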
Multiprocessor computers may be classified by how they share information among the processors. Shared-memory multiprocessor computers offer a common memory address space that all processors can access. Processes within a program communicate through shared variables in memory, which allow them to read from or write to the same memory location in the computer. Message-passing multiprocessor computers, on the other hand, have a separate memory space for each processor. Processes communicate by sending messages to one another.
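The two communication styles can be contrasted with a small software analogy. The sketch below is illustrative only: it uses Python threads, which all share one address space, so the message-passing variant merely models separate memory spaces by having workers exchange data only through a queue. The function names `shared_memory_sum` and `message_passing_sum` are hypothetical.

```python
import queue
import threading


def shared_memory_sum(values, workers=4):
    """Workers communicate by updating one shared variable."""
    total = {"value": 0}       # shared location visible to every worker
    lock = threading.Lock()    # serializes updates to the shared location

    def work(chunk):
        partial = sum(chunk)
        with lock:
            total["value"] += partial

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(values, workers=4):
    """Workers keep private results and communicate only by sending messages."""
    mailbox = queue.Queue()    # the only channel between workers and collector

    def work(chunk):
        mailbox.put(sum(chunk))  # send the partial result as a message

    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Receive exactly one message per worker and combine them.
    return sum(mailbox.get() for _ in range(workers))


if __name__ == "__main__":
    data = list(range(100))
    print(shared_memory_sum(data), message_passing_sum(data))
```

In the shared-variable version, correctness depends on synchronizing access to the common location; in the message version, no worker touches another worker's data, at the cost of explicit sends and receives.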
Shared-memory multiprocessor computers may also be classified by how the memory is physically organized. In distributed shared-memory computers, the memory is divided into modules physically placed near a group of processors. Although all of the memory modules are globally accessible, a processor can access memory placed nearby faster than memory placed remotely.
Multiprocessor computers with distributed shared memory are often organized into nodes with one or more processors per node. Such nodes are also referred to herein as “processing modules.” The processing modules interface with each other through a network by using a protocol. Companies such as Intel Corporation have developed “chip sets” which may be located on each node to provide memory and I/O buses for the multiprocessor computers.
In some conventional distributed shared-memory multiprocessor systems, input/output (I/O) modules are directly connected to the processing modules by a point-to-point bus. FIG. 1 is a block diagram of a processing module 102 coupled to one or more I/O modules 104 in a conventional multiprocessor system. The processing module 102 comprises one or more processors 108, a memory 110, and a memory controller 112. The memory controller 112 directs traffic between a system bus, one or more point-to-point buses 106, and the shared memory 110. The memory controller 112 accepts access requests from the system bus and directs those access requests to memory 110 or to one of the point-to-point buses 106. The memory controller 112 also accepts inbound requests from the point-to-point buses 106. As further shown in FIG. 1, each one of the I/O modules 104 comprises an I/O controller 114 and one or more I/O devices 116. Each I/O module 104 is connected to the processing module 102 via a point-to-point bus 106.
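The routing role of the memory controller 112 can be sketched in software. The text does not state how the controller decides where a request goes, so the address map below is purely an assumption: memory occupies the lowest addresses and each point-to-point bus is given a fixed-size window above it. The class name `MemoryController` and the fields `memory_size`, `bus_window`, and `num_buses` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryController:
    """Toy model of a controller that steers system-bus requests.

    Assumed address map (not taken from the text):
      [0, memory_size)                       -> memory
      next num_buses windows of bus_window   -> point-to-point bus 0, 1, ...
    """
    memory_size: int
    bus_window: int
    num_buses: int

    def route(self, address: int) -> str:
        """Return the destination of a system-bus access request."""
        if address < 0:
            raise ValueError("address must be non-negative")
        if address < self.memory_size:
            return "memory"
        bus = (address - self.memory_size) // self.bus_window
        if bus >= self.num_buses:
            raise ValueError("address outside every mapped range")
        return f"bus{bus}"


if __name__ == "__main__":
    mc = MemoryController(memory_size=1024, bus_window=256, num_buses=2)
    for addr in (0, 1023, 1024, 1280):
        print(hex(addr), "->", mc.route(addr))
```

Because every request to an I/O module in this model crosses the single point-to-point bus assigned to that module, the sketch also shows where the I/O path can become a bottleneck.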
However, a conventional multiprocessor system, such as the multiprocessor system shown in FIG. 1, has limited bandwidth to I/O devices. The need for high bandwidth connections to I/O devices in a distributed shared-memory system is increasing for many applications.