Distributed shared memory (DSM) multiprocessor systems are typically designed with either a special interconnect for input/output traffic or a special bridge that connects input/output modules directly or indirectly to the primary interconnect of the system. Input/output modules may be integrated into each node of the system and thus may use the node structures to ultimately reach the system interconnect when necessary.
The result is that the input/output configuration in conventional DSM systems is typically very limited and inflexible. Designs based on a special input/output traffic interconnect suffer from redundant resources (one interconnect for input/output and one for data), which may be underutilized. Designs based on special bridges typically treat each input/output module as subordinate to a processor, and so the number of input/output modules cannot be increased without a corresponding, costly increase in the number of processors.
It would be desirable to have a multi-node multiprocessor computer system that is more flexible with respect to input/output configurations, preferably allowing the number of input/output modules to be increased or decreased without regard to the number of processors in the system, while also unifying the data and input/output interconnects.
It would furthermore be desirable to have a multiprocessor system in which input/output modules and processors are treated as participants in the general system interconnect and in the cache coherence mechanism. In such a system there would be no need for a special input/output interconnect or bridge, because the input/output modules would communicate data requests and data modifications to the rest of the system via the normal protocol of the cache coherence mechanism. Such a system would furthermore be capable of being configured with an arbitrary ratio of input/output modules to processors.
In summary, a computer system has a plurality of processor nodes and a plurality of input/output nodes. Each processor node includes one or more processor cores, an interface to a local memory subsystem and a protocol engine implementing a predefined cache coherence protocol. Each processor core has an associated memory cache for caching memory lines of information. Each input/output node includes no processor cores, an input/output interface for interfacing to an input/output bus or input/output device, a memory cache for caching memory lines of information and an interface to a local memory subsystem. The local memory subsystem of each processor node and input/output node stores a multiplicity of memory lines of information. The protocol engine of each processor node and input/output node implements the same predefined cache coherence protocol.
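By way of illustration only, the node composition summarized above might be modeled as in the following minimal sketch. All class names, field names, and the protocol label are hypothetical and chosen for readability; the sketch shows only which parts each node type contains, not how the cache coherence protocol operates.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProtocolEngine:
    """Placeholder for the engine implementing the predefined cache coherence protocol."""
    protocol: str = "predefined-coherence-protocol"  # hypothetical label, not from the text


@dataclass
class Node:
    """Parts shared by both node types: local memory and a protocol engine."""
    node_id: int
    # Local memory subsystem: memory lines keyed by a (hypothetical) line address.
    local_memory: Dict[int, bytes] = field(default_factory=dict)
    protocol_engine: ProtocolEngine = field(default_factory=ProtocolEngine)


@dataclass
class ProcessorNode(Node):
    """Node with one or more processor cores, each with its own memory cache."""
    num_cores: int = 1
    core_caches: List[Dict[int, bytes]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.num_cores < 1:
            raise ValueError("a processor node needs at least one core")
        if not self.core_caches:
            # One empty cache per core.
            self.core_caches = [{} for _ in range(self.num_cores)]


@dataclass
class IONode(Node):
    """Node with no processor cores, an I/O interface, and a memory cache."""
    io_interface: str = "io-bus"  # hypothetical identifier for the attached bus or device
    cache: Dict[int, bytes] = field(default_factory=dict)

    @property
    def num_cores(self) -> int:
        # An input/output node includes no processor cores.
        return 0
```

In this sketch, both node types inherit the same `ProtocolEngine` field, which reflects the statement that every node implements the same predefined cache coherence protocol.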
In another aspect of the invention, the protocol engine of each of the processor nodes enables the processor cores therein to access memory lines of information stored in the local memory subsystem and memory lines of information stored in the memory cache of any of the processor nodes and input/output nodes, and maintains cache coherence between memory lines of information cached in the memory caches of the processor nodes and memory lines of information cached in the memory caches of the input/output nodes. Similarly, the protocol engine of each of the input/output nodes enables an input/output device coupled to the input/output interface of the input/output node to access memory lines of information stored in the local memory subsystem and memory lines of information stored in the memory cache of any of the processor nodes and input/output nodes, and maintains cache coherence between memory lines of information cached in the memory caches of the processor nodes and memory lines of information cached in the memory caches of the input/output nodes.
In another aspect of the invention, the system is reconfigurable so as to include any ratio of processor nodes to input/output nodes so long as the total number of processor nodes and input/output nodes does not exceed a predefined maximum number of nodes.
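The configuration constraint described above can be expressed as a simple check, sketched here under stated assumptions. The function name and the value of `MAX_NODES` are hypothetical; the text specifies only that a predefined maximum number of nodes exists, not its value.

```python
MAX_NODES = 1024  # hypothetical value; the text says only "a predefined maximum number of nodes"


def is_valid_configuration(processor_nodes: int, io_nodes: int,
                           max_nodes: int = MAX_NODES) -> bool:
    """Return True if the node mix fits within the predefined maximum.

    The ratio of processor nodes to input/output nodes is unconstrained;
    only the combined total is limited.
    """
    if processor_nodes < 0 or io_nodes < 0:
        return False  # negative node counts are not meaningful
    return processor_nodes + io_nodes <= max_nodes
```

For example, with a maximum of 64 nodes, a configuration of 4 processor nodes and 60 input/output nodes passes the check, as does one of 60 processor nodes and 4 input/output nodes, while 40 of each does not.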
In yet another aspect of the invention, the protocol engine of each of the processor nodes is functionally identical to the protocol engine of each of the input/output nodes.