The present invention relates generally to multi-processor computer systems and more particularly to system control units.
High performance, multi-processor systems with a large number of microprocessors are built by interconnecting a number of node structures, each node containing a subset of the processors and memory in the system. While the memory in the system is distributed, several of these systems support a shared memory abstraction where all the memory in the system appears as a large memory common to all processors in the system. To support high performance, these systems typically allow processors to maintain copies of memory data in their local caches. Since multiple processors can cache the same data, these systems must incorporate a cache coherence mechanism to keep the copies coherent.
In some cache-coherent systems, each line of memory (typically a portion of memory tens of bytes in size) is assigned a home node, which manages the sharing of that memory line and guarantees its coherence. The home node maintains a directory, which identifies the nodes that possess a copy of the memory line. When a node requires a copy of the memory line, it requests the memory line from the home node. The home node supplies the data from memory if memory has the latest data. If another node has the latest copy of the data, the home node directs that node to forward the data to the requesting node. The home node employs a coherence protocol to ensure that when a node writes a new value to the memory line, all other nodes see this latest value. Coherence controllers implement this coherence functionality in two respects. First, a coherence controller is provided for each memory unit and maintains coherence of all memory lines in that memory unit. Second, the functionality of the coherence controller is integrated with the functionality of the System Control Unit (SCU) of the associated node.
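The directory behavior described above can be sketched as follows. This is a minimal illustration only, not the protocol of any particular system: the class names `DirectoryEntry` and `HomeNode`, the in-process `caches` list, and the choice to invalidate other copies on a write are all assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class DirectoryEntry:
    """Directory state the home node keeps for one memory line."""
    owner: Optional[int] = None                     # node holding the latest written copy
    sharers: Set[int] = field(default_factory=set)  # nodes holding a read copy


class HomeNode:
    """Hypothetical home node managing a set of memory lines."""

    def __init__(self, num_nodes: int) -> None:
        self.memory: Dict[int, int] = {}
        self.directory: Dict[int, DirectoryEntry] = {}
        # caches[n] stands in for the local cache of node n (line -> value).
        self.caches: List[Dict[int, int]] = [dict() for _ in range(num_nodes)]

    def _entry(self, line: int) -> DirectoryEntry:
        return self.directory.setdefault(line, DirectoryEntry())

    def read(self, requester: int, line: int) -> str:
        """Serve a read request and report which source supplied the data."""
        entry = self._entry(line)
        if entry.owner == requester:
            return "local"  # the requester already holds the latest copy
        if entry.owner is not None:
            # Another node has the latest copy: it forwards the data.
            value = self.caches[entry.owner][line]
            self.memory[line] = value  # assumption: memory is updated as well
            entry.sharers.add(entry.owner)
            entry.owner = None
            source = "forwarded"
        else:
            # Memory has the latest data (unwritten lines read as 0 here).
            value = self.memory.get(line, 0)
            source = "memory"
        self.caches[requester][line] = value
        entry.sharers.add(requester)
        return source

    def write(self, requester: int, line: int, value: int) -> None:
        """Serve a write: invalidate every other copy, then record the new owner."""
        entry = self._entry(line)
        holders = set(entry.sharers)
        if entry.owner is not None:
            holders.add(entry.owner)
        for node in holders - {requester}:
            self.caches[node].pop(line, None)
        entry.sharers = set()
        entry.owner = requester
        self.caches[requester][line] = value


home = HomeNode(num_nodes=3)
home.write(0, 0x40, 7)       # node 0 now holds the latest copy of line 0x40
print(home.read(1, 0x40))    # node 0 forwards the data to node 1
print(home.read(2, 0x40))    # memory is now current, so memory supplies it
```

Invalidating stale copies on a write is only one way to make other nodes see the latest value; an update-based protocol would satisfy the same requirement.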
The SCU provides the control and the path for data movement for the following sources and destinations within the node:
(a) the microprocessors within the node;
(b) the local (node) portion of the memory system;
(c) the network connecting all of the nodes of the multi-processor system; and
(d) the input/output (I/O) system of the local node.
Moving control information and data among these sources and destinations requires an efficient interconnection network.
The SCU includes logic for determining a desired destination from a message header, and for appropriately routing all of the parallel bits of a transmission; e.g., 64 bits in parallel for a 64-bit processor. However, this approach presents inherent scalability problems. For example, a typical SCU might service four processors in parallel, and route 64 bits to one of the four processors; such a system could not readily be reconfigured to handle 128 bits in parallel to support higher-performance systems. Further, a single SCU routing that many bits in parallel would be at the edge of integrated circuit and system packaging technologies.
In addition, current bus-based distributed shared memory (DSM) multi-processor systems require signals to pass through the crossbar switches of the interconnection network. It is therefore desirable to find a better way of providing point-to-point communication links among the SCU, the processors within a node, and the local memory section.
Thus, a method or architecture that is scalable and reconfigurable while having low latency has long been sought by, but has long eluded, those skilled in the art.
The present invention provides a distributed shared memory multi-processor system which includes a System Control Unit (SCU) made up of a system control unit address section (SCUA) and system control unit data sections (SCUDs). The SCU is made scalable by dividing the control and data flow functions of the SCU, and then parallelizing the data path. This allows the number of processors in the system to be increased, or higher-performance processors to be added, by increasing the number of SCUDs and reprogramming the crossbar switches incorporated in the SCUA and SCUDs. This enables implementation of the SCU function without pushing the limits of integrated circuit and system packaging technologies.
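One way to picture the split between control and a parallelized data path is bit-slicing, sketched below. This is an illustrative assumption, not the claimed implementation: the text does not state how data is divided among the SCUDs, and the names `split_word`, `merge_slices`, `SCUD.route`, and the `SCU` wrapper class (whose header decoding stands in for the SCUA role) are hypothetical.

```python
from typing import Dict, List, Optional


def split_word(word: int, width: int, num_scuds: int) -> List[int]:
    """Cut a width-bit word into equal slices, least significant slice first."""
    if width % num_scuds:
        raise ValueError("width must divide evenly across the SCUDs")
    slice_bits = width // num_scuds
    mask = (1 << slice_bits) - 1
    return [(word >> (i * slice_bits)) & mask for i in range(num_scuds)]


def merge_slices(slices: List[int], width: int) -> int:
    """Reassemble a word from slices produced by split_word."""
    slice_bits = width // len(slices)
    word = 0
    for i, part in enumerate(slices):
        word |= part << (i * slice_bits)
    return word


class SCUD:
    """Hypothetical data section: a small crossbar carrying one slice."""

    def __init__(self, num_ports: int) -> None:
        self.outputs: List[Optional[int]] = [None] * num_ports

    def route(self, dest_port: int, data_slice: int) -> None:
        self.outputs[dest_port] = data_slice


class SCU:
    """Control path decodes the header once; every SCUD routes its own slice."""

    def __init__(self, width: int, num_scuds: int, num_ports: int) -> None:
        self.width = width
        self.scuds = [SCUD(num_ports) for _ in range(num_scuds)]

    def send(self, header: Dict[str, int], word: int) -> None:
        dest = header["dest"]  # control decision (the SCUA role in this sketch)
        slices = split_word(word, self.width, len(self.scuds))
        for scud, part in zip(self.scuds, slices):
            scud.route(dest, part)  # all slices go to the same destination port

    def receive(self, port: int) -> int:
        # Assumes a word has already been sent to this port.
        return merge_slices([s.outputs[port] for s in self.scuds], self.width)


# Four 16-bit slices carry a 64-bit word; eight of the same slices carry 128 bits.
narrow = SCU(width=64, num_scuds=4, num_ports=4)
wide = SCU(width=128, num_scuds=8, num_ports=4)
```

Under this assumption, widening the data path from 64 to 128 bits adds SCUDs of the same slice width rather than requiring a single wider device.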
The present invention also provides point-to-point communication links among the SCU, the processors within the node, and the local memory section of the DSM multi-processor system via control and data crossbar switches contained within the SCU.
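The effect of routing through a crossbar rather than a shared bus can be sketched as a single arbitration step. This is a simplified model under stated assumptions: the function `arbitrate`, the fixed priority order, and port names such as `"cpu0"`, `"memory"`, `"network"`, and `"io"` are hypothetical and chosen only to mirror the sources and destinations listed earlier.

```python
from typing import Dict, List, Tuple

Request = Tuple[str, str]  # (source port, destination port)


def arbitrate(requests: List[Request]) -> Tuple[Dict[str, str], List[Request]]:
    """Grant crossbar connections for one transfer slot.

    Requests are considered in list order (an assumed fixed priority).
    Returns the granted connections as destination -> source, plus the
    requests that must wait for a later slot.
    """
    granted: Dict[str, str] = {}
    busy_sources = set()
    deferred: List[Request] = []
    for src, dst in requests:
        if dst in granted or src in busy_sources:
            # Only a clash on the same port forces a request to wait.
            deferred.append((src, dst))
        else:
            granted[dst] = src
            busy_sources.add(src)
    return granted, deferred


# Three transfers between distinct port pairs proceed in the same slot.
granted, deferred = arbitrate(
    [("cpu0", "memory"), ("cpu1", "network"), ("io", "cpu2")]
)
```

On a shared bus the three transfers above would be serialized; in this model only requests that target the same destination, or come from the same source, are deferred.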
The present invention further provides a point-to-point, non-blocking communication link between nodes, which significantly improves the overall system performance of the DSM multi-processor system over similar prior art bus-based systems.
The present invention still further provides an SCU with multiple, easily added signal ports for connection to the interconnection network, which enhances the reliability and availability of the multi-processor system.
The above and additional advantages of the present invention will become apparent to those skilled in the art from a reading of the following detailed description when taken in conjunction with the accompanying drawings.