The present invention relates to a multiprocessor system in which processors share a main memory and which uses a snoop scheme that distributes the address of a requested cache line to all the processors for coherency control, and more particularly, to a multiprocessor including nodes each of which has CPUs, a main memory and a cache memory unit and which has a structure capable of being used as a part common to both small and large systems, and also to a node controller in each node.
A prior art system, in which nodes each having CPUs and a main memory mounted on an identical board are used in common for both small and large multiprocessor systems, is described in James Laudon, et al., System Overview of the SGI Origin 200/2000 Product Line, Proceedings of the 47th IEEE COMPUTER SOCIETY INTERNATIONAL CONFERENCE, pp. 150-156, February, 1997.
The Origin 200/2000 includes one or more nodes, each node having two CPUs, a main memory and a hub chip.
The hub chip has a communication interface controller with the CPUs, a communication interface controller with the main memory and directory, an external I/O interface controller, and a crossbar for coupling these interface controllers.
The Origin 200, corresponding to a small multiprocessor system, usually includes one node board and in some cases includes two node boards directly connected by an external I/O interface from the hub chip. The Origin 2000, corresponding to a large multiprocessor system, includes two or more node boards mounted on the crossbar and connected by a router board. In the following description, the Origin 200 and Origin 2000 will be referred to merely as the Origin, without drawing a distinction between them.
As in the Origin, various types of systems can be formed using a plurality of identical nodes regardless of the system size or scale. This is an effective means of reducing development costs and shortening the development period.
The Origin also performs directory type cache coherency control as a ccNUMA (cache-coherent non-uniform memory access) type multiprocessor.
This control is already detailed in James Laudon, et al., The SGI Origin: A ccNUMA Highly Scalable Server, Proceedings of the 24th Annual Symposium on Computer Architecture, pp. 241-251, June, 1997.
The memory access of the Origin is usually carried out in the following manner. A memory access request issued from a CPU is transferred to the node whose main memory contains the requested address, and the directory in that node is searched. The directory, which is provided for each cache line corresponding to the requested address, records to the cache memory of which node the cache line has been transferred and in what state.
As a result of searching the directory, data read out from the cache memory of the node so found or from the main memory is transferred to the CPU that issued the request.
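The directory lookup described above can be illustrated with a minimal sketch. It is not the Origin's actual protocol: the line size, the three state names, the contiguous address-to-node mapping, and every identifier (`DirectoryEntry`, `Node`, `read`) are assumptions made only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LINE_SIZE = 128  # assumed cache-line size in bytes (not stated in the text)


@dataclass
class DirectoryEntry:
    owner: Optional[int] = None  # node whose cache holds the line, if any
    state: str = "uncached"      # assumed states: "uncached", "shared", "modified"


@dataclass
class Node:
    node_id: int
    memory: Dict[int, str] = field(default_factory=dict)  # line address -> data
    cache: Dict[int, str] = field(default_factory=dict)   # line address -> data
    directory: Dict[int, DirectoryEntry] = field(default_factory=dict)


def read(addr: int, requester: int, nodes: List[Node],
         bytes_per_node: int) -> Tuple[str, str]:
    """Serve a read; return (data, source), source being 'cache' or 'memory'."""
    line = addr - addr % LINE_SIZE
    # Assumed mapping: each node owns one contiguous address range.
    home = nodes[line // bytes_per_node]
    entry = home.directory.setdefault(line, DirectoryEntry())
    if entry.state == "modified" and entry.owner is not None:
        # The newest copy lives in another node's cache.
        data = nodes[entry.owner].cache[line]
        home.memory[line] = data  # simplified write-back to the home memory
        source = "cache"
    else:
        data = home.memory.get(line, "")
        source = "memory"
    # Record where the line went and in what state.
    entry.owner, entry.state = requester, "shared"
    nodes[requester].cache[line] = data
    return data, source
```

Every request in this sketch touches the home node's directory before any data moves, which is the extra hop that the latency discussion later in this section refers to.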
A prior art crossbar having an arbiter for determining a processing sequence of memory access requests for cache coherency control is shown in U.S. Pat. No. 6,011,791. It is generally already known that a crossbar transfers data in parallel and offers higher throughput than a bus.
However, in a crossbar there is a possibility that the memory access sequence may be partially reversed, disturbing the cache coherency.
The cache coherency is maintained by the directory system in the Origin; whereas, the cache coherency is controlled by uniquely sequencing memory access requests issued from CPUs in an arbiter which performs logically unique operation within the crossbar in such a multiprocessor system as shown in U.S. Pat. No. 6,011,791.
The sequenced requests are transmitted to the CPUs, main memory and I/O controller through a selector within the crossbar. A system that realizes a snooping cache in this way, by providing the crossbar with a function of sequencing the memory access requests and broadcasting them to all the CPUs, will be referred to as the multicasting system, hereinafter.
Further, a crossbar having a function of sequencing memory access requests to realize the multicasting system will be referred to as the multicasting crossbar, hereinafter.
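As a rough illustration of the multicasting system, the sketch below gives every request a single global sequence number and delivers it to every attached snooper in that order. The class name, the request tuple layout and the callback interface are hypothetical and are not taken from U.S. Pat. No. 6,011,791.

```python
from typing import Callable, List, Tuple

# Assumed request format: (issuing CPU, operation, address).
Request = Tuple[int, str, int]
Snooper = Callable[[int, Request], None]


class MulticastingCrossbar:
    """Arbiter that fixes one global order and broadcasts each request."""

    def __init__(self) -> None:
        self.next_seq = 0
        self.snoopers: List[Snooper] = []

    def attach(self, snoop: Snooper) -> None:
        # Register a CPU, memory or I/O controller that must see all requests.
        self.snoopers.append(snoop)

    def submit(self, request: Request) -> int:
        # The sequence number is the single point of ordering.
        seq = self.next_seq
        self.next_seq += 1
        for snoop in self.snoopers:
            snoop(seq, request)
        return seq
```

Because every snooper receives the same requests with the same sequence numbers, all caches can apply them in one agreed order, which is what keeps snooping coherent without a shared bus.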
However, the multiprocessor system disclosed in U.S. Pat. No. 6,011,791 is not arranged so that a small system can be formed by taking a small number of the nodes used in a large system and directly connecting them, as in the Origin.
In the case of the multiprocessor system for performing the aforementioned directory type cache coherency control, in general, there is a problem that the frequency of transmission between LSIs is increased by the frequency of directory reference, thus increasing the memory latency.
An increase in the amount or size of main memory also causes an increase in the amount of directory. Accordingly, mounting a large capacity of main memory requires a large capacity of directory memory, thus disadvantageously involving high costs.
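The proportional growth of the directory can be made concrete with a small calculation. The 128-byte line size and the 16 bits per directory entry used as defaults below are assumptions chosen only for illustration; the text gives neither figure.

```python
def directory_bytes(main_memory_bytes: int,
                    line_bytes: int = 128,    # assumed cache-line size
                    entry_bits: int = 16) -> int:  # assumed bits per entry
    """Directory storage needed when every cache line has one entry."""
    lines = main_memory_bytes // line_bytes
    return lines * entry_bits // 8


one_gib = directory_bytes(1 << 30)  # 16777216 bytes, i.e. 16 MiB
two_gib = directory_bytes(2 << 30)  # twice as much
```

Under these assumed figures, doubling the main memory doubles the directory memory, which is the cost relationship the paragraph above points out.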
It is therefore an object of the present invention to provide a node controller which can eliminate the above problems in the prior art and also to provide a multiprocessor system of a main-memory shared type using such a node controller.
Another object of the present invention is to provide a node controller which can use nodes common to both a small multiprocessor system having a small number of nodes and a large multiprocessor system having a large number of nodes using a crossbar, and also to provide a multiprocessor system of a main memory shared type which uses such a node controller.
A further object of the present invention is to provide a node controller which can reduce development costs by using nodes common to both a small system having a small number of nodes and a large system having a large number of nodes as in the aforementioned Origin, and also to provide a multiprocessor system of a main memory shared type using such a node controller.
Yet another object of the present invention is to provide a node controller which allows a plurality of nodes to be directly connected to form a small system and can omit an external crossbar, and also to provide a multiprocessor system of a main memory shared type using such a node controller.
A still further object of the present invention is to provide a node controller which can reduce an increase in memory latency caused by an overhead of directory reference by employing a cache coherency control system not using directory and can avoid increase of costs of devices other than a main memory even when the size of the main memory is increased, and also to provide a multiprocessor system of a main memory shared type using such a node controller.
In accordance with an aspect of the present invention, in order to attain the above objects, there is provided a multiprocessor system of a main memory shared type having a plurality of nodes mutually connected by signal lines, each of the plurality of nodes including:
a CPU having a cache memory;
a main memory; and
a node controller for performing communication control between the CPU, main memory and the other nodes than its own node,
the node controller having:
a communication controller for controlling communication interface between the plurality of nodes;
a crossbar for determining a processing sequence of memory access requests to the main memories in the plurality of nodes issued from at least one of the plurality of nodes; and
a crossbar controller means for validating or invalidating the crossbar.
With such an arrangement, nodes having common structures can be used in both a small multiprocessor system having a small number of nodes and a large multiprocessor system having a large number of nodes, thus eliminating the need for developing nodes differently for the small and large multiprocessor systems and reducing development costs. In this case, it is only required to validate the crossbar that determines the processing sequence of memory access requests and to invalidate the crossbars that do not determine the sequence. Further, by using a cache coherence control system not using a directory, an increase in memory latency caused by an overhead of directory reference can be reduced and an increase in costs of components other than the main memory can be prevented even when the size of the main memory is increased.
In an example of the present invention, the crossbar controller means in one of the plurality of nodes validates the crossbar of the one node while the crossbar controller means in the other remaining nodes invalidate the crossbars of the other remaining nodes, whereby the crossbar of the one node determines a processing sequence of memory access requests to the main memories in the plurality of nodes issued from the plurality of nodes.
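The arrangement in which exactly one node's crossbar is validated can be sketched as follows. The names `Crossbar`, `Node` and `configure_single_sequencer`, and the use of a boolean flag for validation, are assumptions for illustration and are not terms from the specification.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Crossbar:
    enabled: bool = False  # validated (True) or invalidated (False)
    order: List[Tuple[int, int]] = field(default_factory=list)  # (source, address)

    def sequence(self, source: int, addr: int) -> int:
        """Append a request to the global order and return its position."""
        if not self.enabled:
            raise RuntimeError("crossbar is invalidated")
        self.order.append((source, addr))
        return len(self.order) - 1


@dataclass
class Node:
    node_id: int
    crossbar: Crossbar = field(default_factory=Crossbar)


def configure_single_sequencer(nodes: List[Node], sequencer_id: int) -> Crossbar:
    """Validate one node's crossbar and invalidate all the others."""
    chosen = None
    for node in nodes:
        node.crossbar.enabled = node.node_id == sequencer_id
        if node.crossbar.enabled:
            chosen = node.crossbar
    if chosen is None:
        raise ValueError("no node has the requested id")
    return chosen
```

All nodes would then send their requests to the returned crossbar, so a single ordering point exists even though every node carries identical hardware.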
In another example of the present invention, the node controller has:
a communication controller for controlling communication interface between a plurality of nodes;
a crossbar for determining a processing sequence of ones of memory access requests to the main memories in the plurality of nodes issued from at least one of the plurality of nodes to be directed to the main memory of its own node having the node controller therein; and
a crossbar controller means for validating or invalidating the crossbar.
In a further example of the present invention, the crossbar has:
means for judging whether or not a memory access request issued from at least one of a plurality of nodes is one to be directed to the main memory within any of the plurality of nodes;
means for determining a processing sequence of the memory access requests judged by the judging means that the memory access requests are ones to be directed to the main memory in its own node having the crossbar therein; and
means for transferring, only to another node, the memory access requests judged by the judging means to be ones directed to the main memory of that other node rather than of its own node,
and the crossbar controller means in each of the plurality of nodes validates the crossbar of its own node having the crossbar controller means therein.
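The three means listed above (judging, sequencing locally, transferring only to the node concerned) can be sketched as one small class. The contiguous address map, the `BYTES_PER_NODE` constant and the shared `peers` registry are assumptions; the specification does not say how the target node is derived from an address.

```python
from typing import Dict, List, Tuple

BYTES_PER_NODE = 1 << 20  # assumed: each node owns one contiguous 1 MiB range


def home_of(addr: int) -> int:
    """Judge which node's main memory an address belongs to (assumed mapping)."""
    return addr // BYTES_PER_NODE


class NodeCrossbar:
    def __init__(self, node_id: int, peers: Dict[int, "NodeCrossbar"]) -> None:
        self.node_id = node_id
        self.peers = peers  # hypothetical registry of all crossbars by node id
        self.order: List[Tuple[int, int]] = []  # (source node, address)
        peers[node_id] = self

    def handle(self, source: int, addr: int) -> int:
        """Sequence a local-memory request, else pass it to its home node only.

        Returns the id of the node whose crossbar sequenced the request.
        """
        home = home_of(addr)
        if home == self.node_id:
            self.order.append((source, addr))
            return self.node_id
        return self.peers[home].handle(source, addr)
```

In this sketch every crossbar is validated, and each one orders only the requests aimed at its own main memory, so ordering work is spread across the nodes instead of being concentrated in one of them.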
In yet another example of the present invention, the crossbar has:
means for judging whether or not a memory access request issued from at least one of the plurality of nodes is one to be directed to the main memory in its own node having the crossbar therein;
means for determining a processing sequence of the memory access requests judged by the judging means that the memory access requests are ones to be directed to the main memory in its own node; and
means for transferring to all the nodes other than its own node having the crossbar therein the memory access requests judged by the judging means that the memory access requests are not ones to be directed to the main memory in its own node,
and the crossbar controller means in each of the plurality of nodes validates the crossbar of its own node having the crossbar controller means therein.
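In this variant the issuing crossbar only needs to know whether an address is local; anything else goes to every other node. A minimal sketch follows, assuming that a receiving crossbar sequences a request aimed at its own main memory and silently drops all others. That drop rule, the address map and all identifiers are assumptions, not statements from the specification.

```python
from typing import List, Tuple

BYTES_PER_NODE = 1 << 20  # assumed: each node owns one contiguous 1 MiB range


class BroadcastCrossbar:
    def __init__(self, node_id: int, peers: List["BroadcastCrossbar"]) -> None:
        self.node_id = node_id
        self.peers = peers  # hypothetical shared list of all crossbars
        self.order: List[Tuple[int, int]] = []  # (source node, address)
        peers.append(self)

    def is_local(self, addr: int) -> bool:
        # Judging means: is the request for this node's own main memory?
        return addr // BYTES_PER_NODE == self.node_id

    def issue(self, addr: int) -> None:
        """Handle a request that originates in this node."""
        if self.is_local(addr):
            self.order.append((self.node_id, addr))
            return
        for peer in self.peers:  # transfer to all nodes other than its own
            if peer is not self:
                peer.receive(self.node_id, addr)

    def receive(self, source: int, addr: int) -> None:
        """Handle a request transferred from another node."""
        if self.is_local(addr):
            self.order.append((source, addr))
        # Assumption: requests for some other node's memory are ignored here.
```

Compared with transferring to a single target node, this trades extra traffic on the inter-node links for a simpler judging step, since no node has to hold the address ranges of the others.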
In accordance with another aspect of the present invention, there is provided a multiprocessor system of a main memory shared type having a plurality of nodes mutually connected by signal lines, each of the plurality of nodes including:
a CPU having a cache memory;
a main memory;
a node controller for performing communication control between the CPU, main memory and the other nodes than its own node;
the node controller having:
a communication controller for controlling communication interface between the plurality of nodes;
a crossbar for determining a processing sequence of memory access requests issued from at least one of the plurality of nodes to be directed to the main memory of its own node having the crossbar therein; and
a crossbar controller means for validating or invalidating the crossbar,
the multiprocessor system of the main memory shared type further including an external crossbar connected to each of the plurality of nodes for determining a processing sequence of memory access requests issued from at least one of the plurality of nodes to be directed to the main memories in the plurality of nodes.
In a large multiprocessor system, therefore, nodes for use in a small multiprocessor system and a plurality of nodes having the same structure as these nodes are connected to the external crossbar so that the external crossbar can determine a processing sequence of memory access requests issued from all the nodes. In this case, it is only required to invalidate the crossbars in all the nodes.
In accordance with a further aspect of the present invention, there is provided a node controller which has:
a crossbar for determining a processing sequence of memory access requests issued from its own node having the crossbar therein and from at least one of the other nodes to be directed to the main memory of at least the own node; and
means for validating or invalidating the crossbar.
When the node controller having such a structure is used, the node controllers having the structure common to both small and large multiprocessor systems can be employed, thus eliminating the need for developing the node controllers differently for the small and large multiprocessor systems and leading to reduction of development costs. That is, a small multiprocessor system is formed by directly connecting a plurality of nodes each having such a node controller and eliminating the need for provision of an external crossbar. A large multiprocessor system is formed by connecting a plurality of nodes each having such a node controller to an external crossbar to cause the external crossbar to determine a processing sequence of memory access requests issued from all the nodes. In this case, it is only required to invalidate the crossbars within all the nodes.
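The way one node design serves both system sizes can be summarized as a configuration choice. In the sketch below the data classes, the `build_system` function and the decision to let node 0 act as the sequencer in the directly connected case are assumptions for illustration; the text also describes small arrangements in which every node's crossbar is validated.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeConfig:
    node_id: int
    internal_crossbar_enabled: bool


@dataclass
class SystemConfig:
    nodes: List[NodeConfig]
    external_crossbar: bool


def build_system(num_nodes: int, use_external_crossbar: bool) -> SystemConfig:
    """Return crossbar settings for a small or a large system."""
    if num_nodes < 1:
        raise ValueError("at least one node is required")
    if use_external_crossbar:
        # Large system: the external crossbar sequences all requests,
        # so the crossbars within all the nodes are invalidated.
        nodes = [NodeConfig(i, False) for i in range(num_nodes)]
    else:
        # Small system, nodes directly connected: one internal crossbar
        # (node 0 here, by assumption) is validated as the sequencer.
        nodes = [NodeConfig(i, i == 0) for i in range(num_nodes)]
    return SystemConfig(nodes, use_external_crossbar)
```

For example, `build_system(2, False)` describes two directly connected nodes with one sequencer, while `build_system(16, True)` describes sixteen identical nodes behind an external crossbar with every internal crossbar switched off.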