1. Field of the Invention
The present invention relates to an apparatus for controlling a multi-processor system, designed to improve the performance of the snoop process carried out by the multi-processor system. The invention relates also to a scalable node, a scalable multi-processor system, and a method of controlling a multi-processor system.
2. Description of the Related Art
FIGS. 9A and 9B are block diagrams showing an example of the configuration of a conventional scalable multi-processor system. The conventional scalable multi-processor system has a plurality of nodes 101. The nodes 101 are scalable nodes that can be connected directly or indirectly to one another. The nodes 101 may be directly connected to one another, as shown in FIG. 9A. Alternatively, the nodes 101 may be indirectly connected, via cross bars (XBs) 2, as shown in FIG. 9B.
FIG. 10 is a block diagram depicting an example of the conventional node configuration. In the conventional scalable multi-processor system, each node 101 comprises central processing units (CPUs) 3, an input/output (I/O) unit 4, a system controller (SC) 105, a memory access controller (MAC) 6, and a main memory 7. In each node 101, referred to as the “local node,” the SC 105 is connected to the CPUs 3, the I/O unit 4 and the MAC 6, and also to the SCs of the other nodes or to the XBs 2. The SC 105 has a snoop process unit 112 that performs a snoop process. The MAC 6 is connected to the main memory 7.
The nodes 101 share one memory by means of cache coherent non-uniform memory access (CC-NUMA). How the CC-NUMA operates will be briefly described. When any CPU 3 or the I/O unit 4 issues a request for data, the snoop process unit 112 performs a snoop process to determine whether the requested data is stored in the caches of the other CPUs 3 or in the main memory 7. This process is called a “local snoop”.
If it is determined, in the local snoop, that the requested data is not present in the local node, or if the data cannot be supplied because the relevant resource is busy, the snoop process unit 112 broadcasts the request to all nodes 101. The snoop process is therefore performed on all nodes at the same time. This process is called a “global snoop”.
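The local-snoop-then-global-snoop flow described above can be illustrated with a minimal sketch. This is not the patented apparatus; the `Node` class, the `snoop` method, and `handle_request` are illustrative assumptions that model a local lookup with a fallback broadcast to all nodes.

```python
class Node:
    """Illustrative model of one scalable node (an assumption, not the SC 105)."""

    def __init__(self, node_id, cached_addresses, memory_addresses):
        self.node_id = node_id
        self.cache = set(cached_addresses)    # addresses held in CPU caches
        self.memory = set(memory_addresses)   # addresses in local main memory

    def snoop(self, address):
        """Return True if this node can supply the data at `address`."""
        return address in self.cache or address in self.memory


def handle_request(local, all_nodes, address):
    # Local snoop: check only the caches and main memory of the local node.
    if local.snoop(address):
        return ("local", local.node_id)
    # Global snoop: broadcast the request so every node snoops simultaneously.
    hits = [n.node_id for n in all_nodes if n.snoop(address)]
    return ("global", hits)


nodes = [Node(0, {0x10}, {0x100}), Node(1, {0x20}, {0x200})]
print(handle_request(nodes[0], nodes, 0x10))   # satisfied by the local snoop
print(handle_request(nodes[0], nodes, 0x200))  # escalates to a global snoop
```

In this sketch the global snoop returns every node that holds the data; an actual coherence protocol would also select a single supplier and update cache states.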
Prior art related to the present invention is disclosed in, for example, Jpn. Pat. Appln. Laid-Open Publication No. 7-28748 (see pages 3 and 4, and FIG. 1).
In the above-mentioned global snoop, the address field that a request should access may be busy because a preceding request is already accessing it. In this case, the global snoop is retried repeatedly until the preceding request ceases to access the address field. The global snoop therefore takes a long time when requests concentrate on a particular address field.
The queue of requests waiting for the global snoop has a limited capacity. If the queue has reached its limit, subsequent requests cannot be broadcast. Thus, once the queue for the global snoop has reached its limit, a request is kept waiting even if the address field it is to access is not busy. In other words, the processing of an address field on which accesses do not concentrate is delayed by the processing of another address field on which accesses concentrate.
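The head-of-line blocking described above can be sketched with a bounded queue. The queue model and `QUEUE_LIMIT` are illustrative assumptions: once retried requests for one contended address fill the queue, a request for a completely idle address cannot be broadcast either.

```python
from collections import deque

QUEUE_LIMIT = 4  # illustrative capacity of the global-snoop queue


def try_enqueue(queue, request):
    """Admit a request for broadcast only if the global-snoop queue has room."""
    if len(queue) >= QUEUE_LIMIT:
        return False  # must wait, even if the target address field is idle
    queue.append(request)
    return True


queue = deque()
# Retried requests for one contended ("hot") address fill the queue...
for i in range(QUEUE_LIMIT):
    assert try_enqueue(queue, ("addr_hot", i))
# ...so a request for a different, uncontended address is also blocked.
print(try_enqueue(queue, ("addr_cold", 0)))  # blocked despite an idle address
```

This is exactly the drawback the passage identifies: congestion on one address field propagates, through the shared finite queue, to requests that would otherwise proceed immediately.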