1. Field of the Invention
The present invention relates to a technique for controlling requests from CPUs in a multiprocessor system. More particularly, the present invention relates to a multiprocessor system, a system board, and a cache replacement request processing method for efficiently handling cache replacement requests.
2. Description of the Related Art
In recent years, with wide utilization of information processing apparatuses in various fields, increasingly high processing capability is required of chip sets in a multiprocessor configuration.
FIG. 11 shows an exemplary configuration of a multiprocessor system. The multiprocessor system shown in FIG. 11 is composed of four system boards 100a to 100d and an address crossbar board 200. The system boards 100a to 100d are connected to the address crossbar board 200 by local buses 300a to 300d, a global bus 301, local signaling paths 302a to 302d, and a global signaling path 303.
The system board 100a has a system controller 110 and four CPUs 120a to 120d. The CPU 120a and the CPU 120b are connected to the system controller 110 by a CPU bus 130a, and the CPU 120c and the CPU 120d by a CPU bus 130c. The other system boards 100b to 100d have a configuration similar to that of the system board 100a.
The CPUs 120a to 120d have cache memories 121a to 121d and cache tags 122a to 122d, respectively. This example assumes that the cache memory 121 is controlled by a 4-way set associative method.
The system controller 110 has snoop tags 111a to 111d, CPU-issued request queues 112a and 112c, a local arbiter 113, a request handling section 114, and a request execution section 115.
The snoop tags 111a to 111d correspond to the cache tags 122a to 122d, respectively. The CPU-issued request queues 112 retain requests issued by the CPUs 120, one queue for each of the CPU buses 130. Here, requests issued by the CPUs 120a and 120b are retained in the CPU-issued request queue 112a and those issued by the CPUs 120c and 120d are retained in the CPU-issued request queue 112c. The local arbiter 113 outputs requests retained in the CPU-issued request queues 112 to the local bus 300a.
The request handling section 114 handles requests sent from the global bus 301. The request handling section 114 has a resource management section 116 and a request execution activating section 117. The resource management section 116 checks resources and the like needed for handling requests. The request execution activating section 117 activates the request execution section 115 and/or updates the snoop tags 111.
The address crossbar board 200 has a global arbiter 210 and an executability determination circuit 220. The global arbiter 210 outputs requests input from the local buses 300a to 300d to all the system boards 100a to 100d via the global bus 301. The executability determination circuit 220 determines whether it is possible to execute a request based on notifications input from the local signaling paths 302a to 302d, and notifies each of the system boards 100a to 100d of the result and of information necessary for executing the request via the global signaling path 303.
The operation of the system illustrated in FIG. 11 will be described for a case where the CPU 120a makes a read request. It is assumed here that the CPU 120a performs a read from address 1000. The MESI protocol is used for cache coherency. The MESI protocol is a cache coherency protocol that controls each line of cache by classifying it as one of M (modified state: Modified), E (exclusive state: Exclusive), S (shared state: Shared), and I (invalid state: Invalid).
To confirm whether data from address 1000 is present in its own cache memory 121a, the CPU 120a first searches the cache tag 122a. If it determines from the search that there is no valid data in its own cache memory 121a, the CPU 120a issues a read request onto the CPU bus 130a.
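The cache tag search described above can be sketched as follows. This is only an illustrative model, not the circuit of the CPU 120a: the line size, the number of sets, and the names `set_index`, `line_tag`, `new_cache_tag`, `lookup`, and `needs_read_request` are hypothetical, and only the four MESI states and the 4-way organization come from the description.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

NUM_WAYS = 4     # 4-way set associative, as assumed in the description
NUM_SETS = 256   # hypothetical: the description gives no set count
LINE_SIZE = 64   # hypothetical line size in bytes

def set_index(address):
    """Index of the set that an address maps to."""
    return (address // LINE_SIZE) % NUM_SETS

def line_tag(address):
    """Upper address bits stored in the tag to identify the line."""
    return address // (LINE_SIZE * NUM_SETS)

def new_cache_tag():
    """An empty cache tag: every way of every set is Invalid."""
    return [[(None, State.I)] * NUM_WAYS for _ in range(NUM_SETS)]

def lookup(cache_tag, address):
    """Return the MESI state of the line holding address, or State.I on a miss."""
    for tag, state in cache_tag[set_index(address)]:
        if state is not State.I and tag == line_tag(address):
            return state
    return State.I

def needs_read_request(cache_tag, address):
    """True when no valid data is cached, so a read request must be issued."""
    return lookup(cache_tag, address) is State.I
```

With an empty cache tag, `needs_read_request(new_cache_tag(), 1000)` is true, which corresponds to the CPU 120a issuing a read request onto the CPU bus 130a.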
The request issued by the CPU 120a is input to the global arbiter 210 by way of the CPU-issued request queue 112a, the local arbiter 113, and the local bus 300a. The request input to the global arbiter 210 is notified to all the system boards 100a to 100d simultaneously via the global bus 301.
On the system board 100a, the request is input to the request handling section 114 from the global bus 301. The request handling section 114 reads each snoop tag 111, and the resource management section 116 checks whether resources and the like for handling the request are available. The result is sent to the executability determination circuit 220 via the local signaling path 302a.
The executability determination circuit 220 determines whether it is possible to execute the request based on notifications from all the local signaling paths 302a to 302d and notifies the result and information necessary for execution of the request to the request execution activating section 117 via the global signaling path 303. The request execution activating section 117 updates the snoop tags 111 and/or activates the request execution section 115 based on the result of determining whether the request can be executed and the information necessary for executing the request.
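The determination step can be sketched as follows. The description does not state the exact rule used by the executability determination circuit 220, so this sketch assumes that a request is executable only when every system board reports that resources are available, and that the "information necessary for execution" is the set of snoop tags holding the line in a non-Invalid state. The names `BoardNotification` and `determine_executability` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class BoardNotification:
    """What one system board reports over its local signaling path (assumed shape)."""
    board: str
    resources_ok: bool
    snoop_hits: dict = field(default_factory=dict)  # snoop tag name -> MESI letter

def determine_executability(notifications):
    """Combine the notifications from all boards into one result.

    Assumed rule: executable only if every board secured its resources.
    """
    executable = all(n.resources_ok for n in notifications)
    holders = {}
    for n in notifications:
        for tag_name, state in n.snoop_hits.items():
            if state != "I":
                holders[(n.board, tag_name)] = state
    return executable, holders
```

In this model the returned pair stands for what is sent back to the request execution activating section 117 over the global signaling path 303.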
For example, if resources can be secured and the results of searching the snoop tags 111 are all Invalid, the request execution activating section 117 registers address 1000 in the snoop tag 111a. The state to be registered depends on the issued request. At the same time, the request execution section 115 performs a read from address 1000 in memory and sends the obtained data to the CPU 120a. The CPU 120a updates the cache tag 122a.
Alternatively, if resources can be secured and the searches of the snoop tags 111 show that address 1000 is registered in the snoop tag 111c as M (Modified), for example, the request execution activating section 117 registers address 1000 in the snoop tag 111a. The state to be registered depends on the issued request. Also, the state of address 1000 in the snoop tag 111c is changed to S (Shared) or I (Invalid). The state after the change depends on the issued request. At the same time, the request execution section 115 instructs the CPU 120c to send the M data at address 1000, and sends the output data to the CPU 120a. The CPU 120a updates the cache tag 122a and the CPU 120c updates the cache tag 122c.
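The two cases above can be sketched as follows. Because the description says only that the registered states depend on the issued request, the sketch takes those states as parameters rather than fixing a policy. The function name `handle_read` and the dictionary representation of the snoop tags are hypothetical, and cases the description does not cover are rejected rather than guessed.

```python
def handle_read(snoop_tags, requester, address, requester_state, owner_state_after="S"):
    """Update the snoop tags for a read and return where the data comes from.

    snoop_tags maps a snoop tag name to a dict of address -> MESI letter.
    """
    others = {name: tag.get(address, "I")
              for name, tag in snoop_tags.items() if name != requester}
    if all(state == "I" for state in others.values()):
        source = "memory"  # no other cache holds the line: read from memory
    else:
        owners = [name for name, state in others.items() if state == "M"]
        if len(owners) != 1 or any(s not in ("I", "M") for s in others.values()):
            raise NotImplementedError("case not covered by the description")
        if owner_state_after not in ("S", "I"):
            raise ValueError("the Modified line may only become S or I")
        source = owners[0]  # the CPU holding the M data supplies it
        snoop_tags[source][address] = owner_state_after
    snoop_tags[requester][address] = requester_state
    return source
```

For instance, with address 1000 registered as M in the snoop tag 111c, `handle_read(tags, "111a", 1000, "S", "S")` returns `"111c"` and leaves both 111a and 111c holding the address as S.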
FIG. 12 illustrates an example of handling a cache replacement request. FIG. 12 focuses on the system board 100a, mainly the CPU 120a, the CPU 120b, and the system controller 110, and on the address crossbar board 200, mainly the global arbiter 210, of FIG. 11. It is assumed here that the CPU 120a issues a cache replacement request.
Thick arrows in FIG. 12 indicate the routes by which a cache replacement request is delivered. A cache replacement request issued by the CPU 120a is delivered to the request handling section 114 of the system controller 110 on each of the system boards 100a to 100d by way of the CPU bus 130a, the CPU-issued request queue 112a, the local arbiter 113, the local bus 300a, the global arbiter 210, and the global bus 301, as with other requests.
FIG. 13 illustrates Eviction. Eviction refers to an instruction, given from the system controller 110 to the CPU 120, to discharge contents of the cache. Eviction will be described below with reference to the example shown in FIG. 13.
First, as illustrated at phase 0, assume that a certain common index of both the cache tag 122 and the snoop tag 111 is empty. If the CPU 120 reads address A in this state, address A is registered in both the cache tag 122 and the snoop tag 111 as illustrated at phase 1. As the CPU 120 further continues to read addresses B, C and D of the same index, the index becomes way-full as illustrated at phase 2.
If the CPU 120 further wants to read address E of the same index in this way-full state, one piece of data in the cache tag 122 has to be deleted first. Assume that address A is deleted here. If the CPU 120 does not support cache replacement requests and the state of address A is not M (Modified), a silent drop of address A occurs in the CPU 120. Silent drop means deleting data without notifying the outside of the deletion. As illustrated at phase 3, address A in the cache tag 122 is deleted.
Similarly, because the system controller 110 also has to register address E in response to the read of address E by the CPU 120, one piece of data has to be deleted from the snoop tag 111. It is assumed here that address B is deleted. At this point, because of a rule of inclusion that “what is present in the cache tag 122 must be present in the snoop tag 111”, the system controller 110 must have the CPU 120 delete the address that the system controller 110 deleted, so the system controller 110 has to issue an unload request to the CPU 120 as illustrated at phase 4. This unload request is called Eviction. As both the cache tag 122 and the snoop tag 111 then have an available way, address E can be registered in both of them as illustrated at phase 5.
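The phases of FIG. 13 can be traced with the following sketch, which models one index of the cache tag 122 and the snoop tag 111 as plain lists. The function name `read` and its parameters are hypothetical, and the victims are passed in explicitly because the description does not specify a replacement policy.

```python
NUM_WAYS = 4  # 4-way set associative, as assumed in the description

def read(cache_set, snoop_set, address, cpu_victim=None, snoop_victim=None):
    """Register address in both tags for one index.

    Returns the list of addresses for which Eviction is issued to the CPU.
    """
    evictions = []
    if len(cache_set) == NUM_WAYS:
        cache_set.remove(cpu_victim)        # silent drop: nothing is reported
    if len(snoop_set) == NUM_WAYS:
        snoop_set.remove(snoop_victim)
        if snoop_victim in cache_set:       # inclusion rule would be violated
            cache_set.remove(snoop_victim)  # the CPU unloads the line on Eviction
            evictions.append(snoop_victim)
    cache_set.append(address)
    snoop_set.append(address)
    assert set(cache_set) <= set(snoop_set)  # rule of inclusion
    return evictions

cache, snoop = [], []
for a in "ABCD":                            # phases 1 and 2: fill all four ways
    read(cache, snoop, a)
evicted = read(cache, snoop, "E", cpu_victim="A", snoop_victim="B")  # phases 3 to 5
```

After the last call, `evicted` is `["B"]`, the cache tag holds C, D and E, and the snoop tag holds A, C, D and E. In this model the cache is left with only three valid lines at that index although it has four ways, because the CPU dropped A and was also made to unload B.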
Prior art documents that describe techniques associated with multiprocessor systems include Patent Document 1 (National Publication of International Patent Application No. 2002-522827), for example. Patent Document 1 describes a technique for a multiprocessor computer system in which nodes are coupled to each other in a ring. However, the technique described in Patent Document 1 cannot solve the problems mentioned below.
A multiprocessor system consisting of a number of system boards 100 has a problem in that a very heavy load is placed on the global bus 301, so that when a cache replacement request is sent over the global bus 301, the capacity to handle other requests is reduced. Another problem is that Eviction can still occur if a cache replacement request is not handled before the read request that is its parent, in which case the intended effect of the cache replacement request is not obtained.
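The ordering problem can be illustrated with the following sketch of one way-full snoop tag index. It rests on assumptions that go beyond the description: a cache replacement request simply removes its address from the snoop tag, and the victim chosen by the system controller is still held by the CPU, so that removing it results in Eviction. The function name `process` and the request tuples are hypothetical.

```python
NUM_WAYS = 4  # 4-way set associative, as assumed in the description

def process(requests, snoop_set, snoop_victim):
    """Apply requests in order to one snoop tag index; return the evicted addresses."""
    evictions = []
    for kind, address in requests:
        if kind == "replace":
            # Cache replacement request: the CPU reports a line it has dropped.
            if address in snoop_set:
                snoop_set.remove(address)
        elif kind == "read":
            if len(snoop_set) == NUM_WAYS:
                # No free way: a victim is removed and Eviction is issued
                # (assumes the victim is still cached by the CPU).
                snoop_set.remove(snoop_victim)
                evictions.append(snoop_victim)
            snoop_set.append(address)
        else:
            raise ValueError("unknown request kind: " + kind)
    return evictions

in_order = process([("replace", "A"), ("read", "E")], list("ABCD"), "B")
reversed_order = process([("read", "E"), ("replace", "A")], list("ABCD"), "B")
```

Here `in_order` is empty, because the replacement of A frees a way before the read of E arrives, while `reversed_order` is `["B"]`, because the read of E finds the index way-full and forces an Eviction before the replacement request is handled.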