1. Field of the Invention
The present invention relates to a technique for controlling, in a multiprocessor system, requests issued by CPUs using a system controller. More particularly, the present invention relates to a system controller, an identical-address-request-queuing preventing method, and an information processing apparatus that prevent, in retaining requests issued by CPUs in a queue, requests having identical addresses from being retained.
2. Description of the Related Art
In recent years, as information processing apparatuses have come into wide use in various fields, high processing capability has been demanded of chip sets in multiprocessor configurations.
FIG. 12 is a diagram showing an example of a structure of a multiprocessor system. The multiprocessor system shown in FIG. 12 includes four system boards 100a to 100d and an address crossbar board 200. The respective system boards 100a to 100d and the address crossbar board 200 are connected by local buses 300a to 300d, a global bus 301, local notification paths 302a to 302d, a global notification path 303, and the like.
The system board 100a includes a system controller 110 and four CPUs 120a to 120d. The CPU 120a and the CPU 120b are connected to the system controller 110 by a CPU bus 130a. The CPU 120c and the CPU 120d are connected to the system controller 110 by a CPU bus 130c. The system boards 100b to 100d have the same structure as the system board 100a. 
The CPUs 120a to 120d include cache memories 121a to 121d and cache tags 122a to 122d, respectively. In an embodiment of the invention, it is assumed that the cache memories 121 are controlled by a 4-Way set associative system.
The system controller 110 includes snoop tags 111a to 111d, CPU-issued request queues 112a and 112c, a local arbiter 113, a request handling section 114, and a request execution section 115.
The snoop tags 111a to 111d correspond to the cache tags 122a to 122d, respectively. The CPU-issued request queues 112 retain requests issued by the CPUs 120 for the CPU buses 130. Specifically, the CPU-issued request queue 112a retains requests issued by the CPUs 120a and 120b and the CPU-issued request queue 112c retains requests issued by the CPUs 120c and 120d. The local arbiter 113 outputs the requests retained by the CPU-issued request queues 112 to the local bus 300a. 
The request handling section 114 performs processing for a request sent from the global bus 301. The request handling section 114 includes a resource management section 116 and a request-execution activating section 117. The resource management section 116 checks resources for processing a request. The request-execution activating section 117 starts the request execution section 115 and updates the snoop tags 111.
The address crossbar board 200 includes a global arbiter 210 and an executability determination circuit 220. The global arbiter 210 outputs requests inputted from the local buses 300a to 300d to all the system boards 100a to 100d via the global bus 301. The executability determination circuit 220 determines executability, that is to say, propriety of execution of the request on the basis of notifications inputted from the local notification paths 302a to 302d and notifies the respective system boards 100a to 100d of a result of the determination and information necessary for execution of the requests via the global notification path 303.
Operations of the system shown in FIG. 12 will be explained by giving an example in which the CPU 120a issues a read request. The CPU 120a performs a read for an address 1000. The MESI protocol is used for cache coherency. The MESI protocol is a type of cache coherency protocol and controls respective lines of a cache by classifying the lines into states of M (Modified state), E (Exclusive state), S (Shared state), and I (Invalid state).
In order to check whether data of the address 1000 is present in the cache memory 121a of the CPU 120a, first, the CPU 120a searches through the cache tag 122a. When it is determined as a result of the search that there is no valid data in the cache memory 121a, the CPU 120a issues a read request to the CPU bus 130a. 
The request issued by the CPU 120a is inputted to the global arbiter 210 via the CPU-issued request queue 112a, the local arbiter 113, and the local bus 300a. The request inputted to the global arbiter 210 is simultaneously notified to all the system boards 100a to 100d via the global bus 301.
In the system board 100a, the request is inputted to the request handling section 114 from the global bus 301. The request handling section 114 reads the respective snoop tags 111 and checks whether there are resources and the like for processing the request using the resource management section 116. A result of the check is sent to the executability determination circuit 220 via the local notification path 302a. 
The executability determination circuit 220 determines executability (propriety of execution) of the request on the basis of notifications from all the local notification paths 302a to 302d. The executability determination circuit 220 notifies the request-execution activating section 117 of a result of the determination and information necessary for execution of the request via the global notification path 303. The request-execution activating section 117 updates the snoop tags 111 and starts the request execution section 115 on the basis of the result of determination on propriety of execution of the request and the information necessary for execution of the request.
For example, when the resources can be secured and all results of searches through the snoop tags 111 indicate I (Invalid), the request-execution activating section 117 registers the address 1000 in the snoop tag 111a. A state of the registration depends on an issued request. At the same time, the request execution section 115 performs a read for the address 1000 of a memory and sends data obtained by the read to the CPU 120a. The CPU 120a updates the cache tag 122a.
For example, when the resources can be secured and, as a result of the searches through the snoop tags 111, the address 1000 is registered in the snoop tag 111c in the state of M (Modified), the request-execution activating section 117 registers the address 1000 in the snoop tag 111a. A state of the registration depends on an issued request. The request-execution activating section 117 changes the state of the address 1000 of the snoop tag 111c to S (Shared) or I (Invalid). A state to which the state of the address 1000 is changed in this case depends on an issued request. At the same time, the request execution section 115 instructs the CPU 120c to output M (Modified) data of the address 1000 and sends the outputted data to the CPU 120a. The CPU 120a updates the cache tag 122a. The CPU 120c updates the cache tag 122c. 
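The two cases above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit itself: the function name `handle_read`, the dictionary layout of the snoop tags, and the decision to pass the resulting states in as arguments (because they depend on the issued request) are all assumptions made for the sketch. Resource checking and the executability determination are left out, and any case the description does not cover is rejected rather than guessed.

```python
from enum import Enum


class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"


def handle_read(snoop_tags, requester, address, requester_state, owner_state):
    """Update the snoop tags for a read and return the source of the data.

    snoop_tags maps a snoop-tag name (e.g. "111a") to {address: State}.
    requester_state and owner_state are supplied by the caller because the
    resulting states depend on the issued request (hypothetical interface).
    """
    states = {name: tags.get(address, State.I) for name, tags in snoop_tags.items()}
    owners = [name for name, state in states.items() if state is State.M]

    if all(state is State.I for state in states.values()):
        source = "memory"                          # no cache holds the line
    elif len(owners) == 1:
        source = owners[0]                         # the Modified copy supplies the data
        snoop_tags[source][address] = owner_state  # changed to S or I
    else:
        raise NotImplementedError("case not covered by the description")

    snoop_tags[requester][address] = requester_state  # register the address
    return source
```

For instance, with the address 1000 registered as M in the snoop tag "111c", calling `handle_read(tags, "111a", 1000, State.S, State.S)` returns "111c" and leaves both tags in the S state.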
FIGS. 13A and 13B are diagrams for explaining an example of conventional prevention of queuing of requests having identical addresses. The CPU-issued request queue 112 performs prevention of queuing of requests having identical addresses to prevent the requests having identical addresses from being simultaneously retained. The example of the conventional prevention of queuing of requests having identical addresses in the CPU-issued request queue 112 will be hereinafter explained with reference to FIGS. 13A and 13B.
An issued request is a request issued by the CPU 120 and sent to the CPU-issued request queue 112 via the CPU bus 130. The issued request includes a group of signals such as a command (CMD), a cache line address (ADR0, ADR1, ADR2), and a CPUID. The cache line address is divided into three blocks ADR0, ADR1, and ADR2 and handled. The group of signals of the issued request shown in FIGS. 13A and 13B are signals necessary for the explanation among all signals included in the request.
A retained request is a request retained by each of entries of the CPU-issued request queue 112. The retained request includes a group of signals such as a valid signal (V) and a cache line address (ADR0, ADR1, ADR2). The cache line address is divided into three blocks ADR0, ADR1, and ADR2 and treated. The group of signals of the retained request shown in FIGS. 13A and 13B are signals necessary for the explanation among all signals included in the request.
Each of the entries of the CPU-issued request queue 112 includes a comparator 141 and an AND circuit 142. The AND circuit 142 of each of the entries is connected to an OR circuit 143.
In a method shown in FIG. 13A, in each of the entries of the CPU-issued request queue 112, the comparator 141 compares the cache line address (ADR0, ADR1, ADR2) of the issued request and the cache line address (ADR0, ADR1, ADR2) of the retained request. When both the cache line addresses match each other, the comparator 141 transmits a valid signal “1”. When both the cache line addresses do not match each other, the comparator 141 transmits an invalid signal “0”. When a signal transmitted from the comparator 141 is valid and the valid signal (V) of the retained request is valid, the AND circuit 142 transmits a valid signal. Otherwise, the AND circuit 142 transmits an invalid signal.
When a signal transmitted from the AND circuit 142 of any one of the entries of the CPU-issued request queue 112 is valid, the OR circuit 143 decides the issued request to be retried. In other words, when a cache line address coinciding with the cache line address (ADR0, ADR1, ADR2) of the issued request is present in the retained request of the CPU-issued request queue 112, queuing of the issued request in the CPU-issued request queue 112 is not performed.
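The retry decision of FIG. 13A can be modeled in software as follows. This is a minimal sketch for illustration; the names `Entry` and `must_retry_full` are hypothetical, and the hardware evaluates all entries in parallel rather than in a loop.

```python
from dataclasses import dataclass
from typing import List, Tuple

Address = Tuple[int, int, int]  # (ADR0, ADR1, ADR2)


@dataclass
class Entry:
    valid: bool        # valid signal (V) of the retained request
    address: Address   # cache line address of the retained request


def must_retry_full(issued: Address, queue: List[Entry]) -> bool:
    """Return True when the issued request must be retried (FIG. 13A)."""
    hits = []
    for entry in queue:
        match = entry.address == issued      # comparator 141: full address
        hits.append(match and entry.valid)   # AND circuit 142
    return any(hits)                         # OR circuit 143
```

An entry whose valid signal is off never causes a retry, even if its stored address matches the issued request.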
In the method shown in FIG. 13A, it is logically possible to prevent queuing of a completely identical address. However, since an expansion of the address space and an increase in the number of entries of the CPU-issued request queue 112 are demanded in recent systems, an enormous amount of hardware is necessary in order to check for a match of the full cache line address (ADR0, ADR1, ADR2). Further, since the number of logic stages also increases, it is difficult to increase the speed of the systems. Therefore, in recent systems, as shown in FIG. 13B, a method of checking for a match of only a part (ADR0) of the cache line address may be adopted.
In a method shown in FIG. 13B, in each of the entries of the CPU-issued request queue 112, the comparator 141 compares a part (ADR0) of the cache line address of the issued request and a part (ADR0) of the cache line address of the retained request. When both the parts of the cache line addresses match each other, the comparator 141 transmits a valid signal. When both the parts do not match each other, the comparator 141 transmits an invalid signal. When a signal transmitted from the comparator 141 is valid and the valid signal (V) of the retained request is valid, the AND circuit 142 transmits a valid signal. Otherwise, the AND circuit 142 transmits an invalid signal.
When a signal transmitted from the AND circuit 142 of any one of the entries of the CPU-issued request queue 112 is valid, the OR circuit 143 decides the issued request to be retried. In other words, when a cache line address, a part of which matches a part (ADR0) of the cache line address of the issued request, is present in the retained request of the CPU-issued request queue 112, queuing of the issued request in the CPU-issued request queue 112 is not performed.
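The partial check of FIG. 13B differs from the full check only in what the comparator sees. The sketch below is an illustrative model with hypothetical names (`Entry`, `must_retry_partial`); it shows that a request is retried whenever ADR0 matches a valid entry, regardless of ADR1 and ADR2.

```python
from dataclasses import dataclass
from typing import List, Tuple

Address = Tuple[int, int, int]  # (ADR0, ADR1, ADR2)


@dataclass
class Entry:
    valid: bool        # valid signal (V) of the retained request
    address: Address   # cache line address of the retained request


def must_retry_partial(issued: Address, queue: List[Entry]) -> bool:
    """Return True when the issued request must be retried (FIG. 13B)."""
    # Comparator 141 compares ADR0 only; AND circuit 142 gates the result
    # with V; OR circuit 143 combines the results of all entries.
    return any(entry.valid and entry.address[0] == issued[0] for entry in queue)
```

With a single valid entry holding (1, 2, 3), an issued request for (1, 7, 8) is retried even though its full address is different, which is the cost of the smaller comparator.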
As a prior art document in which a technique concerning a multiprocessor system is described, there is, for example, Patent Document 1 (National Publication of International Patent Application No. 2002-522827). The prior art document describes a technique concerning a multiple processor computer system in which respective nodes are coupled in a ring shape. However, it is impossible to solve the problems described later using the technique described in the prior art document.
When there are plural identical addresses in the CPU-issued request queue 112, processing for a cache replace request is complicated. For example, when a read request A, a cache replace request B, and a read request B are issued in order by the same CPU 120 and the cache replace request B and the read request B are simultaneously present in a queue, it is necessary to control the cache replace request B and the read request B to prevent overtaking from occurring. Complicated logic is necessary to perform this control while performing out-of-order processing for the read requests.
When a comparator for a full address is used as shown in FIG. 13A in order to prevent identical addresses from being present together in the CPU-issued request queue 112, the hardware quantity increases. The increase in the hardware quantity makes it difficult to operate the hardware at a high clock frequency.
In the method shown in FIG. 13B, in general, a cache index is used for a part (ADR0) of the cache line address. However, since a cache replace request and a read request serving as a parent of the cache replace request (i.e., a read request that makes it necessary to perform cache replace) have an identical index, in the method shown in FIG. 13B, the cache replace request is retried.
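The reason the two requests collide can be seen from how a cache index is derived from an address. The sketch below is illustrative only: the bit widths (a 6-bit line offset and a 10-bit index) and the names `cache_index` and `replace_is_retried` are assumptions, not values taken from the system described here.

```python
LINE_OFFSET_BITS = 6   # hypothetical: 64-byte cache lines
INDEX_BITS = 10        # hypothetical: 1024 sets


def cache_index(address: int) -> int:
    """Extract the set index, assumed here to be what ADR0 carries."""
    return (address >> LINE_OFFSET_BITS) & ((1 << INDEX_BITS) - 1)


def replace_is_retried(parent_read_address: int, victim_address: int) -> bool:
    """True when a FIG. 13B style check would retry the replace request
    while the parent read request is still retained in the queue."""
    return cache_index(parent_read_address) == cache_index(victim_address)
```

A line can only be evicted by a read that maps to the same set, so the victim address of a cache replace request always shares its index with the parent read. Under these assumed widths, the addresses 0x12340 and 0x52340 differ only in the bits above the index, so `replace_is_retried(0x12340, 0x52340)` is true.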