Generally, a large-scale information processing apparatus that includes CPUs and Input/Output (I/O) devices (a large-scale SMP (Symmetric Multi-Processor) information processing device, for example) has a plurality of system boards, each including CPUs with cache memories, a system controller, and I/O devices, so as to improve the processing capacity.
In such a large-scale information processing apparatus, a control operation is performed to guarantee cache coherency between the system boards (a coherence control operation). Therefore, request broadcasting and snoop result exchanges are performed between the system controllers of the respective system boards (see JP-A 2006-72509 and JP-A 2006-202215, for example).
In a large-scale information processing apparatus, however, the physical distance between the system controllers increases with the size of the apparatus. As the apparatus structure is expanded, the latency of each memory access therefore becomes longer, and it becomes difficult to improve the performance of the entire information processing apparatus. Also, as a larger number of I/O devices are provided in the information processing apparatus, the number of snoop requests becomes greater accordingly. As a result, it also becomes difficult to secure reasonable throughput of the broadcast bus and each snoop control unit.
In a known technique developed to counter the above problems, when the data at a target address is present in a cache memory of one of the CPUs provided on the same system board, the access latency is shortened by skipping the snoop operation over the system boards and performing a data communication between the CPUs in the local system board.
FIG. 12 is a block diagram illustrating the structure of a conventional large-scale information processing apparatus 100. As illustrated in FIG. 12, the conventional large-scale information processing apparatus 100 includes system boards (nodes) A and B (two system boards in this example). The system board A includes CPUs 10 and 11, I/O devices 20 and 21, and main memories 30 and 31. The system board B includes CPUs 12 and 13, I/O devices 22 and 23, and main memories 32 and 33.
Each of the CPUs 10 to 13 includes multilevel cache memories (two levels in this example). More specifically, the CPU 10 includes a level-1 cache memory 10a and a level-2 cache memory 10b, and the CPU 11 includes a level-1 cache memory 11a and a level-2 cache memory 11b. Likewise, the CPU 12 includes a level-1 cache memory 12a and a level-2 cache memory 12b, and the CPU 13 includes a level-1 cache memory 13a and a level-2 cache memory 13b. 
The system board A further includes a system controller 40-1 that performs communication control on the memories (the level-1 cache memories 10a and 11a, the level-2 cache memories 10b and 11b, and the main memories 30 and 31 in this example) provided in the system board A. Likewise, the system board B further includes a system controller 40-2 that performs communication control on the memories (the level-1 cache memories 12a and 13a, the level-2 cache memories 12b and 13b, and the main memories 32 and 33 in this example) provided in the system board B.
With this arrangement, the system controllers 40-1 and 40-2 share the communication control on the memories (the level-1 cache memories 10a to 13a, the level-2 cache memories 10b to 13b, and the main memories 30 to 33 in this example) provided in the information processing apparatus 100. Also, the system controller 40-1 and the system controller 40-2 have the same structure, except that the system controllers 40-1 and 40-2 perform the communication control on different memories. The system controller 40-1 and the system controller 40-2 are connected in such a manner that the system controllers 40-1 and 40-2 can communicate with each other.
The system controller 40-1 includes a cache TAG 46-1, a request transmission/reception unit 41-1, a local snoop control unit 42-1, a broadcast control unit 43-1, a global snoop control unit 44-1, and a memory access issuing unit 45-1.
The cache TAG 46-1 registers and holds specific address information for identifying data (cache data) cached in the cache memories (the level-1 cache memories 10a and 11a, and the level-2 cache memories 10b and 11b in this example) in the local node (the system board A in this example).
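As a rough illustration only, the cache TAG can be pictured as a per-node directory that maps an address to the status of the cached line and the CPUs holding it. The following Python sketch is written under assumptions: the names `TagEntry`, `CacheTag`, `register`, and `lookup`, and the "shared"/"exclusive" status encoding, are hypothetical and are not taken from the apparatus 100, whose tag layout is not specified here.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


@dataclass(frozen=True)
class TagEntry:
    status: str              # "shared" or "exclusive" (assumed encoding)
    holders: FrozenSet[int]  # CPU numbers in the local node caching the line


@dataclass
class CacheTag:
    """Hypothetical model of one node's cache TAG (e.g. 46-1 on system board A)."""
    entries: Dict[int, TagEntry] = field(default_factory=dict)

    def register(self, address: int, status: str, holders: FrozenSet[int]) -> None:
        """Record that the line at `address` is cached by `holders`."""
        if status not in ("shared", "exclusive"):
            raise ValueError(f"unknown status: {status}")
        if status == "exclusive" and len(holders) != 1:
            raise ValueError("an exclusive line has exactly one holder")
        self.entries[address] = TagEntry(status, holders)

    def lookup(self, address: int) -> Optional[TagEntry]:
        """Return the entry on a hit, or None on a miss."""
        return self.entries.get(address)


# Example: CPU 11 on system board A holds the line at 0x1000 exclusively.
tag_a = CacheTag()
tag_a.register(0x1000, "exclusive", frozenset({11}))
hit = tag_a.lookup(0x1000)
miss = tag_a.lookup(0x2000)
```

In this sketch a search of the tag never touches the cache memories themselves; it only answers whether, and in which state, an address is cached in the local node.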
The request transmission/reception unit 41-1 receives a memory access request to access a main memory (also referred to herein as a local memory).
More specifically, in a case where a memory access request is generated from the CPU 10, and the data to be retrieved in response to the memory access request is not found in the level-1 cache memory 10a or the level-2 cache memory 10b, the request transmission/reception unit 41-1 receives the memory access request (a read request) from the CPU 10. The request transmission/reception unit 41-1 then transmits the received memory access request to the local snoop control unit 42-1 described below. The request transmission/reception unit 41-1 then receives a global snoop request from the local snoop control unit 42-1 described later, and transmits the global snoop request to the broadcast control unit 43-1 described later. The global snoop request is issued to search all the cache memories (the level-1 cache memories 10a to 13a and the level-2 cache memories 10b to 13b in the example) provided in the information processing apparatus 100 for the data to be accessed in response to the memory access request (hereinafter referred to simply as the target data).
The local snoop control unit 42-1 searches the cache memories in the local node for the target data of the memory access request, and, based on the search result, determines an operation to be performed in response to the memory access request.
More specifically, when receiving the memory access request from the request transmission/reception unit 41-1, the local snoop control unit 42-1 searches (snoops) the cache TAG 46-1 in the local node for the access target address information (hereinafter referred to simply as the target address information) that identifies the target data of the memory access request, as an operation in response to the CPU that has issued the memory access request.
In a case where there is a hit for the memory access request in the cache TAG 46-1 in the local node as a result of the search, for example, the local snoop control unit 42-1 determines an operation in response to the memory access request, based on the search result. The operation to be performed in response to the memory access request is to issue a read request to read data in a main memory, to issue a purge request to a CPU to purge data in a cache memory, or the like. In a case where there is a miss for the memory access request in the cache TAG 46-1 in the local node as a result of the search, for example, the local snoop control unit 42-1 cancels the local snoop control operation, and transmits a global snoop request to the request transmission/reception unit 41-1.
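The hit/miss branching described above can be sketched as follows. This is a minimal illustration, assuming the cache TAG is reduced to a dictionary from address to status; the function name `local_snoop` and the returned labels are hypothetical.

```python
from typing import Dict, Optional, Tuple


def local_snoop(cache_tag: Dict[int, str], address: int) -> Tuple[str, Optional[str]]:
    """Search the local cache TAG and pick the next step (illustrative only).

    On a miss the local snoop is cancelled and a global snoop request is
    handed back to the request transmission/reception unit. On a hit the
    operation is determined from the search result (the tag status).
    """
    status = cache_tag.get(address)
    if status is None:
        return ("GLOBAL_SNOOP_REQUEST", None)
    return ("DETERMINE_LOCALLY", status)


# Assumed contents of cache TAG 46-1 for the example.
tag_46_1 = {0x1000: "exclusive"}
on_hit = local_snoop(tag_46_1, 0x1000)
on_miss = local_snoop(tag_46_1, 0x2000)
```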
The broadcast control unit 43-1 transmits and receives global snoop requests to and from the request transmission/reception unit 41-1 of the local node, and also transmits and receives global snoop requests to and from the system controller 40-2 of the other node (the system board B in this example).
More specifically, when receiving a global snoop request from the request transmission/reception unit 41-1, the broadcast control unit 43-1 transmits the global snoop request to the global snoop control unit 44-1 described later, and outputs (broadcasts) the global snoop request to the system controller 40-2 of the other node. When receiving a global snoop request from the system controller 40-2 of the other node, the broadcast control unit 43-1 transmits the global snoop request to the global snoop control unit 44-1.
The global snoop control unit 44-1 searches a cache memory in the local node for the target data, and exchanges search results with the system controller 40-2 in the other node. Based on the search result in the system controller 40-2 in the other node and its own search result, the global snoop control unit 44-1 determines an operation to be performed in response to the memory access request.
More specifically, when receiving a global snoop request from the broadcast control unit 43-1, the global snoop control unit 44-1 searches the cache TAG 46-1 in the local node for the target address information corresponding to the target data of the global snoop request, as an operation in response to the CPU that has issued the memory access request.
Meanwhile, when the global snoop control unit 44-2 of the other node receives a global snoop request from the broadcast control unit 43-1 of the local node via the broadcast control unit 43-2 of the other node, the global snoop control unit 44-2 searches the cache TAG 46-2 in the other node for the target address information corresponding to the target data of the global snoop request. After that, the global snoop control units 44-1 and 44-2 exchange and combine the cache TAG search results (the result of the search on the cache TAG 46-1 conducted by the global snoop control unit 44-1, and the result of the search on the cache TAG 46-2 conducted by the global snoop control unit 44-2), so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 44-1 of the local node determines an operation to be performed in response to the memory access request.
For example, in a case where it becomes clear as a result of the merging of the cache statuses that the target data of the memory access request issued from the CPU 10 is present in the main memory 30 in the local node, the global snoop control unit 44-1 issues a memory access request to the memory access issuing unit 45-1 in the local node. In a case where it becomes clear as a result of the merging of the cache statuses that the target data of the memory access request issued from the CPU 10 is present in the level-1 cache memory 12a in the CPU 12 in the other node, the global snoop control unit 44-1 issues a memory access request to the CPU 12 in the other node.
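The merging of the cache statuses and the resulting decision can be sketched as below. This is a simplified illustration under assumptions: each node reports either a miss (`None`) or a `(cpu, status)` pair, the first node reporting a hit supplies the data, and the name `merge_and_decide` and the `home_node` parameter (the node whose main memory holds the primary data) are hypothetical.

```python
from typing import Dict, Optional, Tuple

# Per-node search result: None for a miss, or (cpu number, status) for a hit.
SnoopResult = Optional[Tuple[int, str]]


def merge_and_decide(
    results: Dict[str, SnoopResult], home_node: str
) -> Tuple[str, str, Optional[int]]:
    """Combine the per-node cache TAG search results and choose a data source.

    Returns (source kind, node, cpu). If any node reports a hit, the request
    goes to the CPU holding the line; otherwise the primary data is read
    from main memory in the home node.
    """
    for node in sorted(results):
        hit = results[node]
        if hit is not None:
            cpu, _status = hit
            return ("cache", node, cpu)
    return ("main_memory", home_node, None)


# Miss in both nodes: read from main memory in the local node (board A).
from_memory = merge_and_decide({"A": None, "B": None}, home_node="A")
# Hit in the other node: request the data from CPU 12 on board B.
from_remote_cpu = merge_and_decide({"A": None, "B": (12, "exclusive")}, home_node="A")
```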
The memory access issuing unit 45-1 executes a memory access request, based on an operation in response to a memory access request determined by the local snoop control unit 42-1 or the global snoop control unit 44-1.
The cache TAG 46-2, the request transmission/reception unit 41-2, the local snoop control unit 42-2, the broadcast control unit 43-2, the global snoop control unit 44-2, and the memory access issuing unit 45-2 provided in the system controller 40-2 are the same as the cache TAG 46-1, the request transmission/reception unit 41-1, the local snoop control unit 42-1, the broadcast control unit 43-1, the global snoop control unit 44-1, and the memory access issuing unit 45-1 of the system controller 40-1, respectively, except that the communication control operations are to be performed with respect to the level-1 cache memories 12a and 13a, the level-2 cache memories 12b and 13b, and the main memories 32 and 33.
FIGS. 13 and 14 are timing charts for explaining operations of the conventional large-scale information processing apparatus 100.
The following is a description of an operation flow to be performed to access data that is present only in a local (main) memory and is not present in any of the cache memories provided in the conventional large-scale information processing apparatus 100.
As illustrated in FIG. 13, a memory access request (a data fetch request (illustrated as “FCH-REQ” in FIG. 13) in this example; hereinafter referred to as the fetch request) is first issued from the CPU 10 (see t1), and the request transmission/reception unit 41-1 receives the fetch request from the CPU 10 (see t2). The local snoop control unit 42-1 then searches the cache TAG 46-1 in the local node for the target address information of the fetch request (see t3). In FIGS. 13 and 14, the “LPIPE” indicates processing by the local snoop control unit, whereas the “GPIPE” indicates processing by the global snoop control unit.
If the result of the search conducted in response to the memory access request indicates a miss in the cache TAG 46-1 in the local node (indicated as “result=MISS” in FIG. 13), the local snoop control unit 42-1 cancels the local snoop control, and transmits a global snoop request to the request transmission/reception unit 41-1. When receiving the global snoop request from the local snoop control unit 42-1 via the request transmission/reception unit 41-1, the broadcast control unit 43-1 transmits the global snoop request to the global snoop control unit 44-1 (see t4), and broadcasts the global snoop request to the system controller 40-2 of the other node (see t5).
When receiving the global snoop request from the broadcast control unit 43-1, the global snoop control unit 44-1 of the local node searches the cache TAG 46-1 in the local node for the target address information corresponding to the target data of the global snoop request (see t6). Meanwhile, when the global snoop control unit 44-2 of the other node receives the global snoop request from the broadcast control unit 43-1, the global snoop control unit 44-2 searches the cache TAG 46-2 in the other node for the target address information corresponding to the target data of the global snoop request (see t7). The global snoop control units 44-1 and 44-2 of the respective nodes exchange the results of the searches of the cache TAGs 46-1 and 46-2 with each other, and combine the results so as to merge the cache statuses. Based on the result of the cache status merging, the global snoop control unit 44-1 determines the final operation in response to the fetch request (see t8).
If the target data of the fetch request is not detected in any of the cache memories, and the global snoop control unit 44-1 determines that the primary data corresponding to the target data of the fetch request is to be read from the main memory 30 in the local node, the memory access issuing unit 45-1 issues a read request (indicated as “MS-RD-REQ” in FIG. 13) with respect to the fetch request, to the main memory 30 in the local node (see t9). The primary data corresponding to the fetch request is then read from the main memory 30 in the local node into the system controller 40-1 (indicated as “RD” and “MIDQ”; see t10 in FIG. 13). After that, the memory access issuing unit 45-1 transmits the primary data read from the main memory 30 in the local node as a fetch data response (indicated as “FCH-DATA” in FIG. 13) to the CPU 10 (see t11), and the execution of the fetch request is completed (see t12).
Next, an operation flow to be performed to access data cached in a cache memory of the local node in the conventional large-scale information processing apparatus 100 is described.
As illustrated in FIG. 14, a fetch request is first issued from the CPU 10 (see t1), and the request transmission/reception unit 41-1 receives the fetch request from the CPU 10 (see t2). The local snoop control unit 42-1 then searches the cache TAG 46-1 in the local node for the target address information of the fetch request (see t3).
If the result of the search conducted in response to the memory access request indicates a hit in the cache TAG 46-1 in the local node (indicated as “result=HIT” in FIG. 14), the local snoop control unit 42-1 determines the final operation in response to the fetch request, based on the search result (see t4). Accordingly, the local snoop control unit 42-1 omits the global snoop control operation.
If it becomes clear that the target data of the fetch request is present in the level-1 cache memory 11a in the CPU 11 in the local node, and the local snoop control unit 42-1 determines that the cache data corresponding to the target data of the fetch request is to be read from the level-1 cache memory 11a, the local snoop control unit 42-1 issues a read request (indicated as “CPBK-REQ” in FIG. 14) with respect to the fetch request, to the CPU 11 including the level-1 cache memory 11a (see t5). The cache data corresponding to the fetch request is then read from the level-1 cache memory 11a (the CPU 11) into the system controller 40-1 (indicated as “RD” and “MIDQ”; see t6 in FIG. 14). After that, the local snoop control unit 42-1 transmits the cache data read from the level-1 cache memory 11a as a fetch data response (indicated as “FCH-DATA” in FIG. 14) to the CPU 10 (see t7), and the execution of the fetch request is completed (see t8).
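To make the difference between the two flows of FIGS. 13 and 14 concrete, the following sketch lists the steps of each as an ordered sequence. It is an illustration only: the function name `fetch_flow` is hypothetical, the step labels merely paraphrase the signal names used in the figures, and no timing values are implied.

```python
from typing import List


def fetch_flow(local_tag_hit: bool) -> List[str]:
    """Return the ordered steps of a fetch request from CPU 10 (illustrative)."""
    steps = ["FCH-REQ from CPU 10", "LPIPE: search cache TAG 46-1"]
    if local_tag_hit:
        # FIG. 14: the global snoop is omitted; data comes from CPU 11's cache.
        steps += ["CPBK-REQ to CPU 11", "RD from level-1 cache memory 11a"]
    else:
        # FIG. 13: global snoop across both nodes, then a main memory read.
        steps += [
            "broadcast global snoop request",
            "GPIPE: search cache TAGs 46-1 and 46-2",
            "merge cache statuses",
            "MS-RD-REQ to main memory 30",
            "RD from main memory 30",
        ]
    steps.append("FCH-DATA to CPU 10")
    return steps


hit_flow = fetch_flow(local_tag_hit=True)
miss_flow = fetch_flow(local_tag_hit=False)
```

In this sketch the hit flow has no broadcast, GPIPE, or merge step, which is the source of the shorter access latency attributed to the conventional technique.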
As described above, in the conventional large-scale information processing apparatus 100, the global snoop control operation is omitted, and the access is completed within the local node, only in the following cases (1) to (6).
(1) Where the issued memory access request is a command fetch request, and the target data of the command fetch request is found as a shared type (a shared fetch request to simply fetch the target data from one of the cache memories provided in the information processing apparatus 100) in the cache TAG 46-1 in the local node.
(2) Where the issued memory access request is a command fetch request, and the target data of the command fetch request is found as an exclusive type (an exclusive-type fetch command to cause only one cache memory to store the target data among all the cache memories provided in the information processing apparatus 100) in the cache TAG 46-1 in the local node.
(3) Where the issued memory access request is a shared-type (load) fetch request, and the target data of the shared-type fetch request is found as a shared type in the cache TAG 46-1 in the local node.
(4) Where the issued memory access request is a shared-type fetch request, and the target data of the shared-type fetch request is found as an exclusive type in the cache TAG 46-1 in the local node.
(5) Where the issued memory access request is an exclusive-type (store) fetch request, and the target data of the exclusive-type fetch request is found as an exclusive type in the cache TAG 46-1 in the local node.
(6) Where the issued memory access request is a block store request, and the target data of the block store request is found as an exclusive type in the cache TAG 46-1 in the local node.
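Cases (1) to (6) above amount to a lookup on the pair (request type, status found in the local cache TAG). A minimal sketch of that lookup follows; the string labels and the names `SKIP_GLOBAL_SNOOP` and `can_skip_global_snoop` are hypothetical and chosen only for illustration.

```python
from typing import Optional

# (request type, status found in the local cache TAG) pairs for cases (1)-(6).
SKIP_GLOBAL_SNOOP = frozenset({
    ("command_fetch", "shared"),       # case (1)
    ("command_fetch", "exclusive"),    # case (2)
    ("shared_fetch", "shared"),        # case (3)
    ("shared_fetch", "exclusive"),     # case (4)
    ("exclusive_fetch", "exclusive"),  # case (5)
    ("block_store", "exclusive"),      # case (6)
})


def can_skip_global_snoop(request_type: str, local_status: Optional[str]) -> bool:
    """True only when the pair matches one of cases (1) to (6).

    `local_status` is None when the local cache TAG search misses, in which
    case the global snoop can never be skipped.
    """
    return (request_type, local_status) in SKIP_GLOBAL_SNOOP


store_on_exclusive = can_skip_global_snoop("exclusive_fetch", "exclusive")
store_on_shared = can_skip_global_snoop("exclusive_fetch", "shared")
load_on_miss = can_skip_global_snoop("shared_fetch", None)
```

Note that an exclusive-type fetch or a block store that finds only a shared-type entry in the local cache TAG is not among the six cases, since copies may also exist in other nodes.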
As described above, by the conventional technique, only when the target data of a memory access request is found in a local cache memory, the global snoop control operation over the system boards in the information processing apparatus 100 can be skipped, and a data transfer between the CPUs in the local node can be initiated.
However, in the above conventional technique, the global snoop control operation can be skipped only for data held within the total capacity of the cache memories provided in the local node.
Also, in a case where there is a miss in all the cache memories in the local node, the location of the latest data corresponding to the target data cannot be detected. Therefore, in such a case, it is necessary to perform the global snoop control operation over the system boards.
As a result, the rate at which a memory access can be started with the global snoop operation skipped is not sufficiently high, and the performance of the apparatus might not be improved as desired.