In operations of an information processing device including a plurality of central processing unit (CPU) cores, snooping occurs when the plurality of CPU cores refer to the same memory area, and this causes a decrease in processing performance. The term “CPU core” as used here refers to the CPU itself when the CPU includes one core, and refers to each core when the CPU includes a plurality of cores. The term “snooping” refers to a process for maintaining coherence among cache memories when a plurality of CPU cores access the same memory, and is also referred to as cache coherence control.
FIG. 10 is a diagram for explaining snooping. FIG. 10 illustrates the case where four CPUs (CPU #0 to CPU #3) each including 10 CPU cores denoted by core #0 to core #9 are coupled via an interconnect 7. It is assumed that each CPU core accesses the same area 9 in a memory 3, and the area 9 is cached by core #0 and core #1 of CPU #0 and core #1 of CPU #1.
When core #0 of CPU #0 writes data to the area 9 in its cache memory 8, core #0 of CPU #0 notifies the other CPU cores of the address of the area 9 at which the data is written (1). Each of the other CPU cores, upon receiving the notification, invalidates its cached data if it holds the notified address, and core #0 of CPU #0 updates the area 9 of the memory 3 with the rewritten data (2). Thereafter, when a CPU core whose data was invalidated has to use the data of the area 9, that CPU core reads the data of the area 9 from the memory 3 (3).
The sequence of processes (1) to (3), performed in this way to maintain coherence among the cache memories 8 when data is written to a cache memory 8, is referred to as snooping.
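The sequence (1) to (3) may be sketched as a minimal single-threaded simulation. This is an illustrative model only: the dictionary-based caches, the core names, and the initial values are assumptions for the sketch and do not appear in the description above.

```python
# Model the memory 3 and per-core cache memories 8 as dictionaries
# keyed by address. "area9" stands in for the area 9.
memory = {"area9": 100}
caches = {"core0": {}, "core1": {}, "core2": {}}

# In this sketch, the area 9 starts out cached by every core.
for cache in caches.values():
    cache["area9"] = memory["area9"]

def write(core, addr, value):
    caches[core][addr] = value
    # (1) notify the other cores of the written address; each core
    # invalidates its copy if it has cached that address
    for other, cache in caches.items():
        if other != core and addr in cache:
            del cache[addr]
    # (2) update the memory with the rewritten data
    memory[addr] = value

def read(core, addr):
    # (3) a core whose copy was invalidated re-reads from the memory
    if addr not in caches[core]:
        caches[core][addr] = memory[addr]
    return caches[core][addr]

write("core0", "area9", 200)
assert "area9" not in caches["core1"]   # invalidated by the notification
assert read("core1", "area9") == 200    # re-fetched from the memory
```

On real hardware these steps are carried out by the cache coherence protocol, not by software; the sketch only makes the ordering of (1) to (3) explicit.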
FIG. 11 is a diagram illustrating a processing instance in which snooping occurs. In FIG. 11, a thread A running on core #0, a thread B running on core #1, a thread C running on core #2, and the like use management area [0] to management area [1023] in a distributed manner. Here, a thread is a unit of processing that uses a CPU core, and is the smallest unit of execution of a program on an operating system (OS) that supports parallel processing. A counter 33 is an index for using management area [0] to management area [1023] in a distributed manner, and is initialized to “0”.
In the example illustrated in FIG. 11, first, the thread A refers to the counter 33 (1) and, according to the value of the counter 33, the thread A is associated with a management area that the thread A uses (2). Here, the value of the counter 33 is “0”, and therefore the thread A is associated with management area [0].
Next, the thread A updates the counter 33 from [0] to [1] in preparation for the next thread B (3). Then, the thread B refers to the counter 33 (4) and, based on the value of the counter 33, the thread B is associated with a management area that the thread B uses.
In this example, the counter 33, which is a single memory area, is referenced and updated from the different CPU cores on which the different threads run, and thus snooping occurs. This results in a delay in the reference processing denoted by (4).
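The assignment logic of steps (1) to (4) can be sketched as follows. The function and variable names are illustrative assumptions; the sketch is sequential and shows only the logic, since the snooping itself is a hardware effect of every thread touching the single counter.

```python
NUM_AREAS = 1024
counter = 0        # the counter 33, initialized to "0"
assignments = {}   # thread name -> index of its management area

def associate(thread_name):
    global counter
    # (1)/(4) refer to the counter; on real hardware, this shared read
    # from different CPU cores is where snooping is triggered
    idx = counter
    # (2) associate the thread with management area [idx]
    assignments[thread_name] = idx
    # (3) update the counter for the sake of the next thread
    counter = (counter + 1) % NUM_AREAS

associate("thread A")   # thread A takes management area [0]
associate("thread B")   # thread B takes management area [1]
assert assignments == {"thread A": 0, "thread B": 1}
```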
To deal with this, memory areas that would otherwise be written to from different CPU cores are prepared in a number corresponding to the number of CPU cores, and threads are divided into groups, so that accesses to the same memory area are dispersed. Snooping is thus reduced. FIG. 12 is a diagram illustrating an existing example of dealing with snooping. FIG. 12 illustrates a case where there are two CPU cores and threads are divided into two groups, group #1 and group #2. The OS dispatches a thread to an available CPU core to run it. Consequently, in the example of FIG. 12, the CPU core on which a thread that manipulates a counter runs is variable.
In FIG. 12, a thread included in group #1 accesses a counter A and uses management area [0] to management area [511] for group #1. A thread included in group #2 accesses a counter B and uses management area [0] to management area [511] for group #2. Dividing the threads into two groups in this way halves the number of accesses made to any one memory area, which may reduce snooping.
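The grouped variant can be sketched by giving each group its own counter and its own range of 512 management areas. The counter names and data layout here are illustrative assumptions based on the description of FIG. 12.

```python
AREAS_PER_GROUP = 512
counters = {"A": 0, "B": 0}   # counter A for group #1, counter B for group #2
assignments = {}              # thread name -> (counter used, area index)

def associate(counter_name, thread_name):
    # Each group refers to and updates only its own counter, so threads
    # in different groups never touch the same counter memory area.
    idx = counters[counter_name]
    assignments[thread_name] = (counter_name, idx)
    counters[counter_name] = (counters[counter_name] + 1) % AREAS_PER_GROUP

associate("A", "thread in group #1")   # area [0] for group #1
associate("B", "thread in group #2")   # area [0] for group #2
assert assignments["thread in group #1"] == ("A", 0)
assert assignments["thread in group #2"] == ("B", 0)
```

Note that the two groups each use their own management area [0] to management area [511]; the indices coincide, but the underlying memory areas are separate.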
Note that there is a related-art technique in which, when threads access the same data, it is determined that output data of a thread X is input data of a thread Y, and the data is written to a cache of the CPU on which the thread Y runs, so that snooping is inhibited. There is another related-art technique in which the amount of bus traffic of the CPUs is monitored, and movement of a process between processors is suppressed when the amount of bus traffic exceeds a threshold, so that snooping is inhibited.
Examples of the related art techniques include International Publication Pamphlet No. WO 2011/161774 and Japanese Laid-open Patent Publication No. 6-259395.
With the related-art technique illustrated in FIG. 12, in case #1, where the thread A included in group #1 runs on core #0 and the thread B included in group #2 runs on core #1, no snooping occurs. In contrast, in case #2, where a thread D included in group #2 runs on core #0 and the thread B, included in the same group #2 as the thread D, runs on core #1, both threads access the counter B, thereby causing snooping.
Since the OS dispatches threads to vacant CPU cores to run them, both case #1 and case #2 may occur. Accordingly, with the related-art technique illustrated in FIG. 12, there are cases in which, even though the number of occurrences of snooping may be reduced, snooping cannot be completely inhibited.
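The case analysis above reduces to a simple predicate: snooping on a counter is possible exactly when the two CPU cores run threads of the same group. The thread-to-group mapping below follows the description of case #1 and case #2; the function name is an illustrative assumption.

```python
# Group of each thread, as described for FIG. 12.
group_of = {"thread A": 1, "thread B": 2, "thread D": 2}

def snooping_possible(core0_thread, core1_thread):
    # Both cores manipulate the same group counter, and hence may
    # snoop on it, only when their threads belong to the same group.
    return group_of[core0_thread] == group_of[core1_thread]

assert snooping_possible("thread A", "thread B") is False  # case #1
assert snooping_possible("thread D", "thread B") is True   # case #2
```

Because the OS, not the grouping, decides which thread lands on which core, nothing prevents the second case from arising.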