In a computer system provided with a processor, a cache memory (which may be simply referred to as “cache,” hereinafter), and a main memory (which may be simply referred to as “memory,” hereinafter), the cache is known to play an important role: it conceals the delay time (referred to as “latency,” hereinafter) caused by the slow operating speed of the memory, thereby improving the performance of application software.
The cache is a high-speed, small-capacity memory that can store a portion of the data held in the memory. If the requested data exists in the cache (a hit) when a CPU (Central Processing Unit) accesses the memory, the data is supplied to the CPU with low latency. If the data does not exist in the cache (a miss), the cache acquires the data from the memory and supplies it to the CPU. As described above, the memory operates more slowly than the CPU or the cache. On a miss, therefore, the latency until the data is supplied is larger, the CPU stalls for a longer period of time, and the performance of applications drops.
Given this nature of the cache, improving the cache hit rate is known to be important. Various methods are therefore used, including a method of altering a program to improve the hit rate and a method of using a Way lock system that does not allow data stored in the cache to be driven out of the cache.
For example, PTL 1 discloses, as a related technique, a method that uses page control of an OS (Operating System) to reduce the number of times data in the cache is driven out between processes, thereby improving the cache hit rate. The related technique will be described with reference to FIG. 16.
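The hit and miss behavior described above can be illustrated with a minimal sketch. The direct-mapped organization, the class name TinyCache, the line size, and the latency values of 1 and 100 cycles are all assumptions chosen for illustration; none of them is taken from the system described here.

```python
# Illustrative sketch only. All sizes and latencies are assumed values.
HIT_LATENCY = 1      # cycles to supply data on a hit (assumed)
MISS_LATENCY = 100   # cycles to fetch data from the memory on a miss (assumed)


class TinyCache:
    """Hypothetical direct-mapped cache that tracks tags only, not data."""

    def __init__(self, num_lines=4, line_size=16):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return the latency in cycles for one memory access."""
        block = address // self.line_size
        index = block % self.num_lines
        tag = block // self.num_lines
        if self.tags[index] == tag:
            self.hits += 1
            return HIT_LATENCY
        # Miss: the line is fetched from the memory and replaces the old one.
        self.tags[index] = tag
        self.misses += 1
        return MISS_LATENCY


def total_latency(cache, addresses):
    """Sum the latency of a sequence of accesses."""
    return sum(cache.access(a) for a in addresses)
```

With these assumed values, the access sequence 0, 4, 8, 64, 0 gives two hits and three misses, for a total of 302 cycles. The final access to address 0 misses because address 64 maps to the same line and drove the earlier data out, which is the kind of event that hit-rate improvement methods try to avoid.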
FIG. 16 shows an example of the configuration of a system including a CPU, a cache and a memory. In the present example, the system includes a CPU 10, a cache memory 20 and a main memory 30. The CPU 10 is connected to the cache memory 20, and the cache memory 20 is connected to the main memory 30. The cache memory 20 includes a cache controller 21 and data memory/tag memory 22. The data memory 22 of the cache memory 20 is accessed through the cache controller 21.
When the method disclosed in the above PTL 1 is applied to the present system, it is possible to keep data in the cache from being driven out of the data memory, resulting in an increase in the cache hit rate.
As described above, the data memory 22 of the cache 20 is accessed through the cache controller 21. Therefore, to improve the performance of applications, it is important that the cache controller 21 not stall.
When the cache controller 21 stalls, the cache controller 21 does not accept a request for a new memory access from the CPU 10. As a result, the latency for a subsequent memory access increases until the stalling is brought to an end. In other words, while the cache controller 21 is stalled, data cannot be read from the data memory 22 even if the data is present there. The problem is thus that the advantage of the method of improving the cache hit rate is lost.
A blocking cache is a type of cache in which the cache controller stalls while data is being acquired from the memory following a cache miss. With the blocking cache, therefore, the problem is that when a plurality of memory accesses occur, the latency for a subsequent memory access increases, resulting in an increase in the stalling time of the CPU.
FIG. 17 shows an example of a cache shared by two CPUs. In the present example, a first CPU 11 and a second CPU 12 are connected to a blocking cache 23, and the blocking cache 23 to a main memory 30.
In the present example, suppose that a memory access from the first CPU 11 occurs first and results in a cache miss. A memory access from the second CPU 12 that occurs while the cache miss is being processed needs to wait until the processing of the preceding cache miss comes to an end. Due to the delay of the memory access, the second CPU 12 stalls for a longer time.
What is illustrated in the above example is a cache shared by a plurality of CPUs. However, a similar problem could occur even when a cache is accessed from a single CPU. For example, a similar problem could occur when a single CPU supports simultaneous execution of a plurality of threads and therefore issues a plurality of memory accesses.
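The delay described above can be sketched with a small timing model. The function name, the request format, and the latencies of 1 and 100 cycles are assumptions for illustration only; the model treats the blocking cache as serving one request at a time and ignores all other effects.

```python
def blocking_cache_latencies(requests, hit_latency=1, miss_latency=100):
    """Hypothetical timing model of a blocking cache.

    requests: list of (arrival_cycle, is_hit) tuples, sorted by arrival.
    Returns the latency in cycles seen by each request.
    """
    busy_until = 0  # cycle at which the cache controller stops stalling
    latencies = []
    for arrival, is_hit in requests:
        # A request cannot start until the preceding one has finished.
        start = max(arrival, busy_until)
        service = hit_latency if is_hit else miss_latency
        busy_until = start + service
        latencies.append(busy_until - arrival)
    return latencies
```

For instance, if the first CPU misses at cycle 0 and the second CPU issues an access at cycle 5 that would hit, `blocking_cache_latencies([(0, False), (5, True)])` returns `[100, 96]` under these assumed values. The second access sees 96 cycles of latency even though its data is already in the cache.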
To solve such a blocking cache problem, a non-blocking cache is disclosed in NPL 1. FIG. 18 shows an example of a non-blocking cache shared by two CPUs. In the present example, a first CPU 11 and a second CPU 12 are connected to a main memory 30 through a non-blocking cache 24. As shown in the diagram, the non-blocking cache 24 includes a register called MSHR (Miss Status/Information Holding Register). Information required to process a cache miss is stored in the MSHR in advance, enabling a subsequent memory access to be processed while the cache miss is being processed. Therefore, compared with the use of the above blocking cache, the use of the non-blocking cache makes it possible to reduce the stalling time of the CPU when a plurality of memory accesses occur at the same time.
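The effect of the MSHR can be sketched with a simplified timing model. This is not the design of NPL 1: the function name, the number of MSHR entries, the latencies of 1 and 100 cycles, and the assumptions that the memory serves outstanding misses in parallel and that hits do not occupy the controller are all hypothetical simplifications.

```python
import heapq


def nonblocking_cache_latencies(requests, num_mshr=2,
                                hit_latency=1, miss_latency=100):
    """Hypothetical timing model of a non-blocking cache with MSHR entries.

    requests: list of (arrival_cycle, is_hit) tuples, sorted by arrival.
    Returns the latency in cycles seen by each request.
    """
    in_flight = []     # min-heap of completion cycles, one per busy MSHR entry
    stalled_until = 0  # the controller stalls only when every entry is busy
    latencies = []
    for arrival, is_hit in requests:
        start = max(arrival, stalled_until)
        # Free the MSHR entries whose misses have completed by now.
        while in_flight and in_flight[0] <= start:
            heapq.heappop(in_flight)
        if is_hit:
            # A hit is served even while earlier misses are outstanding.
            latencies.append(start + hit_latency - arrival)
            continue
        if len(in_flight) >= num_mshr:
            # No free entry: wait for the earliest outstanding miss.
            start = heapq.heappop(in_flight)
            stalled_until = start
        heapq.heappush(in_flight, start + miss_latency)
        latencies.append(start + miss_latency - arrival)
    return latencies
```

Under these assumptions, a miss at cycle 0 followed by a hit at cycle 5 gives `[100, 1]`: the hit is served in one cycle while the miss is still outstanding. With `num_mshr=1`, a second miss at cycle 5 has to wait for the single entry to become free, so the model also shows that the benefit depends on how many misses the MSHR can track.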