Techniques have hitherto been used to acquire profile information about accesses to a cache memory during execution of a program.
For example, a technique has been proposed for acquiring profile data for each cache set of a cache memory. In this technique, a cache set number corresponding to an address of an array X is obtained, and it is determined whether or not the cache set number is a set number s in charge of a profile data acquisition process. In the case where the cache set number is the set number s, it is determined whether or not tag information corresponding to the address of the array X is stored in a storage unit. In the case where the tag information is stored in the storage unit, a hit variable is incremented by one. In the case where the tag information is not stored in the storage unit, a miss variable is incremented by one.
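The per-set counting described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the cited technique: the line size, number of sets, and associativity are assumed values, the names SetProfiler and profile_array are hypothetical, and the handling of a miss (storing the tag and evicting the least recently used tag when the set is full) is an assumption, since the description above states only how the hit and miss variables are updated.

```python
from collections import OrderedDict

LINE_SIZE = 64   # bytes per cache line (assumed value)
NUM_SETS = 64    # number of cache sets (assumed value)
WAYS = 4         # tags held per set, i.e. associativity (assumed value)


class SetProfiler:
    """Counts hits and misses for the single cache set s it is in charge of."""

    def __init__(self, s):
        self.s = s
        self.tags = OrderedDict()  # storage unit: tags of set s, oldest first
        self.hit = 0
        self.miss = 0

    def access(self, addr):
        line = addr // LINE_SIZE
        set_no = line % NUM_SETS   # cache set number for this address
        if set_no != self.s:
            return                 # another set is in charge of this address
        tag = line // NUM_SETS     # tag information for this address
        if tag in self.tags:
            self.hit += 1
            self.tags.move_to_end(tag)  # mark as most recently used
        else:
            self.miss += 1
            # Assumption: the tag is stored on a miss, evicting the least
            # recently used tag when the set is already full.
            if len(self.tags) >= WAYS:
                self.tags.popitem(last=False)
            self.tags[tag] = None


def profile_array(base, elem_size, n, s):
    """Profile sequential accesses to the n elements of an array at address base."""
    p = SetProfiler(s)
    for i in range(n):
        p.access(base + i * elem_size)
    return p.hit, p.miss
```

For instance, profile_array(0, 8, 1024, 0) walks an array X of 1024 eight-byte elements and counts only the accesses that map to set 0. Because each profiler looks at a single set, one profiler per set number s could be run independently to cover the whole cache.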
In a high-performance computing (HPC) application program or the like, the hot spots of the program tend to be limited to a small part of the code. Thus, in the case where profile data is acquired in order to grasp the characteristics of the program, it is often sufficient to investigate only several loops (kernel loops). In general, the loops of an HPC application access a large amount of data, and therefore it is desirable to effectively utilize a cache memory of a central processing unit (CPU) in order to execute the loops at a high speed.
The cache memory stores data such as the values of variables and array elements to be accessed during execution of a program. The instructions constituting a program are themselves also data, and therefore are also stored in the cache memory. In the case where an instruction to be executed is not present in the cache memory during execution of the program, the CPU is not able to continue execution of the program until the relevant instruction is acquired from a main memory. Thus, a cache miss for an instruction degrades performance more seriously than a cache miss for data. In particular, the same instructions are repeatedly executed in a loop, and therefore it is desirable to effectively utilize the cache memory for instructions as well. Acquiring cache profile information for the instructions is therefore important.
Related techniques are disclosed in, for example, Japanese Laid-open Patent Publication No. 2014-232369.
The currently available method of investigating the usage of a cache utilizes a register built into the CPU, and allows only simple information, such as the number of cache misses, to be acquired. The method which utilizes the register built into the CPU does not allow detailed cache profile information that takes caching of the instructions into consideration to be acquired. It is also conceivable to acquire detailed profile information using a CPU simulator or a dedicated tool. In this case, however, execution takes much longer than normal execution of the program. Since execution of a large-scale HPC application program generally takes a long time, it is not practical to occupy the actual device for a long time in order to acquire profile information on the usage of the cache memory.