1. Field of the Invention
The present invention relates to a cache memory analyzing method for detecting an address where a cache miss is generated due to memory access contention, in a processor equipped with a cache memory or in a simulated information processing apparatus or the like that simulates such a processor.
2. Description of the Related Art
In a system such as a processor or a simulated information processing apparatus (simulator), the latency caused by a cache miss when data is requested is one of the most serious bottlenecks. Thus, in order to improve execution performance in an embedded system that runs on the processor or the information processing apparatus, it is important to reduce cache misses. To reduce cache misses effectively, it is necessary in the above-described system to specify the memory access data that exhibits a high efficiency for reducing cache misses.
Conventionally, there is known a method (1) for specifying the memory access data having a high efficiency for reducing cache misses. In this method, the number of cache misses generated at the address of each memory access is counted, and the address with the largest number of cache misses is detected as the memory access data with the high efficiency for reducing cache misses.
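The counting step of the method (1) can be illustrated with a minimal sketch. It assumes that a log of the addresses at which cache misses occurred is already available; the function name `most_missed_address` and the sample addresses are hypothetical and do not come from the figures.

```python
from collections import Counter


def most_missed_address(miss_log):
    """Return (address, miss_count) for the address with the most cache misses.

    miss_log is assumed to be a sequence with one entry per cache miss,
    each entry being the address whose access missed.
    """
    counts = Counter(miss_log)
    # most_common(1) yields a one-element list holding the top (address, count) pair.
    return counts.most_common(1)[0]


# Hypothetical miss log: 0x100 misses three times, 0x200 twice, 0x300 once.
sample_log = [0x100, 0x200, 0x100, 0x300, 0x100, 0x200]
top_address, top_count = most_missed_address(sample_log)
```

Note that this ranking looks only at per-address miss counts and takes no account of which addresses share a set.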
FIG. 11 shows an example of the detection method (1). FIG. 11 illustrates the access state of the cache memory. The cache memory whose access state is shown in FIG. 11 is of a set associative system having two ways. In FIG. 11, the horizontal axis is a lapse of access time of the cache memory and the vertical axis indicates data at the addresses in the same set. The “set” indicates an index (entry) that is distinguished from other memory areas by several lower bits of the memory-access address, and the “way” indicates how many blocks (cache blocks) there are for storing the data within the cache memory. In FIG. 11, five data “a” to “e” are accessed in the set 0, and three data “α” to “γ” are accessed in the set 1. Since the address with the largest number of cache misses is considered as the memory access data having the high efficiency for reducing cache misses, the data “a” in the set 0, for which cache misses are generated six times, is detected in FIG. 11 as the memory access data with the high efficiency for reducing cache misses.
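The notions of “set” and “way” can be made concrete with a small sketch of a two-way set associative cache. The block size, the number of sets, and the least-recently-used (LRU) replacement policy below are assumptions chosen for illustration; the text above specifies only that the cache has two ways and that the set is selected by lower address bits.

```python
from collections import OrderedDict

LINE_SIZE = 16  # bytes per cache block (assumed)
NUM_SETS = 2    # number of sets (assumed)
NUM_WAYS = 2    # two ways, as in the cache of FIG. 11


def set_index(address):
    """Select the set from the low-order bits of the block number."""
    return (address // LINE_SIZE) % NUM_SETS


def tag_of(address):
    """Remaining high-order bits, identifying the block within its set."""
    return address // (LINE_SIZE * NUM_SETS)


def count_misses(trace):
    """Count cache misses for a sequence of addresses, assuming LRU replacement."""
    sets = [OrderedDict() for _ in range(NUM_SETS)]
    misses = 0
    for address in trace:
        ways = sets[set_index(address)]
        tag = tag_of(address)
        if tag in ways:
            ways.move_to_end(tag)  # hit: mark the block as most recently used
            continue
        misses += 1
        if len(ways) == NUM_WAYS:
            ways.popitem(last=False)  # set is full: evict the least recently used block
        ways[tag] = True
    return misses
```

With these assumed parameters, the addresses 0x00, 0x20, and 0x40 all fall in the set 0. Two of them fit in the two ways together, whereas cycling through all three makes every access miss.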
Further, U.S. Pat. No. 5,930,507 discloses a method (2) for specifying the memory access data that may have contention. In this method, a compile processing apparatus that runs on a computer having a cache memory collects the memory access data on the cache memory and analyzes the relation of contention generated between the data.
In the method (1), the address with the largest number of cache misses is determined from the addresses of the memory accesses where cache misses have been generated, without considering the access contention of the cache memory. Thus, the address with the high efficiency for reducing cache misses may not be detected. “Contention” means repetition of the state where one of different addresses in the same set of the cache memory evicts the other from the cache memory, or is itself evicted from the cache memory by the other.
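The definition of contention given above can be sketched as follows. This is an illustrative sketch only, not the analyzing method of the invention: it models a single set with two ways, assumes LRU replacement, and treats a pair of blocks as contending once one has evicted the other at least `threshold` times. The names `eviction_pairs`, `contending_pairs`, and `threshold` are hypothetical.

```python
from collections import Counter, OrderedDict

NUM_WAYS = 2  # two ways in the modelled set


def eviction_pairs(trace):
    """Count, per unordered pair of blocks, how often one evicted the other.

    trace is a sequence of block names that all map to the same set.
    LRU replacement is an assumption of this sketch.
    """
    ways = OrderedDict()
    pairs = Counter()
    for block in trace:
        if block in ways:
            ways.move_to_end(block)  # hit: refresh recency
            continue
        if len(ways) == NUM_WAYS:
            victim, _ = ways.popitem(last=False)  # evict least recently used
            pairs[frozenset((block, victim))] += 1
        ways[block] = True
    return pairs


def contending_pairs(trace, threshold=2):
    """Return the pairs whose mutual evictions repeat at least threshold times."""
    return {
        tuple(sorted(pair))
        for pair, count in eviction_pairs(trace).items()
        if count >= threshold
    }
```

For example, `contending_pairs(list("abcabca"))` reports the pairs ("a", "b") and ("a", "c"), while a trace that alternates between only two blocks reports none, because both blocks fit in the two ways.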
The aforementioned inconveniences will be described referring to FIG. 12A and FIG. 12B. These figures show the access state of the cache memory in the same manner as FIG. 11. FIG. 12A shows the state where the cache misses are reduced for the data “a” of the set 0 in FIG. 11 by changing the address arrangement or the like. FIG. 12B shows the state where the cache misses are reduced for the data “α” of the set 1 in FIG. 11 by changing the address arrangement or the like.
As shown in FIG. 12A, by changing the address arrangement of the data “a”, which has the largest number (six) of cache misses, the total number of cache misses can be reduced from 30 to 22.
As shown in FIG. 12B, however, the total number of cache misses can be reduced from 30 to 15 by changing the address arrangement of the data “α”, which has five cache misses. This is because access contention is generated among the three data “α”, “β”, and “γ” in the set 1; changing the address arrangement of the data “α” eliminates this contention, and the number of cache misses in the set 1 becomes zero. In this case, the data “α” with five cache misses can be considered to have a higher efficiency for reducing cache misses than the data “a” with six cache misses. Thus, there are cases where the efficiency for reducing cache misses is high even though the number of generated cache misses is small.
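The effect described above can be reproduced with a small simulation. The trace below is hypothetical and does not reproduce FIG. 11, FIG. 12A, or FIG. 12B; the names "x", "y", and "z" stand in for three data contending in one set. The sketch assumes LRU replacement and models a change of address arrangement in an idealized way, by simply dropping the relocated data from the trace. It also counts first-touch misses, so the remaining data in a set still show one miss each.

```python
from collections import Counter, OrderedDict

NUM_WAYS = 2  # two-way set associative, as in the figures


def simulate(trace):
    """Return per-datum miss counts for a trace of (set_id, name) accesses.

    LRU replacement is an assumption of this sketch.
    """
    sets = {}
    misses = Counter()
    for set_id, name in trace:
        ways = sets.setdefault(set_id, OrderedDict())
        if name in ways:
            ways.move_to_end(name)  # hit: refresh recency
            continue
        misses[name] += 1
        if len(ways) == NUM_WAYS:
            ways.popitem(last=False)  # evict least recently used
        ways[name] = True
    return misses


def total_after_relocating(trace, name):
    """Total misses when the named datum is idealized as moved out of contention."""
    remaining = [access for access in trace if access[1] != name]
    return sum(simulate(remaining).values())


# Hypothetical trace: five data in set 0, three contending data in set 1.
trace = [(0, n) for n in "abcade" * 2] + [(1, n) for n in "xyz" * 3]
baseline = simulate(trace)
```

In this trace "a" has the most misses (four) and "x" has three, out of 21 in total. Relocating "a" leaves 17 misses, whereas relocating "x" leaves 14, because removing "x" lets "y" and "z" share the two ways without evicting each other.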
The method (2) analyzes access contention of the cache memory only for array elements whose addresses in loop processing can be recognized statically by a compiler at the time of compilation. However, since the access contention of the cache memory is analyzed statically at the time of compilation in this method, the access state of the cache memory in actual operation is unknown.