1. Field of the Invention
The present invention relates to the evaluation and optimization of code, particularly code to be used in a processor including a cache.
2. Description of the Related Art
In the field of computer systems, cache memories and their use are well known. However, a brief discussion follows insofar as is necessary to fully understand this invention.
Caches are high-cost, high-speed memories that provide an important performance optimization in processors. They do so by keeping copies of the contents of the most commonly used main memory locations close to the processor, namely in cache locations. As a result, accesses to the contents of these memory locations are much quicker.
The instruction cache is responsible for optimizing accesses to the program being executed. The cache will usually be smaller than the program, meaning that the contents of the cache will need to change to ensure that the parts of the program currently being executed are in the cache.
In designing the instruction cache a trade-off between cost and performance has to be made. Two of the key parameters that can be changed are the cache's size and associativity. These both influence the resulting silicon area and maximum clock frequency of the cache.
The size of a cache is determined by a number of factors, but will depend primarily on area limitations and target applications of the design.
Determining the appropriate level of associativity of the cache can be harder.
For a direct-mapped cache, each block in main memory maps to exactly one location (line) in the cache, where a “block” is a chunk of memory corresponding in size to a cache line. If two blocks map to the same line then they cannot be in the cache at the same time and will continually replace each other when both are in use. This case is referred to as a conflict.
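By way of illustration only, the direct mapping described above may be sketched in Python as follows. The line size, the number of lines, and the function names `direct_mapped_line` and `conflicts` are assumptions made for this example; the text does not specify a cache geometry.

```python
# Assumed geometry, for illustration only (not taken from the text).
LINE_SIZE = 32    # bytes per cache line
NUM_LINES = 256   # lines in the cache, giving an 8 KiB cache


def direct_mapped_line(address: int) -> int:
    """Return the single cache line to which the block holding `address` maps."""
    block_number = address // LINE_SIZE
    return block_number % NUM_LINES


def conflicts(addr_a: int, addr_b: int) -> bool:
    """Two addresses conflict if they lie in different blocks sharing one line."""
    different_blocks = addr_a // LINE_SIZE != addr_b // LINE_SIZE
    same_line = direct_mapped_line(addr_a) == direct_mapped_line(addr_b)
    return different_blocks and same_line


cache_bytes = LINE_SIZE * NUM_LINES
print(direct_mapped_line(0x1000))                # 128
print(conflicts(0x1000, 0x1000 + cache_bytes))   # True: one cache size apart
print(conflicts(0x1000, 0x1020))                 # False: adjacent lines
```

Under these assumed values, any two blocks whose addresses differ by a multiple of the cache size map to the same line and therefore conflict.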
For a set-associative cache, each block maps to a set of lines. The block can be stored in any of the lines in the set. Note that because the number of lines in the cache is constant, dividing the cache into sets means that more blocks map to each set. In general, the cache will be more effective with a reasonable level of associativity because it can decide which lines it will replace and which lines will be kept.
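The observation that more blocks map to each set as associativity increases may be sketched as follows. The geometry, the 1 MiB memory size, and the function names `set_index` and `blocks_per_set` are assumptions chosen for this example only.

```python
# Assumed geometry, for illustration only (not taken from the text).
LINE_SIZE = 32    # bytes per cache line
NUM_LINES = 256   # total lines in the cache, fixed regardless of associativity


def set_index(address: int, ways: int) -> int:
    """Return the set to which the block holding `address` maps."""
    num_sets = NUM_LINES // ways
    return (address // LINE_SIZE) % num_sets


def blocks_per_set(memory_bytes: int, ways: int) -> int:
    """Count how many memory blocks compete for each set."""
    total_blocks = memory_bytes // LINE_SIZE
    num_sets = NUM_LINES // ways
    return total_blocks // num_sets


MEMORY_BYTES = 1 << 20  # 1 MiB of main memory (assumed)
for ways in (1, 2, 4):
    print(ways, NUM_LINES // ways, blocks_per_set(MEMORY_BYTES, ways))
# 1 way : 256 sets, 128 blocks per set (the direct-mapped case)
# 2 ways: 128 sets, 256 blocks per set
# 4 ways:  64 sets, 512 blocks per set
```

With one way per set the sketch reduces to the direct-mapped case; each doubling of the associativity halves the number of sets and doubles the number of blocks competing for each set, while each set can hold correspondingly more of them at once.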
However, there are at least two reasons why a direct-mapped cache may be chosen, namely higher potential clock frequency and smaller area than a set-associative cache of the same size.
The disadvantage of a direct-mapped instruction cache is that conflicting addresses can cause a large performance loss. As an example, consider a real graphics application in an MPEG decoder. The graphics application includes a number of different functions, and in particular a variable length decode (VLD) function and an inverse discrete cosine transform (IDCT) function, which are used extremely often and in fact often in sequence on each new data set. That is, it is almost certain that if one is used, the other will be used shortly afterwards. If they were to map to the same lines in the cache then there would be a conflict each time execution moves from one function to the other.
Such conflicts result in a loss of performance, because the code has to be reloaded from memory every time it is needed, and in an increase in bus traffic.
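The cost of such a conflict may be illustrated with a minimal simulation of a direct-mapped instruction cache. The cache geometry, the function addresses and sizes, and the name `count_misses` are hypothetical values chosen for this sketch; they are not measurements from an actual MPEG decoder.

```python
# Assumed geometry, for illustration only (not taken from the text).
LINE_SIZE = 32    # bytes per cache line
NUM_LINES = 256   # lines in the direct-mapped cache (8 KiB)


def count_misses(functions, call_sequence):
    """Count line fills when each called function executes all of its code.

    functions: name -> (start_address, size_bytes)
    """
    cache = [None] * NUM_LINES  # block number currently held by each line
    misses = 0
    for name in call_sequence:
        start, size = functions[name]
        first_block = start // LINE_SIZE
        last_block = (start + size - 1) // LINE_SIZE
        for block in range(first_block, last_block + 1):
            line = block % NUM_LINES
            if cache[line] != block:
                cache[line] = block  # fetch from main memory, evicting the old block
                misses += 1
    return misses


calls = ["vld", "idct"] * 100  # the two functions run in sequence on each data set

# Hypothetical layout in which both 1 KiB functions map to the same 32 lines.
conflicting = {"vld": (0x0000, 1024), "idct": (0x2000, 1024)}
# Hypothetical layout in which the two functions occupy different lines.
separated = {"vld": (0x0000, 1024), "idct": (0x0400, 1024)}

print(count_misses(conflicting, calls))  # 6400: every line is refetched on every call
print(count_misses(separated, calls))    # 64: only the initial cold misses
```

In this sketch the only difference between the two layouts is where the second function is placed in memory, yet the conflicting layout refetches every line on every call.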
The most common way of ensuring that there are no performance-critical conflicts is to use a set-associative cache. This reduces the chances of conflicts dramatically, as the number of conflicting blocks must be greater than the number of lines in the set for the same performance loss to occur.
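The condition that the number of conflicting blocks must exceed the number of lines in the set may be illustrated with a small simulation. The sketch assumes a least-recently-used replacement policy, which the text does not specify, together with a hypothetical geometry, hypothetical function addresses, and a hypothetical third function `mc`.

```python
from collections import OrderedDict

# Assumed geometry, for illustration only (not taken from the text).
LINE_SIZE = 32
NUM_LINES = 256


def count_misses_set_assoc(functions, call_sequence, ways):
    """Count line fills for a set-associative cache with LRU replacement (assumed)."""
    num_sets = NUM_LINES // ways
    sets = [OrderedDict() for _ in range(num_sets)]  # per set: blocks, oldest first
    misses = 0
    for name in call_sequence:
        start, size = functions[name]
        first_block = start // LINE_SIZE
        last_block = (start + size - 1) // LINE_SIZE
        for block in range(first_block, last_block + 1):
            lru = sets[block % num_sets]
            if block in lru:
                lru.move_to_end(block)       # hit: mark as most recently used
            else:
                misses += 1
                if len(lru) == ways:
                    lru.popitem(last=False)  # evict the least recently used block
                lru[block] = None
    return misses


# Hypothetical layout: three 1 KiB functions whose blocks all share the same sets.
layout = {"vld": (0x0000, 1024), "idct": (0x2000, 1024), "mc": (0x4000, 1024)}

two = ["vld", "idct"] * 100
three = ["vld", "idct", "mc"] * 100

print(count_misses_set_assoc(layout, two, ways=1))    # 6400: direct-mapped thrashing
print(count_misses_set_assoc(layout, two, ways=2))    # 64: both blocks fit in each set
print(count_misses_set_assoc(layout, three, ways=2))  # 9600: three blocks, two lines
```

Two conflicting functions coexist in a two-way cache, but a third function mapping to the same sets brings back the thrashing seen in the direct-mapped case.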
Another way of reducing the impact of conflicts is to use a victim cache. This will normally be a small, fully associative cache that stores the last few entries that have been evicted from the main cache. This can be an effective way of coping with a small number of conflicts. However, the effectiveness will vary greatly depending on the size of the victim cache and the application being run.
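The dependence on the size of the victim cache may be illustrated with a minimal simulation of a direct-mapped cache backed by a victim cache. The geometry, the function layouts, the first-in-first-out handling of victim entries, and the name `count_misses_with_victim` are assumptions made for this sketch.

```python
from collections import OrderedDict

# Assumed geometry, for illustration only (not taken from the text).
LINE_SIZE = 32
NUM_LINES = 256


def count_misses_with_victim(functions, call_sequence, victim_entries):
    """Count fetches from main memory for a direct-mapped cache plus victim cache."""
    cache = [None] * NUM_LINES
    victim = OrderedDict()  # recently evicted blocks, oldest first
    misses = 0
    for name in call_sequence:
        start, size = functions[name]
        first_block = start // LINE_SIZE
        last_block = (start + size - 1) // LINE_SIZE
        for block in range(first_block, last_block + 1):
            line = block % NUM_LINES
            if cache[line] == block:
                continue                 # hit in the main cache
            if block in victim:
                del victim[block]        # hit in the victim cache: move it back
            else:
                misses += 1              # fetch from main memory
            evicted = cache[line]
            cache[line] = block
            if evicted is not None and victim_entries > 0:
                victim[evicted] = None
                if len(victim) > victim_entries:
                    victim.popitem(last=False)  # drop the oldest victim entry
    return misses


calls = ["vld", "idct"] * 100

# Hypothetical small conflict: two 128-byte functions sharing 4 lines.
small = {"vld": (0x0000, 128), "idct": (0x2000, 128)}
# Hypothetical large conflict: two 1 KiB functions sharing 32 lines.
large = {"vld": (0x0000, 1024), "idct": (0x2000, 1024)}

print(count_misses_with_victim(small, calls, victim_entries=4))   # 8
print(count_misses_with_victim(large, calls, victim_entries=4))   # 6400
print(count_misses_with_victim(large, calls, victim_entries=32))  # 64
```

In this sketch a four-entry victim cache removes a four-line conflict entirely, but gives no benefit for a 32-line conflict unless it is enlarged to hold all 32 evicted lines.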
The disadvantage of both of these solutions is that they impose hardware constraints on the design. The set-associative cache requires more silicon area and will limit the processor's maximum clock frequency. Using a victim cache increases the silicon area.
It is an aim of the present invention to reduce or eliminate conflicts in a direct-mapped cache to allow advantage to be taken of the smaller area and higher clock frequencies characteristic of such caches.