1. Field of the Invention
This invention concerns a method of compiling programs that are executed on computers having cache memory and, more specifically, a method that generates code which reduces, as far as possible, the cache conflicts that would otherwise arise from conflicting cache accesses during execution of these programs.
2. Description of the Prior Art
Cache memory and cache conflict are described in works such as J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach, (©1990 Morgan Kaufmann Publishers, Inc., Palo Alto, Calif.), pages 408-425.
Cache memory (referred to hereafter simply as "cache") is a type of rapidly accessible memory placed between a processing device and main memory. By keeping in cache a copy of a portion of the data residing in main memory, the speed at which that data is referenced can be increased. The units of data transferred between cache and main memory are called "blocks"; blocks in cache are called "cache blocks", while blocks in main memory are called "memory blocks".
There are three mapping methods for placing copies of memory blocks into cache blocks: the full associative method, the set associative method, and the direct map method. Recently, because it allows caches with greater capacity and higher speed, the direct map method has been coming into mainstream use.
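To make the difference between the three mapping methods concrete, the Python sketch below lists the cache blocks into which a given memory block may be placed. The function name `candidate_cache_blocks`, the string method names, and the modulo placement rule are illustrative assumptions rather than part of the invention; the set associative branch assumes, hypothetically, that set s consists of the consecutive cache blocks s*k through s*k + k - 1 for associativity k.

```python
def candidate_cache_blocks(memory_block: int, num_cache_blocks: int,
                           method: str, associativity: int = 1) -> list:
    """Return the cache block numbers into which `memory_block` may be placed."""
    if method == "full_associative":
        # Any cache block may hold any memory block.
        return list(range(num_cache_blocks))
    if method == "set_associative":
        # Cache blocks are grouped into sets of `associativity` blocks; the
        # memory block selects one set and may occupy any block within it.
        num_sets = num_cache_blocks // associativity
        s = memory_block % num_sets
        return [s * associativity + way for way in range(associativity)]
    if method == "direct_map":
        # Exactly one candidate: the memory block number modulo the cache size.
        return [memory_block % num_cache_blocks]
    raise ValueError(f"unknown mapping method: {method}")


if __name__ == "__main__":
    # Hypothetical 8-block cache, memory block 13.
    for method, k in [("full_associative", 1), ("set_associative", 2), ("direct_map", 1)]:
        print(method, candidate_cache_blocks(13, 8, method, k))
```

For this hypothetical 8-block cache, memory block 13 may go into any cache block under the full associative method, into cache block 2 or 3 under a 2-way set associative method, and only into cache block 5 under the direct map method. A direct map cache behaves like a set associative cache with an associativity of 1.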
With the direct map method, each memory block can be mapped into exactly one cache block, which is uniquely determined by the block's address. Since cache capacity is generally smaller than main memory capacity, multiple memory blocks are mapped into the same cache block. Consequently, even after data has been transferred to a cache block, it is rejected from that cache block when another memory block mapped into the same cache block is referenced, and the next reference to the original data generates a cache miss.
This phenomenon is called "cache conflict"; cache misses generated by this phenomenon are called "cache misses caused by cache conflict". One drawback of the direct map method is that with certain programs, a substantial amount of cache conflict is generated, causing a marked drop in performance.
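A minimal Python sketch of a direct map cache makes the phenomenon visible. It models only which memory block occupies each cache block (no data) and, following this description, counts a miss on a memory block that has already been referenced as a cache miss caused by cache conflict. The class name `DirectMappedCache` and the 8-block cache size are hypothetical.

```python
from typing import List, Optional, Set


class DirectMappedCache:
    """Direct map cache model tracking only block ownership (illustrative)."""

    def __init__(self, num_cache_blocks: int):
        self.num_cache_blocks = num_cache_blocks
        # Memory block currently held by each cache block (None = empty).
        self.lines: List[Optional[int]] = [None] * num_cache_blocks
        self.seen: Set[int] = set()     # memory blocks referenced so far

    def access(self, memory_block: int) -> str:
        """Reference one memory block; return 'hit', 'cold miss' or 'conflict miss'."""
        index = memory_block % self.num_cache_blocks   # the unique cache block
        if self.lines[index] == memory_block:
            return "hit"
        # A miss on a previously referenced block means another memory block
        # mapped into the same cache block has rejected it.
        kind = "conflict miss" if memory_block in self.seen else "cold miss"
        self.lines[index] = memory_block   # transfer the block, rejecting the old one
        self.seen.add(memory_block)
        return kind


if __name__ == "__main__":
    # Memory blocks 0 and 8 share cache block 0 in an 8-block cache,
    # so alternating references keep rejecting each other.
    cache = DirectMappedCache(8)
    print([cache.access(b) for b in (0, 8, 0, 8, 1)])
    # ['cold miss', 'cold miss', 'conflict miss', 'conflict miss', 'cold miss']
```

Memory block 1 maps into a different cache block, so its first reference is an ordinary (cold) miss and a second reference to it would hit.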
FIGS. 2 through 7 show cache conflict in detail. Hereafter, for purposes of illustration and simplicity, I will assume the use of the direct map method with a block length of 16 bytes and a cache capacity of 256 Kbytes (256 × 1024 bytes).
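Under these assumptions the cache holds 256 × 1024 / 16 = 16384 cache blocks, and a byte address is mapped by dividing it by the block length and taking the remainder modulo the number of cache blocks. A short Python sketch, with hypothetical helper names `memory_block` and `cache_block`:

```python
BLOCK_LENGTH = 16                                   # bytes per block
CACHE_CAPACITY = 256 * 1024                         # 256 Kbytes
NUM_CACHE_BLOCKS = CACHE_CAPACITY // BLOCK_LENGTH   # 16384 cache blocks


def memory_block(address: int) -> int:
    """Number of the memory block containing byte `address`."""
    return address // BLOCK_LENGTH


def cache_block(address: int) -> int:
    """Direct map: the single cache block into which that memory block is mapped."""
    return memory_block(address) % NUM_CACHE_BLOCKS


if __name__ == "__main__":
    print(NUM_CACHE_BLOCKS)                                    # 16384
    # Addresses exactly one cache capacity apart share a cache block.
    print(cache_block(0x40), cache_block(0x40 + CACHE_CAPACITY))   # 4 4
```

Any two addresses that differ by a multiple of 256 Kbytes therefore compete for the same cache block.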
FIG. 2 shows a source program; FIG. 3 shows an overview of the mapping from main memory to cache; FIGS. 4 and 5 show this mapping in more detail than FIG. 3; FIG. 6 shows an example of an object program for the source program of FIG. 2; and FIG. 7 shows the generation of cache conflict.
FIG. 2 shows an illustrative source program in which many cache conflicts are generated. In this source program 3, a common declaration allocates arrays A, B, and C consecutively in memory, in that order. Arrays A and C are two-dimensional; array B is three-dimensional. Each array is of real (floating-point) type with an element length of 4 bytes. The declared sizes are (256, 128) for array A, (256, 128, 3) for array B, and (256, 128) for array C. The execution portion of program 3 is a doubly nested loop. In the innermost loop, for I = 1 to 255, the value of C(I,J) is calculated from the values A(I,J), B(I,J,2), A(I+1,J), B(I,J,3), and B(I,J,1).
FIG. 3 shows part of each array residing in main memory 120, the corresponding part of cache 110, and the correspondence between them. Each of the memory areas A(1:256,1:128), B(1:256,1:128,1), B(1:256,1:128,2), B(1:256,1:128,3), and C(1:256,1:128) is 128 Kbytes (256 × 128 × 4 bytes). Because these areas are allocated consecutively and the cache capacity is 256 Kbytes, two areas whose starting addresses differ by a multiple of 256 Kbytes are mapped into the same part of the cache. Accordingly, A(1:256,1:128), B(1:256,1:128,2), and C(1:256,1:128) are mapped into one common area in cache 110, while B(1:256,1:128,1) and B(1:256,1:128,3) are mapped into another common area. The notation "m:n" denotes the range of subscripts from lower bound m to upper bound n.
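The grouping follows from simple modular arithmetic. The Python sketch below assumes, hypothetically, that the common area begins at an address that is a multiple of 256 Kbytes, and computes the starting offset of each 128-Kbyte area in main memory and in cache. The names `AREAS` and `cache_offsets` are illustrative.

```python
CACHE_CAPACITY = 256 * 1024
AREA = 256 * 128 * 4      # one 256 x 128 plane of 4-byte reals: 128 Kbytes

# Areas in the order the common declaration allocates them; the common area
# is assumed (hypothetically) to start on a 256-Kbyte boundary.
AREAS = ["A(1:256,1:128)", "B(1:256,1:128,1)", "B(1:256,1:128,2)",
         "B(1:256,1:128,3)", "C(1:256,1:128)"]


def cache_offsets():
    """Map each area to (offset in main memory, offset in cache), in bytes."""
    return {name: (n * AREA, (n * AREA) % CACHE_CAPACITY)
            for n, name in enumerate(AREAS)}


if __name__ == "__main__":
    for name, (mem, cache) in cache_offsets().items():
        print(f"{name:18} memory {mem // 1024:4d}K -> cache {cache // 1024:4d}K")
```

Under this assumption the areas of A, B(1:256,1:128,2), and C all start at cache offset 0K, while those of B(1:256,1:128,1) and B(1:256,1:128,3) start at cache offset 128K, which is the grouping shown in FIG. 3.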
FIGS. 4 and 5 collectively show, in detail, the structure of cache 110 as well as its mapping with main memory 120.
As shown, cache 110 is a collection of 16-byte cache blocks, illustratively cache blocks 111 to 116. Main memory 120 is likewise shown divided into memory blocks. Mapping from main memory 120 to cache 110 is shown with arrows.
For example, in FIG. 4, the three memory blocks A(1:4,J), B(1:4,J,2), and C(1:4,J), each holding four 4-byte elements, are mapped into one common cache block 111. However, only one of these three memory blocks can reside in cache block 111 at any one time, so cache conflict can arise when data in these memory blocks is referenced through the cache. Cache conflict likewise occurs in cache blocks 112 and 113.
As shown in FIG. 5, two memory blocks belonging to the same array B, one from B(1:256,1:128,1) and one from B(1:256,1:128,3), are mapped into each of three common cache blocks. Cache conflict can therefore arise even between accesses to the same array B.
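The block-level mapping of FIGS. 4 and 5 can be checked the same way. The Python sketch below computes byte offsets in column-major (Fortran) order with 1-based subscripts, so four consecutive 4-byte elements A(1:4,J) fill exactly one 16-byte block. The base offsets assume, hypothetically, that the common area starts on a 256-Kbyte boundary; `address`, `cache_block`, and `memory_blocks` are illustrative helpers.

```python
BLOCK_LENGTH = 16
NUM_CACHE_BLOCKS = 256 * 1024 // BLOCK_LENGTH     # 16384
ELEM = 4                                          # bytes per real element
PLANE = 256 * 128 * ELEM                          # bytes per 256 x 128 plane
# Consecutive allocation of A, B, C; the common area is assumed cache-aligned.
BASE = {"A": 0, "B": PLANE, "C": 4 * PLANE}


def address(array: str, i: int, j: int, k: int = 1) -> int:
    """Byte offset of array(i, j[, k]) in column-major order, 1-based subscripts."""
    return BASE[array] + ELEM * ((i - 1) + 256 * (j - 1)) + PLANE * (k - 1)


def cache_block(addr: int) -> int:
    """Direct map: cache block for the memory block containing `addr`."""
    return (addr // BLOCK_LENGTH) % NUM_CACHE_BLOCKS


def memory_blocks(array: str, j: int, k: int = 1) -> set:
    """Memory blocks covering array(1:4, j[, k])."""
    return {address(array, i, j, k) // BLOCK_LENGTH for i in range(1, 5)}


if __name__ == "__main__":
    j = 7
    print(len(memory_blocks("A", j)))      # 1: A(1:4,J) is one 16-byte block
    # A(1:4,J), B(1:4,J,2), C(1:4,J) share one cache block (FIG. 4).
    print({cache_block(address(a, 1, j, k)) for a, k in [("A", 1), ("B", 2), ("C", 1)]})  # {384}
    # B(1:4,J,1) and B(1:4,J,3) share another (FIG. 5).
    print({cache_block(address("B", 1, j, k)) for k in (1, 3)})                           # {8576}
```

Under this layout the pattern holds for every column J from 1 to 128.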
FIG. 6 is an example of an object program that corresponds to source program 3 shown in FIG. 2. FIG. 6, however, shows only the object code for the innermost loop of FIG. 2. Because most of a program's execution time is generally spent in its innermost loop, the remaining code is of little importance to this discussion and, for simplicity, is ignored.
In FIG. 6, each instruction is identified by a unique instruction number 61 (1 through 13) at the left of the figure. Label 62 is used as a branch target for branch instructions. At the far right, processing contents 65 describe what each instruction 63 does with its operand 64. Of these instructions, instructions 1, 2, 4, 5, 8, and 10 are memory reference instructions, which reference array elements A(I,J), B(I,J,2), A(I+1,J), B(I,J,3), B(I,J,1), and C(I,J), respectively.
In FIG. 7, the cache conflicts caused by the memory reference instructions of the object program of FIG. 6 are shown in the order in which these instructions are executed. Each line shows, from the left: value 70 of innermost loop control variable I; instruction number 71, as assigned in FIG. 6; array element 72 referenced by the instruction; cache hit status 73, which shows whether the referenced data exists in cache; accepted block 74, i.e., the memory block transferred from main memory 120 and placed in cache 110; rejected block 75, i.e., the memory block rejected from cache 110; and cache miss 76, which shows whether the miss was caused by cache conflict.
As can be seen from FIG. 7, every memory reference generates a cache miss, and most of these misses are caused by cache conflict. For example, for I = 1, instructions 1 and 4 both reference the same memory block A(1:4,J). In between, however, instruction 2 references memory block B(1:4,J,2), which is mapped into the same cache block, so block A(1:4,J) is rejected from the cache. This causes a cache miss with instruction 4.
Of the cache misses shown in FIG. 7, every miss generated by referencing a memory block that has already been referenced is a cache miss caused by cache conflict. Of the 24 cache misses from I = 1 to I = 4, 18 are caused by cache conflict. Cache conflict is thus clearly a major cause of cache misses.
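The counts in FIG. 7 can be reproduced with a small trace-driven model of the direct map cache. The Python sketch below replays the six memory references of FIG. 6 for I = 1 to 4 with J = 1, assuming, hypothetically, that the common area starts on a 256-Kbyte boundary and that the cache is initially empty. As in the description, a miss on a previously referenced memory block is counted as caused by cache conflict. The names `TraceRow`, `REFERENCES`, and `simulate` are illustrative.

```python
from typing import NamedTuple, Optional

BLOCK_LENGTH = 16
NUM_CACHE_BLOCKS = 256 * 1024 // BLOCK_LENGTH     # 16384
ELEM = 4
PLANE = 256 * 128 * ELEM                          # 128 Kbytes
BASE = {"A": 0, "B": PLANE, "C": 4 * PLANE}       # assumed cache-aligned

# Memory reference instructions of FIG. 6:
# (instruction number, array, offset added to I, third subscript of B)
REFERENCES = [(1, "A", 0, 1), (2, "B", 0, 2), (4, "A", 1, 1),
              (5, "B", 0, 3), (8, "B", 0, 1), (10, "C", 0, 1)]


class TraceRow(NamedTuple):
    i: int
    instruction: int
    block: int                  # memory block referenced (accepted on a miss)
    hit: bool
    rejected: Optional[int]     # memory block rejected from the cache, if any
    conflict_miss: bool


def address(array, i, j, k=1):
    """Byte offset of array(i, j[, k]), column-major, 1-based subscripts."""
    return BASE[array] + ELEM * ((i - 1) + 256 * (j - 1)) + PLANE * (k - 1)


def simulate(j=1, i_values=range(1, 5)):
    lines = [None] * NUM_CACHE_BLOCKS   # memory block held by each cache block
    seen = set()                        # memory blocks referenced so far
    trace = []
    for i in i_values:
        for instruction, array, di, k in REFERENCES:
            block = address(array, i + di, j, k) // BLOCK_LENGTH
            index = block % NUM_CACHE_BLOCKS
            hit = lines[index] == block
            rejected = None if hit else lines[index]
            if not hit:
                lines[index] = block    # accept the block, rejecting the old one
            trace.append(TraceRow(i, instruction, block, hit, rejected,
                                  not hit and block in seen))
            seen.add(block)
    return trace


if __name__ == "__main__":
    trace = simulate()
    misses = sum(not row.hit for row in trace)
    conflict_misses = sum(row.conflict_miss for row in trace)
    print(misses, conflict_misses)      # 24 18
```

In this model every reference misses, and 18 of the 24 misses are re-references to blocks that had been rejected, matching the counts given for FIG. 7. The six non-conflict misses are the first references to A(1:4,J), B(1:4,J,2), B(1:4,J,3), B(1:4,J,1), C(1:4,J), and A(5:8,J).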
One way to reduce cache conflict is to use the full associative method or the set associative method described in the above-cited prior art. With either method, however, the cache hardware becomes complex and cache referencing speed is reduced. Furthermore, with either method it is difficult to significantly increase the capacity of the cache.
Moreover, with a set associative cache, when the number of candidate cache blocks into which a memory block can be mapped (the associativity) is small, cache conflict cannot be sufficiently avoided. Consequently, a cache conflict reduction method is still necessary for this type of cache when its associativity is small.
With the direct map method, cache conflict may be reduced simply by increasing the capacity of cache memory. However, cache memory costs significantly more than main memory of the same capacity, which limits how far cache capacity can be increased.
As shown above, the simple structure of a direct map cache gives it the advantages of high access speed and easily expanded capacity. With certain programs, however, cache conflict occurs and performance drops sharply. Until the present invention, described below, no adequate way existed to reduce cache conflict in a direct map cache.
Even a set associative cache, which generates relatively little cache conflict, suffers cache conflict and a large loss of performance when its associativity is small. If the associativity is sufficiently increased, all of the cache conflict can be avoided, but at the cost of a more complex cache hardware structure, lower cache referencing speed, and great difficulty in significantly increasing cache capacity. For these reasons, the set associative caches currently available on the commercial market have an associativity of approximately 2 to 4, which, for programs that may generate an appreciable amount of cache conflict, is in practice inadequate to prevent that conflict.
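The effect of small associativity can be illustrated with the reference pattern of FIG. 2. The Python sketch below models a set associative cache of the same 256-Kbyte capacity and 16-byte block length, assuming least-recently-used (LRU) replacement within each set, an initially empty cache, and, hypothetically, a common area aligned on a 256-Kbyte boundary. LRU is an assumption made here for illustration; the description does not specify a replacement policy. With an associativity of 1 the model reduces to the direct map cache.

```python
from collections import OrderedDict, defaultdict

BLOCK_LENGTH = 16
NUM_CACHE_BLOCKS = 256 * 1024 // BLOCK_LENGTH     # same 256-Kbyte capacity
ELEM = 4
PLANE = 256 * 128 * ELEM
BASE = {"A": 0, "B": PLANE, "C": 4 * PLANE}       # assumed cache-aligned

# (array, offset added to I, third subscript of B) in FIG. 6 order
REFERENCES = [("A", 0, 1), ("B", 0, 2), ("A", 1, 1),
              ("B", 0, 3), ("B", 0, 1), ("C", 0, 1)]


def address(array, i, j, k=1):
    """Byte offset of array(i, j[, k]), column-major, 1-based subscripts."""
    return BASE[array] + ELEM * ((i - 1) + 256 * (j - 1)) + PLANE * (k - 1)


def simulate(associativity, j=1, i_values=range(1, 5)):
    """Return (misses, conflict misses) for an LRU set associative cache."""
    num_sets = NUM_CACHE_BLOCKS // associativity
    sets = defaultdict(OrderedDict)   # set index -> blocks, least recently used first
    seen = set()                      # memory blocks referenced so far
    misses = conflict_misses = 0
    for i in i_values:
        for array, di, k in REFERENCES:
            block = address(array, i + di, j, k) // BLOCK_LENGTH
            ways = sets[block % num_sets]
            if block in ways:
                ways.move_to_end(block)           # hit: now most recently used
            else:
                misses += 1
                if block in seen:
                    conflict_misses += 1          # block was rejected earlier
                if len(ways) == associativity:
                    ways.popitem(last=False)      # reject the LRU block
                ways[block] = None
            seen.add(block)
    return misses, conflict_misses


if __name__ == "__main__":
    for assoc in (1, 2, 4, 8):
        print(assoc, simulate(assoc))
    # 1 (24, 18)
    # 2 (21, 15)
    # 4 (18, 12)
    # 8 (6, 0)
```

In this model, once the associativity is 2 or more, the blocks of A, B(1:256,1:128,1), B(1:256,1:128,2), B(1:256,1:128,3), and C referenced in one iteration all fall into a single set. A 2-way or 4-way cache therefore still incurs many conflict misses for this loop, and only an associativity large enough to hold all five blocks removes them.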