The present invention generally relates to a cache, and more particularly to a cache control system and method suitable for preferentially retaining, in a cache, data that is frequently used by a program.
A cache is a small, high-speed memory that compensates for the speed gap between a fast CPU and a slower main memory (hereinafter simply called a memory).
The relationship between a CPU, a cache, and a memory is briefly illustrated in FIG. 2. When a program being executed refers to memory contents, the CPU reads the data from the cache if it is present there, and otherwise accesses the memory. The cache is generally placed very close to the CPU and operates at high speed. Therefore, if most of the data required by the CPU is present in the cache, very high-speed data processing can be expected.
For a cache to operate effectively, data at frequently accessed addresses must be held in the cache. The memory access patterns of typical programs often have the following features.
1. Data once accessed at a certain address is accessed again after a relatively short time.
2. Data accessed within a given time period tends to be located at relatively nearby addresses.
The former feature is called "time locality", and the latter is called "space locality".
Time locality means that "an address accessed once will be accessed again in the near future". Therefore, if data accessed once is held in the cache, the next access to that address is faster because the data is already in the cache. For this reason, when data referred to by the CPU is not in the cache, it is always stored in the cache in preparation for the next reference.
A cache also exploits the latter, space locality, which means that "if an address is accessed once, addresses near it will be accessed in the near future". Therefore, when new data is read from the memory, not only this data but also the data at nearby addresses is stored in the cache. When a nearby address is later accessed, the access time is then often shortened because the data is already present in the cache.
A cache is divided into lines of a predetermined length (from several tens to several hundreds of bytes). Data is transferred between the cache and the memory in units of one line. For example, when one data item is read from the memory, the whole line of nearby data containing it is held in the cache.
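The line-based behavior described above can be sketched as a tag-only model of a direct-mapped cache in C. The geometry (32-byte lines, 1024 lines for 32 Kbytes), the direct-mapped organization, and the names `Cache`, `line_base`, and `cache_access` are assumptions made for illustration only; the model records which lines are present, not the data itself.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 32-byte lines, 1024 lines (32 Kbytes), direct-mapped. */
#define LINE_SIZE 32u
#define NUM_LINES 1024u

/* Tag-only model: records which memory line each cache slot holds. */
typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits, misses;
} Cache;

/* Start address of the line containing addr (LINE_SIZE is a power of two). */
static uint32_t line_base(uint32_t addr) {
    return addr & ~(LINE_SIZE - 1u);
}

/* One reference to a byte address. A hit is served from the cache; on a
 * miss the whole line is transferred from memory and kept for later use. */
static bool cache_access(Cache *c, uint32_t addr) {
    uint32_t line  = addr / LINE_SIZE;   /* line number in memory */
    uint32_t index = line % NUM_LINES;   /* slot the line maps to */
    uint32_t tag   = line / NUM_LINES;   /* distinguishes lines sharing a slot */
    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->valid[index] = true;              /* fill: the entire line is loaded */
    c->tag[index]   = tag;
    c->misses++;
    return false;
}
```

Accessing address 0 misses and fills its line, after which address 4 (same line) and address 0 again both hit, reflecting space locality and time locality respectively; address 32 begins a new line and misses.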
A cache is effective for shortening memory access time. However, it may be wasted on a program whose memory accesses lack locality. Consider, for example, the following program, in which "float" declares floating-point data and "i++" increments i by 1.
______________________________________
float a[1000];
int i;
for (i = 0; i < 100; i++) {
    (void)a[i * 10];   /* ... computation using a[i * 10] ... */
}
______________________________________
References to the array a are as follows. The elements referred to by the program are a[0], a[10], a[20], . . . , a[990]. Each element is referred to only once, so there is no time locality.
As for space locality, if one line of the cache holds 32 bytes and an array element is 4 bytes, one line can store only eight array elements. Because the exemplary program accesses every tenth element (a stride of 40 bytes, which exceeds the line length), only one element in each loaded line is used and the other seven are never referred to. There is therefore no space locality (assuming a space of 32 bytes).
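The waste can be quantified with a short C sketch that counts the distinct lines touched by the references a[i * 10]. It assumes 32-byte lines, 4-byte floats, and that the array a starts on a line boundary; the function names are hypothetical.

```c
/* Assumptions: 32-byte lines, 4-byte floats, array a starts on a line boundary. */
#define LINE_BYTES 32u
#define ELEM_BYTES 4u

/* Count distinct lines touched by the references a[i * stride], 0 <= i < count.
 * Addresses increase monotonically, so a new line starts whenever the line
 * number changes. */
static unsigned lines_touched(unsigned count, unsigned stride) {
    unsigned lines = 0;
    long prev = -1;
    for (unsigned i = 0; i < count; i++) {
        long line = (long)(i * stride * ELEM_BYTES / LINE_BYTES);
        if (line != prev) {
            lines++;
            prev = line;
        }
    }
    return lines;
}

/* Fraction of the elements brought into the cache that are actually used
 * (valid for stride >= 1, where every reference is a distinct element). */
static double line_utilization(unsigned count, unsigned stride) {
    unsigned per_line = LINE_BYTES / ELEM_BYTES;   /* 8 elements per line */
    return (double)count / ((double)lines_touched(count, stride) * per_line);
}
```

For the stride of 10 in the example, 100 lines (800 elements) are loaded in order to use 100 elements, a utilization of 12.5%; a stride of 1 would touch only 13 lines.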
In order to avoid such wasteful loads to a cache, processors have been provided with a function (instruction) called a cache bypass. With this function, when the data requested by a memory access instruction is not present in the cache, the data is read directly from the memory without being loaded into the cache.
If it is known in advance that loading data into the cache is wasteful, as with the references to the array a in the exemplary program, a cache bypass instruction can be used so that the elements of the array a are not loaded into the cache.
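The effect of a cache bypass can be illustrated with a small tag-only C model. The 8-slot direct-mapped cache, the 32-byte lines, and the names `TinyCache` and `load` are hypothetical; the model also assumes that a bypass load which happens to hit is served from the cache, which may differ between processors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag-only model: 8-slot direct-mapped cache, 32-byte lines. */
#define BP_LINE  32u
#define BP_SLOTS 8u

typedef struct {
    bool     valid[BP_SLOTS];
    uint32_t line[BP_SLOTS];   /* memory line number held in each slot */
} TinyCache;

/* Returns true on a cache hit. On a miss, an ordinary load fills the line;
 * a bypass load reads the memory directly and leaves the cache unchanged. */
static bool load(TinyCache *c, uint32_t addr, bool bypass) {
    uint32_t line = addr / BP_LINE;
    uint32_t slot = line % BP_SLOTS;
    if (c->valid[slot] && c->line[slot] == line)
        return true;
    if (!bypass) {
        c->valid[slot] = true;
        c->line[slot]  = line;
    }
    return false;
}
```

An ordinary load of address 0 fills its line; a bypass load of address 256, which maps to the same slot, misses without evicting that line, so the next ordinary access to address 0 still hits.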
A processor having such a function is described, for example, in "On the Floating Point Performance of the i860 Microprocessor", by K. Lee, International Journal of High Speed Computing, Vol. 4, No. 4 (1992), pp. 251-267.
An explanation of a general cache technique is disclosed, for example, in "Information Processing", Vol. 33, No. 11 (1992), pp. 1348-1357.
With the above-described conventional techniques, however, even a program having both space locality and time locality cannot effectively use a cache in some cases. These cases will be explained in connection with the following program.
______________________________________
float a[100], b[1000][100];
int i, j;
for (i = 0; i < 1000; i++) {
    for (j = 0; j < 100; j++) {
        (void)a[j];      /* ... computation using a[j] ... */
        (void)b[i][j];   /* ... computation using b[i][j] ... */
    }
}
______________________________________
This program contains a double loop and refers to two arrays, a and b. The array a is a one-dimensional array of 100*4 = 400 bytes, and the array b is a two-dimensional array of 1000*100*4 = 400,000 bytes (about 400 Kbytes). Assuming a cache size of 32 Kbytes, these data cannot all be loaded in the cache.
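The sizes above can be checked with `sizeof` and converted into cache lines in C. The 4-byte `float`, the 32-byte line, and line-aligned arrays are assumptions, and `lines_for` is a hypothetical helper.

```c
#include <stddef.h>

/* Assumptions: 4-byte float, 32-byte lines, line-aligned arrays. */
#define LINE_BYTES  32u
#define CACHE_BYTES (32u * 1024u)          /* the assumed 32 Kbyte cache */

static float a[100], b[1000][100];          /* the arrays of the example */

/* Number of cache lines needed to hold n bytes starting on a line boundary. */
static size_t lines_for(size_t n) {
    return (n + LINE_BYTES - 1) / LINE_BYTES;
}
```

Under these assumptions, the array a occupies 13 lines and the array b 12500 lines, while the whole cache holds only 1024 lines.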
For the array a, the 100 elements a[0], a[1], a[2], . . . , a[99] are referred to in the inner loop. These references are repeated 1000 times by the outer loop, so each element is referred to 1000 times. As a result, the references to the array a are highly likely to have both time locality and space locality (because consecutive addresses are accessed sequentially).
Since the array a is smaller than the cache, all of its elements can be loaded in the cache. Once an array element has been loaded, the remaining 999 references to it are therefore ideally all cache hits (the data is present in the cache).
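The ideal behavior of the array a can be counted at line granularity with the following C sketch. It assumes 4-byte elements, 32-byte lines, a line-aligned array a, and that no line of a is evicted once loaded; the function names are hypothetical.

```c
/* Assumptions: 4-byte elements, 32-byte lines, a is line-aligned, and no
 * line of a is evicted once it has been loaded. */
#define OUTER 1000ul   /* iterations of the outer loop (i) */
#define INNER 100ul    /* iterations of the inner loop (j) = elements of a */
#define ELEM  4ul
#define LINE  32ul

static unsigned long a_refs(void)   { return OUTER * INNER; }
/* Only the first touch of each line misses. */
static unsigned long a_misses(void) { return (INNER * ELEM + LINE - 1) / LINE; }
static unsigned long a_hits(void)   { return a_refs() - a_misses(); }
```

The 100000 references then incur only 13 misses, one per line, leaving 99987 hits. This is slightly better than one miss per element, because loading a[0] also brings a[1] through a[7] into the cache.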
For the array b, the 1000*100 = 100000 elements b[0][0], b[0][1], . . . , b[0][99], b[1][0], b[1][1], . . . , b[1][99], b[2][0], b[2][1], . . . , b[999][99] are referred to sequentially across the two loops. Although no element is referred to twice, so there is no time locality, the cache can still operate effectively because there is space locality.
Specifically, assuming that one line of the cache holds eight array elements, when one element (e.g., b[0][0]) is referred to, the succeeding seven elements (b[0][1], . . . , b[0][7]) are also loaded in the cache, so the following seven references are cache hits.
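The space-locality argument for the array b can be checked by walking the references in program order. The sketch assumes row-major layout, 4-byte elements, 32-byte lines, a line-aligned array, and that each line stays cached until its eight elements have been used (i.e., the interleaved references to a do not evict it); `b_misses` is a hypothetical name.

```c
#include <limits.h>

/* Assumptions: row-major layout, 4-byte elements, 32-byte lines, b is
 * line-aligned, and each line stays cached until its elements are used. */
#define ROWS 1000ul
#define COLS 100ul
#define ELEM 4ul
#define LINE 32ul

/* Walk b[i][j] in program order. The walk is sequential, so a miss occurs
 * exactly when a reference moves into a new line. */
static unsigned long b_misses(void) {
    unsigned long misses = 0, prev = ULONG_MAX;
    for (unsigned long i = 0; i < ROWS; i++) {
        for (unsigned long j = 0; j < COLS; j++) {
            unsigned long line = (i * COLS + j) * ELEM / LINE;
            if (line != prev) {
                misses++;
                prev = line;
            }
        }
    }
    return misses;
}
```

This gives 12500 misses for 100000 references, so 87500 references (87.5%) are hits.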
Since the array b is larger than the cache, not all of its elements can be loaded in the cache. As a result, old data in the cache must be replaced by new data.
In such a case, the question arises of which old data should be replaced. In the exemplary program, it is better to replace the data of the array b than that of the array a, because each element of the array a is referred to many times, whereas each element of the array b is referred to only once.
Data in a conventional cache has been replaced in accordance with a random replacement rule or a least recently used (LRU) replacement rule. With the random replacement rule, the entry to be replaced is selected literally at random, without any regard to how the data is being used.
With the LRU replacement rule, the data in the cache whose most recent access is the oldest is replaced. In other words, whether data in the cache is replaced is determined from its past reference history.
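The two conventional rules can be contrasted in a short C sketch of victim selection within one set. The 4-way set, the per-way `last_used` timestamps, and the function names are assumptions for illustration. Neither function receives any information from the program: the random rule ignores usage entirely, and the LRU rule looks only at past references.

```c
#include <stdlib.h>

#define WAYS 4   /* assumed 4-way set-associative cache */

/* One set: the tag held by each way and the time of its latest reference. */
typedef struct {
    unsigned long tag[WAYS];
    unsigned long last_used[WAYS];
} Set;

/* Random rule: pick any way, without consulting how the lines are used. */
static int victim_random(const Set *s) {
    (void)s;
    return rand() % WAYS;
}

/* LRU rule: pick the way whose most recent reference is the oldest. */
static int victim_lru(const Set *s) {
    int v = 0;
    for (int w = 1; w < WAYS; w++) {
        if (s->last_used[w] < s->last_used[v])
            v = w;
    }
    return v;
}
```

With last-use times {5, 2, 9, 7}, `victim_lru` selects way 1, whereas `victim_random` may select any way.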
A cache memory controller is disclosed in Japanese Patent Laid-open (kokai) Publication JP-A-3-58252, filed on Jul. 27, 1989. This controller provides the cache with a load priority address table, which stores load priority bits indicating whether or not data in each address area is to be preferentially held in the cache. The order in which cache lines are replaced is determined in accordance with the contents of this table.
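How per-area priority bits could influence the replacement order is sketched below in C. This is a hypothetical illustration, not the design of the cited publication: it assumes a 4-way set, a priority bit already looked up for the address area of each resident line, and an LRU tie-break, none of which is specified above.

```c
#include <stdbool.h>

#define WAYS 4   /* hypothetical associativity */

/* prio[w]: load-priority bit for the address area of the line in way w.
 * last_used[w]: time of that line's most recent reference. */
static int victim_with_priority(const bool prio[WAYS],
                                const unsigned long last_used[WAYS]) {
    int v = -1;
    /* First choice: the least recently used line in a non-priority area. */
    for (int w = 0; w < WAYS; w++) {
        if (!prio[w] && (v < 0 || last_used[w] < last_used[v]))
            v = w;
    }
    if (v >= 0)
        return v;
    /* Every line is in a priority area: fall back to plain LRU. */
    v = 0;
    for (int w = 1; w < WAYS; w++) {
        if (last_used[w] < last_used[v])
            v = w;
    }
    return v;
}
```

In this sketch the bits belong to address areas registered in the table; they are not derived from which data the running program will use next.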
In each of the above-described conventional techniques, the data to be replaced is selected either randomly or by mechanically applying a particular rule, independently of the program. These techniques are therefore not designed to preferentially hold data in the cache even when the contents of the program suggest that the data will be used in the near future. As a result, for the above-described program, there is no guarantee that the cached data of the array b is replaced while the data of the array a is retained.
The conventional techniques therefore cannot selectively replace particular data in a cache (e.g., data that the program indicates can be replaced in the near future).
Nor can a programmer or a compiler instruct the cache not to replace particular data that is highly likely to be reused.
As described above, depending on the contents of a program, a conventional cache is in some cases not used effectively (i.e., it yields fewer cache hits).