The present invention relates to a cache memory system in a computer system, and more particularly to a cache memory system capable of limiting the replacement, from the cache, of data that is highly likely to be used again.
In a computer system it is known that main memory references by a computer program have locality. By utilizing this characteristic, frequently accessed main memory data can be copied to a high-speed small-capacity memory, called a cache memory system (hereinafter referred to as a cache). Then an access to main memory is replaced by an access to the cache, thereby enhancing memory access speed. A cache is described in detail, for example, in "Computer Architecture: A Quantitative Approach to its Design, Realization, and Evaluation," pages 403-28, translated by Shinji Tomita, Kazuaki Murakami, and Haruo Niimi, published by Nikkei Business Publications, Inc.
Data is exchanged between a cache and a main memory in units of a fixed, appropriate size, each unit being referred to as a block. The size of a block is called the block size. A cache stores a plurality of blocks. For example, when a cache has a capacity (a cache size) of 128K bytes and the block size is 128 bytes, the cache stores 1024 blocks (131,072 / 128 = 1024).
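The block count in this example is simply the cache size divided by the block size. A minimal Python check follows; `num_blocks` and `block_number` are hypothetical helper names introduced only for illustration (the latter gives the main memory block that contains a byte address).

```python
def num_blocks(cache_size: int, block_size: int) -> int:
    """Number of blocks a cache can hold (cache size / block size)."""
    if block_size <= 0 or cache_size % block_size != 0:
        raise ValueError("cache size must be a positive multiple of the block size")
    return cache_size // block_size


def block_number(address: int, block_size: int) -> int:
    """Index of the main memory block that contains a byte address."""
    return address // block_size


# The example from the text: a 128K-byte cache with 128-byte blocks.
print(num_blocks(128 * 1024, 128))   # 1024
print(block_number(300, 128))        # 2 (bytes 256..383 form block 2)
```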
Data stored in a cache is held in a memory called a data array. To identify which block of main memory has been stored in the cache, the address of each stored block is held in a memory called an address array. To determine whether data to be referenced by a processor is located in the cache, the addresses held in the address array are compared with the address referenced by the memory access instruction.
Cache configurations are classified into three systems according to where blocks are placed in the cache: the direct map system, in which the address of each main memory block uniquely determines its position in the cache; the full associative system, in which each main memory block can be placed anywhere in the cache; and the set associative system, in which blocks from each area of main memory are placed in a corresponding predetermined area of the cache. In the full associative system, determining whether data is located in the cache requires comparing the address referenced by an instruction against the addresses of all blocks stored in the cache, which is impractical given the hardware required. For this reason, either the direct map system or the set associative system is generally used to map main memory blocks into the cache.
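The three placement systems can be viewed as points on one scale, the number of sets: a direct map cache has one block per set, a full associative cache has a single set, and a set associative cache lies in between. The sketch below assumes byte addresses, the 128K-byte/128-byte example above, and a hypothetical 4-way configuration for the set associative case; it takes the set index as the block number modulo the number of sets (hardware normally uses the low-order address bits, which is equivalent when the number of sets is a power of two). `split_address` is a hypothetical name.

```python
from typing import Tuple


def split_address(address: int, block_size: int, num_sets: int) -> Tuple[int, int, int]:
    """Split a byte address into (tag, set index, offset within the block).

    The tag is the part of the address kept in the address array and
    compared on a lookup; the set index selects the row to search.
    """
    block = address // block_size
    offset = address % block_size
    set_index = block % num_sets
    tag = block // num_sets
    return tag, set_index, offset


BLOCK_SIZE = 128
TOTAL_BLOCKS = 1024          # 128K-byte cache / 128-byte blocks
ASSUMED_WAYS = 4             # hypothetical associativity for illustration

addr = 0x12345
print(split_address(addr, BLOCK_SIZE, TOTAL_BLOCKS))                  # direct map: (0, 582, 69)
print(split_address(addr, BLOCK_SIZE, TOTAL_BLOCKS // ASSUMED_WAYS))  # 4-way: (2, 70, 69)
print(split_address(addr, BLOCK_SIZE, 1))                             # full associative: (582, 0, 69)
```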
A cache employing a set associative system has its memory area divided into blocks arranged in N rows and M columns. Each block stores data and its own address. Each row in the cache is called a set, while each column is called a way. In the set associative system, a block fetched from a main memory is stored in one of the ways in a set uniquely determined by the address of the block. When there is an invalid (empty) way in the set, the block is stored in it. If all the ways are valid, the contents of one of the ways in the set are replaced and returned to the main memory, and the new block fetched from the main memory is stored in the way.
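The placement rule above (use an invalid way if one exists, otherwise replace one of the valid ways) can be sketched for a single set as follows. The class and method names are hypothetical, and which valid way to replace is left to a caller-supplied function, since the replacement algorithm is a separate choice.

```python
from typing import Callable, List, Optional


class CacheSet:
    """One set (row) of a set associative cache.

    Each entry of `ways` is the tag of the block held in that way,
    or None if the way is invalid (empty).
    """

    def __init__(self, num_ways: int) -> None:
        self.ways: List[Optional[int]] = [None] * num_ways

    def lookup(self, tag: int) -> Optional[int]:
        """Return the way holding `tag` (a hit), or None (a miss)."""
        for way, stored in enumerate(self.ways):
            if stored == tag:
                return way
        return None

    def fill(self, tag: int, choose_victim: Callable[[List[Optional[int]]], int]) -> Optional[int]:
        """Store a block fetched from main memory; return the evicted tag, if any."""
        for way, stored in enumerate(self.ways):
            if stored is None:               # an invalid way is used first
                self.ways[way] = tag
                return None
        victim = choose_victim(self.ways)    # all ways valid: pick one to replace
        evicted = self.ways[victim]          # this block is returned to main memory
        self.ways[victim] = tag
        return evicted


s = CacheSet(2)
s.fill(10, lambda ways: 0)         # fills way 0
s.fill(11, lambda ways: 0)         # fills way 1
print(s.fill(12, lambda ways: 0))  # 10: way 0 was replaced
print(s.ways)                      # [12, 11]
```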
For block replacement, the LRU (Least Recently Used) algorithm is generally used to select the target way. In the LRU algorithm, the way in the set that holds the least recently referenced data is chosen as the replacement target.
When data to be referenced is in the cache, the memory access is performed at high speed because the main memory need not be accessed. When the data is not in the cache, however, execution of the instruction that uses it is delayed until the data is fetched from main memory. To reduce the delay caused by such a cache miss, a prefetch method is conventionally used: a prefetch instruction is executed before a load instruction so that the data to be used by the load instruction is fetched in advance. As a result, the load instruction hits in the cache when it is executed.
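The LRU selection described above can be modeled per set by keeping the cached tags in recency order. This is a behavioral sketch with a hypothetical `LRUSet` class, not a hardware design; hardware implementations commonly track recency with compact state bits or a pseudo-LRU approximation rather than an ordered list.

```python
from typing import List, Optional, Tuple


class LRUSet:
    """One set of an N-way set associative cache with LRU replacement.

    `order` holds the cached tags, least recently used first.
    """

    def __init__(self, num_ways: int) -> None:
        self.num_ways = num_ways
        self.order: List[int] = []

    def access(self, tag: int) -> Tuple[bool, Optional[int]]:
        """Reference a block; return (hit, evicted_tag)."""
        if tag in self.order:
            self.order.remove(tag)
            self.order.append(tag)             # becomes most recently used
            return True, None
        evicted = None
        if len(self.order) == self.num_ways:   # no invalid way left in the set
            evicted = self.order.pop(0)        # replace the least recently used
        self.order.append(tag)
        return False, evicted


s = LRUSet(2)
print([s.access(t) for t in [1, 2, 1, 3]])
# [(False, None), (False, None), (True, None), (False, 2)]
```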
Analysis of the memory access patterns of computer programs generally shows the following characteristics:
(1) An access to data in a certain address recurs within a relatively short time.
(2) Data accessed during a period of time is distributed over addresses relatively close to one another.
The former characteristic is called "temporal locality," and the latter "spatial locality." Data stored in a cache by a prefetch instruction generally exhibits spatial locality but not temporal locality. Conversely, scalar data such as stack data exhibits temporal locality but not spatial locality.
In a set associative cache that uses the LRU method as its replacement algorithm, accessing a large array that has spatial locality but not temporal locality overwrites all data in the cache with the array, so data that has temporal locality, such as stack data, is replaced from the cache. A technique for preventing a block with temporal locality from being replaced by a block that has spatial locality but not temporal locality is disclosed, for example, in Japanese Laid-Open Patent Publication No. 7-281957 (1995). According to this technique, the LRU function is locked when data likely to be used again is first referenced, and the lock is released when that data is used for the last time.
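The eviction problem can be reproduced with a small simulation of one 4-way LRU set; the associativity and the access pattern are assumptions chosen for illustration. A stack block is referenced once per round, and between its references a stream of array blocks, each touched only once, maps to the same set. When the stream brings in as many blocks as there are ways, the stack block is evicted before every reuse.

```python
from typing import List


def stack_hits(num_ways: int, rounds: int, stream_blocks_per_round: int) -> int:
    """Count cache hits on one reused stack block in a single LRU set."""
    order: List[int] = []            # cached tags, least recently used first
    stack_tag = -1                   # hypothetical tag of the reused stack block
    next_array_tag = 0               # array blocks get fresh tags: no reuse
    hits = 0

    def access(tag: int) -> bool:
        if tag in order:
            order.remove(tag)
            order.append(tag)        # move to most recently used
            return True
        if len(order) == num_ways:
            order.pop(0)             # LRU replacement
        order.append(tag)
        return False

    for _round in range(rounds):
        hits += access(stack_tag)
        for _block in range(stream_blocks_per_round):
            access(next_array_tag)
            next_array_tag += 1
    return hits


print(stack_hits(4, 10, 3))   # 9: only the first reference misses
print(stack_hits(4, 10, 4))   # 0: the array stream evicts the stack block every round
```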
The above LRU lock method, however, may prevent the LRU function from operating after a process switch, or may reduce cache usage. Suppose, for example, that the LRU function is locked when a stack is first referenced in a process A, and that process A is then switched to a process B before the lock is released. The LRU function remains locked even after the switch to process B, so the block that was designated as the replacement target when the LRU function was locked is still the replacement target in the locked column. The locked column therefore behaves, for process B, as if the cache were of the direct map type, greatly reducing cache usage. Thus, the LRU lock method disclosed in Japanese Laid-Open Patent Publication No. 7-281957 (1995) may reduce cache usage, and thereby degrade performance, in a multiprocess environment.
The present invention provides a cache memory system capable of limiting the replacement of data having temporal locality caused by references to data having spatial locality but not temporal locality. In addition, the cache memory system allows the LRU function to operate properly in a multiprocess environment without any special processing.
To achieve the above, the present invention provides a cache memory system employing a set associative system with a plurality of ways that can store data having the same set address. Preferably, the cache memory system receives, when a cache miss occurs, a mode signal instructing that the replace ways used for storing the block containing the data to be accessed be limited, and includes a replace way determining circuit which, when the replace ways are limited by the mode signal, determines a replace way from among the plurality of ways. In a preferred mode of the present invention, the replace ways are limited when the instruction being executed is a prefetch instruction, which reads data in advance. Furthermore, the replace way is determined based on the number of prefetch instructions in execution that access the same set.
In another aspect, the present invention provides a processor having a cache memory system that employs a set associative system and uses the LRU method as the replacement algorithm for cache blocks. The cache memory system has a circuit for limiting the ways in which data fetched by a prefetch instruction is stored, and it changes its method of determining a replace way depending on the number of prefetch instructions in execution that access the same set.
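The text does not spell out how the number of in-flight prefetches changes the choice of replace way, so the following is only one hypothetical policy consistent with the description. When the mode signal is asserted, candidates are restricted to an assumed subset of ways (`allowed_ways`), and the k-th concurrent prefetch to the same set takes the k-th candidate in replacement order, so that concurrent prefetches do not all target the same way. All names and the spreading rule are illustrative assumptions.

```python
from typing import List


def choose_replace_way(
    lru_order: List[int],         # all way numbers, least recently used first
    valid: List[bool],            # valid bit of each way
    limit_mode: bool,             # mode signal: limit the replace ways (e.g. prefetch)
    allowed_ways: List[int],      # hypothetical ways permitted when limited
    prefetches_in_flight: int,    # prefetches in execution to this same set
) -> int:
    """Pick a replace way on a cache miss (illustrative policy only)."""
    if limit_mode:
        candidates = set(allowed_ways)
    else:
        candidates = set(range(len(valid)))
    # Replacement order: invalid ways first, then valid ways from least recently used.
    ranked = [w for w in sorted(candidates) if not valid[w]]
    ranked += [w for w in lru_order if w in candidates and valid[w]]
    if not ranked:
        raise ValueError("no candidate ways")
    if not limit_mode:
        return ranked[0]          # ordinary LRU replacement
    # Hypothetical rule: spread concurrent prefetches over the allowed ways.
    return ranked[prefetches_in_flight % len(ranked)]


valid = [True, True, True, True]
lru = [2, 0, 3, 1]                                        # way 2 is least recently used
print(choose_replace_way(lru, valid, False, [0, 1], 0))  # 2 (plain LRU over all ways)
print(choose_replace_way(lru, valid, True, [0, 1], 0))   # 0
print(choose_replace_way(lru, valid, True, [0, 1], 1))   # 1
```

Restricting prefetches to a fixed subset keeps the remaining ways available only to ordinary loads, while the in-flight count is used here merely as a simple way to avoid two outstanding prefetches to one set choosing the same victim.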
Thus, by limiting the ways that store prefetched data, which has low temporal locality, it is possible to make data having high temporal locality, such as scalar data, less likely to be replaced from the cache by prefetched data, even when a large amount of data is brought into the cache by prefetch instructions.
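A small simulation of one 4-way set illustrates the effect. Ordinary loads may use any way with LRU replacement, while prefetches are confined to an assumed pair of ways (2 and 3); the way choice, associativity, and access pattern are hypothetical. A stack block is loaded once per round, and four prefetched array blocks, each used only once, map to the same set between its references.

```python
from typing import List, Optional, Sequence


class SimSet:
    """One cache set: LRU replacement, with an optional limit on prefetch ways."""

    def __init__(self, num_ways: int, prefetch_ways: Optional[Sequence[int]]) -> None:
        self.tags: List[Optional[int]] = [None] * num_ways
        self.last_use = [0] * num_ways
        self.clock = 0
        self.prefetch_ways = prefetch_ways   # None: prefetches may use any way

    def access(self, tag: int, is_prefetch: bool) -> bool:
        """Reference a block; return True on a hit."""
        self.clock += 1
        if tag in self.tags:
            way = self.tags.index(tag)
            self.last_use[way] = self.clock
            return True
        if is_prefetch and self.prefetch_ways is not None:
            candidates = list(self.prefetch_ways)   # replace ways limited
        else:
            candidates = list(range(len(self.tags)))
        empty = [w for w in candidates if self.tags[w] is None]
        # Use an invalid way if any; otherwise replace the least recently used candidate.
        way = empty[0] if empty else min(candidates, key=lambda w: self.last_use[w])
        self.tags[way] = tag
        self.last_use[way] = self.clock
        return False


def stack_hits(prefetch_ways: Optional[Sequence[int]], rounds: int = 10) -> int:
    s = SimSet(4, prefetch_ways)
    hits = 0
    next_array_tag = 0
    for _round in range(rounds):
        hits += s.access(-1, is_prefetch=False)     # reused stack block
        for _block in range(4):                      # prefetched array blocks
            s.access(next_array_tag, is_prefetch=True)
            next_array_tag += 1
    return hits


print(stack_hits(None))     # 0: unrestricted prefetches evict the stack block
print(stack_hits([2, 3]))   # 9: only the first stack reference misses
```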