1. Field of the Invention
The present invention relates generally to memory systems for high-performance computer systems and, more particularly, to memory systems having a mechanism that makes hardware prefetches in accordance with the pattern of memory-access addresses of data fetches in which a cache miss has occurred, thereby improving memory-access performance.
2. Description of the Background
Because the performance of computer system memories has been improving more slowly than that of processors, the performance gap between memories and processors has been widening every year. Therefore, a cache memory is built into most processors to partially make up for this gap. However, because the cache memory relies on the temporal and spatial locality of data, it often fails to work effectively for memory-access patterns without locality, which may significantly reduce the performance of the processor. This phenomenon is often observed in large-scale scientific and technical computing, wherein access tends to be made in sequence to arrayed data with little data reuse.
To address this problem, prefetch instructions have been used so that software can transfer data in advance from a memory to a cache memory. However, when list access is made to a data array, or when a program is written in an object-oriented language, software often fails to insert the prefetch instruction even if the memory-access pattern is sequential.
On the other hand, methods invented for prefetch with hardware include: (i) methods of making a hardware prefetch of a data stream which has already been prefetched once and (ii) methods of making a hardware prefetch if the difference between the address of the past memory access and the present memory access falls into a prescribed range. One of the former methods (i) is disclosed in U.S. Pat. No. 5,345,560; one of the latter methods (ii) is disclosed in U.S. Pat. No. 6,173,392.
In the case of the former methods, however, a hardware prefetch can be made only for data streams which have already been prefetched once, and the hardware prefetch is therefore ineffective for data streams which have yet to be prefetched. In the case of the latter methods, the address of the data to be prefetched is generated by adding the interval between the past and present access addresses to the present access address; this hardware prefetch, however, often fails to eliminate the latency in data transfer from the main memory to the cache memory.
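The address generation of the latter methods (ii), as characterized in this background section, can be illustrated with a minimal sketch. The function name, the `max_stride` parameter, and the 128-byte spacing in the example are assumptions made for illustration and are not taken from the cited patent.

```python
def stride_prefetch_address(prev_addr, curr_addr, max_stride):
    """Return the address to prefetch, or None if no prefetch is issued.

    Hypothetical model of method (ii): a prefetch is made only when the
    difference between the past and present access addresses falls within
    a prescribed range (max_stride, an assumed parameter).
    """
    stride = curr_addr - prev_addr
    if stride == 0 or abs(stride) > max_stride:
        return None
    # The prefetch address is the present address plus the observed interval.
    return curr_addr + stride


# Two accesses 128 bytes apart: the prefetch targets only the next
# 128-byte step beyond the present access.
target = stride_prefetch_address(0x1000, 0x1080, max_stride=256)
print(hex(target))  # 0x1100
```

Because the generated address lies only one interval ahead of the present access, the prefetched data may be requested again before the transfer from the main memory has completed, which is consistent with the limitation described above.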
Because instructions are scheduled in a processor with a built-in cache memory based on an assumption that the latency of the cache memory is short, processing performance falls significantly if a cache miss occurs. Such a cache miss often occurs in sequential memory-access patterns.
Accordingly, the present invention preferably provides a system and method to shorten the memory-access latency, even if data to be prefetched are in sequential addresses, which may thereby lessen the adverse effects of cache misses on performance.
A computer system according to one aspect of the present invention is characterized by: (i) a request-generating mechanism which stores the history of memory-access addresses of data fetches in which a cache miss occurred, generates in hardware a request to fetch data from an address advanced by a prefetch interval set by software, and fetches the data from the main memory before a further cache-miss data fetch takes place; and (ii) a buffer which stores the data transferred from the main memory in accordance with the requests issued by the request-generating mechanism. With these features, the data-transfer latency in cache-miss data fetch may be reduced.
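The two features above can be modeled in software as a rough behavioral sketch. It is not the claimed hardware: the class name `PrefetchUnit`, the 128-byte line size, the history depth of four entries, the unbounded buffer, and the `memory` callable are all assumptions chosen for illustration, and transfer timing is not modeled.

```python
from collections import deque

LINE_SIZE = 128  # assumed cache-line size in bytes (not given in the text)


class PrefetchUnit:
    """Hypothetical model of the request-generating mechanism and buffer."""

    def __init__(self, prefetch_interval, history_depth=4):
        # Prefetch interval in lines; stands in for the software-set register.
        self.prefetch_interval = prefetch_interval
        # History of line numbers of recent cache-miss data fetches.
        self.history = deque(maxlen=history_depth)
        # Prefetch buffer: line number -> data (unbounded here for simplicity).
        self.buffer = {}

    def on_cache_miss(self, addr, memory):
        """Serve a cache-miss fetch; return (data, served_from_buffer)."""
        line = addr // LINE_SIZE
        hit = line in self.buffer
        data = self.buffer.pop(line) if hit else memory(line)
        # A miss to the line following a recorded miss is treated as sequential.
        if (line - 1) in self.history:
            target = line + self.prefetch_interval
            if target not in self.buffer:
                self.buffer[target] = memory(target)  # hardware prefetch request
        self.history.append(line)
        return data, hit


# Sequential misses to lines 0..4 with a prefetch interval of 2 lines.
unit = PrefetchUnit(prefetch_interval=2)
hits = [unit.on_cache_miss(n * LINE_SIZE, lambda line: line)[1] for n in range(5)]
print(hits)  # [False, False, False, True, True]
```

In this model the first misses go to the main memory while the history fills, after which later sequential misses are served from the buffer instead.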
In accordance with the invention described above, regarding a cache-miss load instruction, when the data to be transferred are in sequential addresses, such data transfer can be accelerated. Specifically, in the present invention, the history of the transfer request address of the cache-miss load instruction is registered so that a cache-miss load regarding consecutive addresses is detected and a request for hardware prefetch to a successive address can be issued. Also, since a prefetch interval register can be set with software, it is possible to let the transfer timing of prefetch data and the timing of data utilization coincide. In the present invention, data transferred from a memory system by the prefetch request that hardware issues may be stored in a dedicated prefetch buffer. Accordingly, data in a processor's cache memory is not expelled, and the data-transfer latency of the cache-miss load instruction may be shortened.
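One way software might choose a value for the prefetch interval register, so that prefetched data arrive at about the time they are used, is sketched below. The heuristic (memory latency divided by the time the program takes to consume one line, rounded up) is an assumption for illustration and is not stated in this specification; the function name and the cycle counts are hypothetical.

```python
import math


def choose_prefetch_interval(memory_latency_cycles, cycles_per_line):
    """Return a prefetch interval, in lines, that covers the memory latency.

    Assumed heuristic: the prefetch must run far enough ahead that the
    transfer finishes by the time the program reaches the prefetched line.
    """
    return max(1, math.ceil(memory_latency_cycles / cycles_per_line))


# With an assumed 400-cycle memory latency and 100 cycles of work per line,
# the prefetch would need to run 4 lines ahead of the current miss.
print(choose_prefetch_interval(400, 100))  # 4
```

A smaller interval would leave part of the latency exposed, while a much larger one would hold data in the prefetch buffer longer than necessary.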
Other features, objects, and/or advantages of the invention will appear more fully from the following detailed description, figures, and attached claims.