1. Technical Field
The present invention relates generally to computer data caches and, more particularly, to data prefetching mechanisms.
2. Description of Related Art
A great amount of invention and research work has gone into improving the hit ratios of instruction caches by prefetching or predicting the instruction references. However, a similar level of improvement to data caches has remained an elusive goal. Instruction prefetching, as contrasted with data prefetching, has been relatively easy because program execution lends itself nicely to prefetching due to its inherently high level of spatial locality. Furthermore, temporal locality can also be tapped easily by utilizing the branch target behavior.
While data references also exhibit temporal and spatial locality, the locality of data references, unlike that of instruction references, is not dependent on the execution of the branch instructions, but more on the data addresses that are dynamically generated during the execution of the program. The lack of direct improvements to the hit ratio of the data caches has been somewhat made up for by other techniques, such as, for example, lock-up free caches, decoupled access architectures, early prediction of the effective address of memory accesses, compiler directed prefetching, and load unit prefetching. However, the overall performance improvement is less because of the increased cycle time resulting from the implementation complexity of processor pipelines and/or from the extra instructions that must be executed to do the software based prefetching. Furthermore, each of these approaches has other fundamental limitations.
For example, lock-up free caches allow out-of-order instructions to continue execution in the presence of multiple outstanding cache misses, but do not help much if the code is highly interlocked. That is, if the instructions are sequentially dependent on the results of the previous instructions, such as, for example, in integer and commercial workloads, there is not much benefit after the first outstanding miss. This is similarly true of decoupled access architectures. Compiler directed prefetching, on the other hand, suffers from the inability to handle dynamic run time behavior. The benefit is even less if this slows down the processor's clock cycle time.
While all of the above techniques do help reduce the penalty from cache misses, a more direct solution for improving the performance of a data cache that reduces the effective miss rate is desirable.
The present invention provides a data structure to aid in, and a method, system, and computer program product for, prefetching data from a data cache. In one embodiment, the data structure includes a prediction history field, a next line address field, and a data field. The prediction history field provides information about the success of past data cache address predictions. The next line address field provides information about the predicted next data cache lines to be accessed. The data field provides data to be used by the processor. When a data line in the data cache is accessed by the processor, the value of the prediction history field and the value of the next line address field are determined. If the prediction history field is true, then the next line address in the next line address field is prefetched. Based on whether the next line actually utilized by the processor matches the next line address in the next line address field, the contents of the prediction history field and the next line address field are modified.
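By way of illustration only, the data structure and method summarized above may be sketched in software as follows. The sketch is not the claimed implementation. The names CacheLine, PrefetchingCache, access, and prefetched are hypothetical, the prediction history is assumed to be a single true/false value, and the update policy (set the history to true on a match; on a mismatch, clear it and record the line actually used) is an assumption, since the summary states only that the two fields are modified. Cache capacity and eviction are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CacheLine:
    """One data cache line with the three fields named in the summary."""
    data: bytes                               # data used by the processor
    prediction_history: bool = False          # assumed: one bit of history
    next_line_address: Optional[int] = None   # predicted next line, if any


class PrefetchingCache:
    """Hypothetical line-granular model; no capacity limit or eviction."""

    def __init__(self, memory: Dict[int, bytes]) -> None:
        self.memory = memory                  # backing store, keyed by line address
        self.lines: Dict[int, CacheLine] = {}
        self.prefetched: List[int] = []       # log of prefetches issued
        self.previous: Optional[int] = None   # last line the processor used

    def _fill(self, address: int) -> CacheLine:
        # Bring the line into the cache if it is not already present.
        if address not in self.lines:
            self.lines[address] = CacheLine(data=self.memory[address])
        return self.lines[address]

    def access(self, address: int) -> bytes:
        # Step 1: compare the line actually used with the previous line's
        # prediction and modify that line's fields (assumed policy).
        # Repeated accesses to the same line are not treated as a transition.
        if self.previous is not None and self.previous != address:
            prev = self.lines[self.previous]
            if prev.next_line_address == address:
                prev.prediction_history = True
            else:
                prev.prediction_history = False
                prev.next_line_address = address

        # Step 2: read the accessed line's fields; prefetch the predicted
        # next line only if the prediction history field is true.
        line = self._fill(address)
        if line.prediction_history and line.next_line_address is not None:
            self._fill(line.next_line_address)
            self.prefetched.append(line.next_line_address)

        self.previous = address
        return line.data
```

In this sketch, a line must be followed by the same successor twice before a prefetch is issued for it: the first transition records the next line address, and the second confirms it and sets the prediction history to true. A wider history, such as a saturating counter, would be an equally valid reading of the summary.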