The present invention relates to a data prefetch method and, more particularly, to a compiling method for realizing high-speed indirect array reference by effectively using prefetching for a processor having a prefetch instruction.
The performance of microprocessors is improving dramatically through gains in instruction-level parallelism and in clock frequency. In contrast, the performance of DRAM, the component of the main memory of a computer, has improved only slowly by comparison. Consequently, the number of processor cycles required to refer to the main memory tends to increase.
As shown in FIG. 2, many microprocessors hide the time required to refer to the main memory by disposing, between the processor and the main memory, a small-capacity memory called a cache, which can be referred to at higher speed than the main memory. Data referred to recently is stored in the cache, thereby shortening the time required for memory reference. With this method alone, however, a waiting time still occurs when data that is not stored in the cache is referred to. For this reason, a data prefetch instruction is provided, which transfers data at a designated address from the main memory to the cache in advance, in parallel with execution of other instructions.
A compiler analyzes a source program and, as an optimization, generates prefetch instructions so that the cycles spent referring to the main memory can be hidden. The length of data used for computation is generally about 1 to 8 bytes, whereas data transfer from the main memory to the cache is performed in units called cache lines, of 128 or 64 bytes. Therefore, a single data prefetch can transfer to the cache the data to be referred to over a plurality of loop iterations.
In a conventional technique disclosed in T. C. Mowry et al., “Design and Evaluation of a Compiler Algorithm for Prefetching”, Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 62-73, 1992, the spatial locality or temporal locality of data references in a loop is obtained by analyzing the array indices in the loop and, when the reuse rate is equal to or higher than a predetermined value, prefetching is performed. In this case, by combining the prefetching with loop unrolling, redundant prefetches to the same cache line are prevented, thereby reducing the instruction overhead incurred by prefetching.
The conventional technique will be described by using FIGS. 3A to 3C.
FIG. 3A shows an example of source code for a loop iteration process using indirect reference, to which the invention is effectively applied. The code computes the sum of values of an array A[ . . . ] by using the values of an array L[ . . . ] as indices, as the loop index i runs up to N. Such indirect reference code is frequently used for processing a sparse matrix, or the like.
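Because FIG. 3A is not reproduced here, the following is only a minimal sketch in C of the kind of indirect-reference loop described above. The function name indirect_sum, the element types, and the 0-based indexing are assumptions made for illustration and are not taken from the figure.

```c
#include <stddef.h>

/* Sum A[L[i]] over the first n index values.
 * Hypothetical sketch: names, types, and 0-based indexing are assumptions. */
double indirect_sum(const double *A, const int *L, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* L[i] is read sequentially; A[L[i]] may land anywhere in A. */
        s += A[L[i]];
    }
    return s;
}
```

The array L[ . . . ] is read with stride one, whereas the addresses touched in A[ . . . ] depend on the run-time contents of L[ . . . ], which is why the reuse of cache lines of A[ . . . ] cannot be determined at compile time.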
FIG. 3B shows an example of code which is obtained by optimizing the source code by the conventional technique and into which a prefetch instruction is inserted. In the conventional technique, array indices are analyzed at compile time, and only array references with a high cache line reuse rate are prefetched. Consequently, only the array L[ . . . ], which is referred to contiguously, becomes an object to be prefetched, and the array A[ . . . ], whose reuse rate is unknown, does not. α denotes the number of loop iterations that elapse before prefetched data arrives in the cache from the main memory. In the example, it is assumed that the cache line size of the processor is 32 bytes. If the length of data used for computation is, for example, eight bytes, four elements are transferred together from the main memory by one prefetch. Therefore, in the example, code is generated to perform prefetching once every four iterations.
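The conventional optimization described above can be sketched in C as follows. This is not the code of FIG. 3B. It is an illustrative approximation that assumes a GCC- or Clang-compatible compiler providing the __builtin_prefetch builtin, 8-byte index elements, and a hypothetical prefetch distance ALPHA standing in for α. The bounds check before the prefetch is also an addition of this sketch, made so that no out-of-range address is formed.

```c
#include <stddef.h>
#include <stdint.h>

#define ELEMS_PER_LINE 4   /* 32-byte cache line / 8-byte element, as in the text */
#define ALPHA 8            /* hypothetical prefetch distance, in loop iterations */

/* Prefetch only the contiguous index array L, once per cache line. */
double indirect_sum_prefetch_index(const double *A, const int64_t *L, size_t n)
{
    double s = 0.0;
    size_t i = 0;

    /* Main loop unrolled by four: one prefetch covers four iterations. */
    for (; i + ELEMS_PER_LINE <= n; i += ELEMS_PER_LINE) {
        if (i + ALPHA < n) {
            __builtin_prefetch(&L[i + ALPHA]);
        }
        s += A[L[i]];
        s += A[L[i + 1]];
        s += A[L[i + 2]];
        s += A[L[i + 3]];
    }

    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        s += A[L[i]];
    }
    return s;
}
```

Unrolling by the number of elements per cache line is what limits the overhead to one prefetch instruction per four iterations. The references to A[ . . . ] are left unprefetched, as in the conventional technique.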
The conventional method can also make the array A[ . . . ] an object to be prefetched, by always issuing a prefetch for it even though its cache line reuse rate is unknown. In this case, however, a prefetch must be issued for every indirect reference, and the instruction overhead increases. FIG. 3C shows an example of code optimized in such a manner. Compared with the code shown in FIG. 3B, the code shown in FIG. 3C contains more instructions, namely those for referring to the indices used for prefetching and those for prefetching the array A[ . . . ]. For this reason, prefetching for indirect reference is not performed in the conventional technique.
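The overhead described above can be illustrated by the following sketch in C, in which a prefetch for A[ . . . ] is issued in every iteration. It is not the code of FIG. 3C. It assumes a GCC- or Clang-compatible compiler providing __builtin_prefetch, 8-byte index elements, a 32-byte cache line, and a hypothetical prefetch distance ALPHA standing in for α. Prefetching L[ . . . ] at twice that distance is likewise an assumption of this sketch, chosen so that the index is already cached when it is read to form the address for the prefetch of A[ . . . ].

```c
#include <stddef.h>
#include <stdint.h>

#define ELEMS_PER_LINE 4   /* 32-byte cache line / 8-byte element (assumed) */
#define ALPHA 4            /* hypothetical prefetch distance, in loop iterations */

/* Prefetch both L and the indirectly referenced elements of A. */
double indirect_sum_prefetch_all(const double *A, const int64_t *L, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* L is contiguous: one prefetch per cache line is enough. */
        if (i % ELEMS_PER_LINE == 0 && i + 2 * ALPHA < n) {
            __builtin_prefetch(&L[i + 2 * ALPHA]);
        }
        /* A is referenced indirectly: every iteration needs an extra
         * load of the future index L[i + ALPHA] plus one prefetch. */
        if (i + ALPHA < n) {
            __builtin_prefetch(&A[L[i + ALPHA]]);
        }
        s += A[L[i]];
    }
    return s;
}
```

Each iteration now carries one additional index load and one additional prefetch instruction, regardless of whether the targeted cache line of A[ . . . ] has already been fetched.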