The execution time of a program depends significantly on two kinds of waiting time: the waiting time caused by dependences between instructions and the waiting time caused by memory references.
The waiting time caused by dependences between instructions within a loop can be considerably reduced by a scheduling method called software pipelining. Software pipelining, as described, for example, in "Software Pipelining in PA-RISC Compiler" by S. Ramakrishnan, Hewlett-Packard Journal, pp. 39-45, 1992, overlaps the execution of different iterations of a loop, starting the subsequent iteration without waiting for the completion of the preceding iteration. This reduces the waiting time caused by dependences between instructions and enhances the degree of parallelism in the execution of instructions. A loop to which software pipelining is applied is characterized by three parts: initialization code called a prologue, which is executed before execution of the loop starts; code called a kernel, which is repeated to execute the loop body; and code called an epilogue, which is executed when execution of the loop is completed to terminate the process.
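The prologue, kernel, and epilogue structure described above can be illustrated by a minimal hand-written sketch in C. It is not taken from the cited article; the function names scale_plain and scale_pipelined, and the choice of a three-stage load, multiply, store loop, are assumptions made only for illustration. A compiler would normally perform this transformation on machine instructions rather than on source code.

```c
#include <stddef.h>

/* Reference loop: each iteration loads, multiplies, then stores,
 * so every step waits for the one before it. */
void scale_plain(const double *x, double *y, size_t n, double a)
{
    for (size_t i = 0; i < n; i++) {
        double v = x[i];   /* load     */
        double p = a * v;  /* multiply */
        y[i] = p;          /* store    */
    }
}

/* Hand-pipelined version (illustrative): up to three iterations
 * are in flight at once, one in each stage. */
void scale_pipelined(const double *x, double *y, size_t n, double a)
{
    if (n < 2) {           /* too short to fill the pipeline */
        scale_plain(x, y, n, a);
        return;
    }

    /* Prologue: start iterations 0 and 1 to fill the pipeline. */
    double loaded  = x[0];        /* load     iteration 0 */
    double product = a * loaded;  /* multiply iteration 0 */
    loaded = x[1];                /* load     iteration 1 */

    /* Kernel: each pass stores iteration i, multiplies iteration
     * i + 1, and loads iteration i + 2. The three statements do
     * not depend on one another within a single pass. */
    size_t i;
    for (i = 0; i + 2 < n; i++) {
        y[i]    = product;
        product = a * loaded;
        loaded  = x[i + 2];
    }

    /* Epilogue: drain the two iterations still in flight
     * (here i == n - 2). */
    y[i]     = product;     /* store iteration n - 2            */
    y[i + 1] = a * loaded;  /* multiply and store iteration n - 1 */
}
```

Both functions produce the same result; the pipelined form only changes the order in which the work of neighboring iterations is issued.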
Compared with the waiting time caused by dependences between instructions, the waiting time associated with memory references is rather difficult to reduce by a software method alone. Therefore, in many computer systems, a high-speed, small-capacity memory called a cache memory is provided between the main memory and the processor, so that recently referenced data can be accessed at high speed in the cache memory. However, even when a cache memory is used, waiting time is inevitably incurred when a cache miss occurs, as is the case when data is not reused.
Therefore, as described, for example, in "Design and Evaluation of a Compiler Algorithm for Prefetching" by T. C. Mowry, et al., Proceedings of the 5th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 62-73, 1992, attempts have been made to reduce the waiting time caused by memory references by using an instruction that prefetches data from the main memory into the cache memory.
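As a rough illustration of the prefetching approach, the following C sketch issues a prefetch for data that will be needed a fixed number of iterations later. It is not the algorithm of the cited paper. It assumes a GCC- or Clang-compatible compiler, which provides the __builtin_prefetch built-in; the function name sum_with_prefetch and the distance PREFETCH_DISTANCE are hypothetical, and a suitable distance depends on the memory latency of the target machine.

```c
#include <stddef.h>

/* Hypothetical prefetch distance, in array elements. */
#define PREFETCH_DISTANCE 16

/* Sums an array that is read once and not reused, requesting each
 * element PREFETCH_DISTANCE iterations before it is needed. */
double sum_with_prefetch(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
#if defined(__GNUC__) || defined(__clang__)
        /* Second argument 0: the data will be read.
         * Third argument 0: no temporal locality is expected.
         * The prefetch is only a hint and does not change the result. */
        if (i + PREFETCH_DISTANCE < n)
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 0);
#endif
        sum += a[i];
    }
    return sum;
}
```

On compilers without the built-in, the sketch degrades to an ordinary summation loop. Issuing one prefetch per element is wasteful when several elements share a cache line, so a practical scheme would typically prefetch less often.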