Conventional computer systems utilize memory systems that provide data to the central processing unit (CPU) in response to load instructions and store data in response to store instructions. The latency of the memory system is defined to be the number of cycles between the initiation of a load operation and the point at which the data for the load is returned from the memory and is available for use. In many cases the latency of load instructions is critical to the efficiency of the program being executed by the system, because the CPU must wait whenever it needs the data before the data is available. One method for avoiding this inefficiency is to issue the load instruction sufficiently before the need for the data to allow the memory time to retrieve the data and have it ready when needed.
The manner in which aggressive loading of data from the memory system can reduce program execution time can be more easily understood with reference to the following simple computer program:
    R1=(A1)
    (A2)=52
    (A3)=64
    R2=(A4)
    R3=R2+2
Here, memory addresses are shown with () and registers are denoted by Rn.
It will be assumed that the latencies of the add and memory store operations are 1 cycle each, and the latency of the memory load operation is 3 cycles. If the operations are executed in the order implied in the program, then the program will require 7 cycles to execute. The 7 cycles are as follows:
    load from (A1) into R1
    store 52 into (A2)
    store 64 into (A3)
    load from (A4) into R2
    stall
    stall
    add 2 to R2 and store into R3
The two "stall" cycles are needed to allow the memory system time to finish the load operation before the value of R2 is used in the last instruction.
However, if the load instruction for loading into R2 is initiated before the instructions for storing into (A2) and (A3), the program requires only the following 5 cycles:
    load from (A1) into R1
    load from (A4) into R2
    store 52 into (A2)
    store 64 into (A3)
    add 2 to R2 and store into R3
Issuing a long latency load instruction early, however, is not always possible because of store instructions that precede the load instruction. This problem can be more easily understood with reference to the following program:
    load from (A1) into R1
    store 52 into (A2)
    store 64 into (A3)
    load from (A3) into R2
    add 2 to R2 and store in R3
The normal execution of this program would be as follows:
    load from (A1) into R1
    store 52 into (A2)
    store 64 into (A3)
    load from (A3) into R2
    stall
    stall
    add 2 to R2 and store in R3
If the compiler were to move the instruction loading into R2 up two instructions, the following code would be generated:
    load from (A1) into R1
    load from (A3) into R2
    store 52 into (A2)
    store 64 into (A3)
    add 2 to R2 and store in R3
This code will not execute the program correctly, since the instruction storing into memory location (A3) will now be executed after the load from (A3), whereas in the original program the store into (A3) is executed before the load from (A3). The load from (A3) in the original program returns the value 64, i.e., the value stored into (A3) by the preceding store instruction, whereas the load from (A3) in the modified program returns the contents of (A3) prior to its modification by the store into (A3).
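The cycle counts and the incorrect result described above can be checked with a small simulation. The following Python sketch is illustrative only and is not part of the described system: the function names run and schedule, the tuple encoding of instructions, and the initial memory contents are hypothetical, and the model assumes an in-order machine in which a load reads memory at the time it is issued.

```python
LOAD_LATENCY = 3  # cycles; adds and stores take 1 cycle, as assumed in the text


def run(program, memory):
    """Run instructions in order; return (registers, cycles used)."""
    mem = dict(memory)
    regs = {}
    ready = {}  # register -> first cycle at which its value may be used
    cycle = 0
    for op, *args in program:
        if op == "load":            # ("load", address, register)
            addr, reg = args
            regs[reg] = mem[addr]   # assumption: memory is read at issue time
            ready[reg] = cycle + LOAD_LATENCY
        elif op == "store":         # ("store", value, address)
            value, addr = args
            mem[addr] = value
        elif op == "add":           # ("add", source, constant, destination)
            src, const, dst = args
            cycle = max(cycle, ready.get(src, 0))  # stall until source is ready
            regs[dst] = regs[src] + const
        cycle += 1
    return regs, cycle


def schedule(load_addr, hoist):
    """Build the example program, optionally moving the R2 load above the stores."""
    load_r2 = ("load", load_addr, "R2")
    stores = [("store", 52, "A2"), ("store", 64, "A3")]
    body = [load_r2] + stores if hoist else stores + [load_r2]
    return [("load", "A1", "R1")] + body + [("add", "R2", 2, "R3")]


# Hypothetical initial memory contents, chosen only for illustration.
memory = {"A1": 10, "A3": 7, "A4": 20}

for addr in ("A4", "A3"):
    for hoist in (False, True):
        regs, cycles = run(schedule(addr, hoist), memory)
        print(addr, "hoisted" if hoist else "in order", "R3 =", regs["R3"], "cycles =", cycles)
```

Under these assumptions, hoisting the load from (A4) reduces the count from 7 cycles to 5 without changing R3, while hoisting the load from (A3) also takes 5 cycles but computes R3 from the old contents of (A3) instead of from 64.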
In the simple example given above, the compiler could, in principle, detect the problem and forgo the early issuance of the load instruction. Unfortunately, most programs use indirect referencing schemes, in which the address of a store is not known until the program runs, and these make such an approach impractical. Hence, compilers must generate conservative code that does not take advantage of the early issuance of load instructions having long latencies.
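The difficulty posed by indirect referencing can be illustrated with a short sketch. In the following Python fragment, which is an illustration under stated assumptions and not part of the described system, the store address p is supplied at run time; the function names original and hoisted and the memory contents are hypothetical.

```python
def original(mem, p):
    """Store through a run-time address, then load from A3."""
    mem = dict(mem)
    mem[p] = 64        # store 64 into the location named by p
    r2 = mem["A3"]     # load from (A3) into R2
    return r2 + 2      # add 2 to R2 and store in R3


def hoisted(mem, p):
    """Same program with the load moved above the store."""
    mem = dict(mem)
    r2 = mem["A3"]     # load issued early
    mem[p] = 64        # store executes after the load
    return r2 + 2


memory = {"A2": 0, "A3": 7}  # hypothetical initial contents

# When p names a different location, the two orders agree.
print(original(memory, "A2"), hoisted(memory, "A2"))
# When p happens to name A3, the hoisted version reads the old value.
print(original(memory, "A3"), hoisted(memory, "A3"))
```

Because the same reordered code is correct for one value of p and incorrect for another, a compiler that cannot determine p in advance has to keep the load after the store.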
Broadly, it is the object of the present invention to provide a computer memory system that allows the compiler to issue long latency load instructions early.
It is a further object of the present invention to provide a memory system in which long latency load instructions can be issued early even in systems in which indirect addressing is utilized.
These and other objects of the present invention will become apparent to those skilled in the art from the following detailed description of the present invention and the accompanying drawings.