The invention relates to a data processing device that has a forward load instruction that can be taken into execution before a store instruction that stores the data loaded by the forward load instruction. The invention also relates to a method of executing programs using a forward load instruction and to a method of generating machine code programs containing forward load instructions.
Forward loading is known from an article titled “Tolerating Data Access Latency with Register Preloading”, written by William Y. Chen, Scott A. Mahlke, Wen-mei W. Hwu, Tokuzo Kiyohara and Pohua P. Chang and published in the “Proceedings of the 1992 International Conference on Supercomputing”.
To improve the efficiency of a computer program it is desirable to be able to change the sequence of execution of instructions in the program without changing the results of the program. The possible changes in sequence are limited by data dependencies between instructions, where a first instruction may affect the data used by a second instruction. In that case, the second instruction cannot normally be executed before the first instruction.
One particular type of dependency is “load-store” dependency, where a first instruction stores data to memory and a second instruction loads data from memory. When it is not known for certain that the second instruction loads from a different memory location than the first instruction, the sequence of executing the load instruction and the store instruction cannot normally be changed without affecting the results of the program.
This is a problem that is similar to the problems that occur in cache prefetching, which can be corrected by updating data in the cache when a store occurs. The article by Chen et al. applies this cache technique also to registers in the processor. Upon encountering a forward load instruction, the processor prefetches data from memory into a register. The load address used by the forward load instruction is saved after it has been used to load data. Subsequently, when a store instruction is executed, the store address of the store instruction is compared with the addresses used to prefetch data into each register. If the load and store addresses address the same data, the prefetched data in the relevant register is replaced by the store data that is stored by the store instruction.
The data is replaced from the time that the store instruction is completed. Thus, a register loaded with a forward load instruction always contains data that corresponds to the data that is actually in memory at the load address, no matter when the forward load instruction is executed. At the original location of the load instruction a “commit” instruction is added to prevent store instructions after that location from causing a substitution with store data. As a result the forward load instruction can be moved freely through the program, past any store instructions, without affecting the result of the program.
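The register preloading scheme described above can be illustrated with a minimal software sketch. This is a toy model only, not the implementation of Chen et al.: the class name `PreloadRegisterFile`, its methods, and the dictionary-based memory are hypothetical and chosen purely for illustration.

```python
class PreloadRegisterFile:
    """Toy model of register preloading (all names are hypothetical)."""

    def __init__(self, memory):
        self.memory = memory   # dict: address -> value
        self.regs = {}         # register name -> value
        self.tracked = {}      # register name -> saved load address

    def forward_load(self, reg, addr):
        # Prefetch the data and save the load address for later comparison.
        self.regs[reg] = self.memory.get(addr, 0)
        self.tracked[reg] = addr

    def store(self, addr, value):
        self.memory[addr] = value
        # Associative search: the store address is compared with the
        # saved address of every register that is still being tracked.
        for reg, load_addr in self.tracked.items():
            if load_addr == addr:
                self.regs[reg] = value

    def commit(self, reg):
        # Placed at the original location of the load: later stores
        # no longer cause a substitution in this register.
        self.tracked.pop(reg, None)


mem = {0x10: 1}
rf = PreloadRegisterFile(mem)
rf.forward_load("r1", 0x10)  # load moved ahead of the store below
rf.store(0x10, 7)            # originally preceded the load: r1 is updated
rf.commit("r1")              # original position of the load
rf.store(0x10, 9)            # originally followed the load: r1 is untouched
```

In this sketch the loop in `store` stands in for the associative memory function, and the explicit `commit` call stands in for the additional commit instruction.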
The technique described by Chen requires considerable overhead: for each forwarded load instruction an additional commit instruction is used, and it is necessary to provide an associative memory function that can use the store address to find the register or registers that have to be updated as a result of a store instruction.
Amongst others, it is an object of the invention to provide a data processor device in which advantages of moving a load instruction past a preceding store instruction can be realized with a less complex solution.
According to the invention, compensation for the effect of out-of-order execution of memory access instructions is incorporated in pipelined execution of the memory access instruction. Hence, the memory address needs to be compared only with one or more memory addresses present in one or more of the stages downstream in the pipeline, and not with memory addresses for all available registers. At a pipeline stage that makes irreversible changes to memory or register content, such changes are suppressed or data obtained from a different stage is substituted if the memory addresses address the same memory location, so as to obtain the same effect as if the memory access instructions had been executed in the original order.
For example, substitution of load data is incorporated in pipelined execution of a forward load instruction before a store instruction that may affect a memory location from which the load instruction loads data. At the end of pipelined execution of the forward load, the loaded data or, if appropriate due to a store and load address match, the stored data is written back into the result register of the forward load instruction.
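The forward load case can be sketched as a small cycle-by-cycle simulation. This is a simplified illustration under stated assumptions, not the claimed hardware: one instruction issues per cycle, memory is read at issue, the pipeline depth of 3 is arbitrary, and the names `InFlightLoad` and `run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class InFlightLoad:
    dest: str         # result register of the forward load
    addr: int         # load address carried along the pipeline
    data: int         # loaded data, possibly replaced by store data
    stages_left: int  # stages remaining before write-back


def run(program, memory, depth=3):
    """Issue one instruction per cycle (assumed) and return the registers."""
    regs, in_flight = {}, []

    def advance():
        # Move every in-flight load one stage; write back finished loads.
        for ld in in_flight:
            ld.stages_left -= 1
            if ld.stages_left == 0:
                regs[ld.dest] = ld.data
        in_flight[:] = [ld for ld in in_flight if ld.stages_left > 0]

    for op in program:
        if op[0] == "fload":
            _, dest, addr = op
            in_flight.append(InFlightLoad(dest, addr, memory.get(addr, 0), depth))
        elif op[0] == "store":
            _, addr, value = op
            memory[addr] = value
            # Compare only with loads still in downstream pipeline stages.
            for ld in in_flight:
                if ld.addr == addr:
                    ld.data = value
        advance()
    while in_flight:  # drain the pipeline
        advance()
    return regs


memory = {0x10: 1}
program = [
    ("fload", "r1", 0x10),  # forward load, moved ahead of the next store
    ("store", 0x10, 7),     # matches while the load is still in the pipeline
    ("nop",),
    ("nop",),
    ("store", 0x10, 9),     # load has already written back: no effect on r1
]
regs = run(program, memory)
```

The comparison in this sketch involves only the loads that are still in the pipeline, so no record of load addresses per register and no separate commit step is modeled.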
In another example, suppression or substitution of store data is incorporated in pipelined execution of a first store instruction executed after a forwarded store instruction that may affect a memory location to which the first store instruction stores data. At the pipeline stage where the first store instruction stores data, either the data of the first store instruction is written to memory or, if appropriate due to an address match, no data or substituted data is written to memory.
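The store case can be sketched in the same spirit. The sketch below models only suppression, not substitution, and rests on assumptions that are not in the text: one instruction per issue slot, a fixed comparison window of two slots, and the hypothetical names `run_stores`, `fstore` and `store`.

```python
def run_stores(program, memory, window=2):
    """Apply stores in issue order, suppressing overtaken stores.

    An "fstore" is a store that was moved ahead of an earlier store.
    A plain "store" is suppressed when an fstore to the same address
    was issued at most `window` slots before it (assumed pipeline span).
    """
    forwarded = []  # (issue slot, address) of forwarded stores
    for slot, (kind, addr, value) in enumerate(program):
        if kind == "fstore":
            memory[addr] = value
            forwarded.append((slot, addr))
        elif kind == "store":
            hit = any(a == addr and slot - s <= window for s, a in forwarded)
            if not hit:
                memory[addr] = value  # no match: write as usual
    return memory


# The forwarded store originally came last, so its data must survive.
near = run_stores([("fstore", 0x20, 5), ("store", 0x20, 3), ("store", 0x24, 4)], {})

# Outside the window no comparison is made and the store is written normally.
far = run_stores([("fstore", 0x20, 5), ("nop", 0, 0), ("nop", 0, 0), ("store", 0x20, 3)], {})
```

In the first program the memory ends up as if the plain store had executed before the forwarded store, which is the original program order.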
An embodiment of the data processor device according to the invention includes an instruction which indicates for which pipeline stages, relative to the pipeline stage that executes the instruction, the effect of out-of-order execution is to be compensated. The instruction that would execute incorrectly due to out-of-order execution, or the other instruction that causes this incorrect execution, or both, may be used to indicate that compensation is necessary for a certain pipeline distance. This makes it possible to process either a load/store instruction that involves forward loading/storing or a load/store instruction that does not involve such forward loading/storing alternatively at the same point in the pipeline. Thus, a compiler can place either a memory access instruction that has been moved out of order, or a memory access instruction that has not been so moved, at the same place in the program. The compiler can select the appropriate type for each such memory access instruction to indicate whether it is necessary to correct for the effect of movement with respect to indicated other memory access instructions at selected distances relative to the memory access instruction.
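One way to picture such an indication is a per-instruction bit mask of pipeline distances. The encoding below is purely an assumption for illustration and is not specified in the text: bit k-1 of the hypothetical `mask` selects the store issued k slots after the load, and the function name `load_result` is likewise hypothetical.

```python
def load_result(loaded, load_addr, later_stores, mask):
    """Return the value a load writes back to its result register.

    later_stores[k - 1] is the (address, value) of the store issued
    k slots after the load, or None if that slot holds no store.
    Bit k - 1 of `mask` (assumed encoding) requests compensation at
    distance k; a mask of 0 describes an ordinary, unmoved load.
    """
    result = loaded
    for k, st in enumerate(later_stores, start=1):
        if st is None or not (mask >> (k - 1)) & 1:
            continue  # no store in this slot, or distance not selected
        addr, value = st
        if addr == load_addr:
            result = value  # substitute the store data
    return result


stores = [(0x10, 7), (0x10, 8)]
ordinary = load_result(1, 0x10, stores, 0b00)  # unmoved load: keeps loaded data
moved_one = load_result(1, 0x10, stores, 0b01)  # moved past the first store only
moved_two = load_result(1, 0x10, stores, 0b11)  # moved past both stores
```

With a mask of zero the same function describes a load that has not been moved, which mirrors the point that both kinds of instruction can occupy the same place in the program.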