The present invention relates generally to memory traffic optimization, and more particularly to a system and method for removing a partially redundant load arising across a sequence of blocks.
With the increase in the gap between memory access time and CPU performance comes an increased interest in optimizations that reduce memory traffic. The removal of redundant load instructions is an example of a memory traffic optimization. A load to a register is redundant if it is preceded by a store that writes to the same address and the stored value has not been overwritten by another store prior to the load. As shown in FIG. 1A, a load instruction 20 loads a value into a register gr5 from a memory location whose address is computed as the sum of the value held in a base register gr2 and a displacement value, i.e., 16(gr2). The load 20 is redundant because it is preceded by a store 10 that writes the value in register gr1 to the same address, i.e., 16(gr2), and the stored value is not overwritten by another store prior to the load 20.
The redundant load can be removed by replacing it with an appropriate register copy instruction. As shown in FIG. 1B, the load 20 is replaced by a register copy 30, which copies the value of register gr1 into register gr5. This replacement results in a vast improvement in memory cycle time. Whereas a load instruction may take as many twenty to thirty memory cycles to complete, a register copy instruction may take only one or two.
Often a load instruction is redundant in some but not all instances of the load. There exists a set of program paths along which the load is redundant and a set of paths along which the load is not redundant. Such a load is called a xe2x80x9cpartially redundantxe2x80x9d load. FIG. 2A shows an example of a load that is redundant only along one path. As shown in FIG. 2A, a joint point 25 funnels both the store instruction 10 and an instruction 40 to the load instruction 20. Although the store instruction 10 and the load instruction 20 are redundant, the instruction 40 and the load instruction 20 may not be. As a result, the load instruction 20 cannot necessarily be replaced with a copy instruction 30.
Previous approaches for removing partially redundant loads require that the set of paths along which the load is redundant can be statically determined. A number of compile-time strategies have been developed to remove partially redundant instructions by sometime expensive code transformations. The idea is to isolate the execution paths along which the load is redundant via code duplication and/or code motion. Enough copies of the load are generated such that along each path the load is either fully redundant (in which case it can be eliminated) or not redundant at all.
Previous techniques to remove partially redundant loads are based on two important assumptions that limit their effectiveness. First, the techniques are compile-time approaches that operate on some intermediate form. Second, all previous techniques require that the set of instances in which a partially redundant load is truly redundant can be statically determined.
The first assumption implies that all redundant loads that are created during register allocation and code generation (that is, after compile-time optimizations have been applied) will be missed. While it is conceptually possible to apply the compile-time techniques to object code, the code transformations that are required are too substantial to be implemented after code generation since they might end up requiring a complete re-generation of the code.
The second assumption is more severe. Even if only a single path is considered, it may not be possible to determine whether a load is redundant or not, due to the inability to disambiguate memory references statically. Consider the example in FIG. 2B. Even after isolating the path of instructions 10, 50 and 20, it cannot be determined whether the load 20 is redundant due to the intervening store 50. While path transformations are useful to isolate single paths, they do not help in exposing redundancies of the form of FIG. 2B. The cause of the problem is that these redundancies involve an intervening memory reference that cannot be statically disambiguated.
Briefly, a method for optimizing instructions in a program consistent with the present invention identifies a series of instructions in which a first instruction either stores a value in a first register into a first memory location or loads a value from the first memory location into the first register, a second instruction subsequent to the first instruction stores a value in a second register into a second memory location, and a third instruction subsequent to the second instruction loads a value from the first memory location into a third register. A fourth instruction is inserted between the second and third instruction which copies the value in the first register into the third register. It is determined if the first memory location and the second memory location are the same, and the third instruction is conditionally executed depending on a result of the determination.
In another aspect of the invention, the third instruction is executed if the first and second memory locations are the same and is nullified if the first and second memory locations are different. In yet another aspect of the invention, the insertion of the fourth instruction is done during compile-time or during run-time. The determination may be made during run-time.
In another aspect of the invention, a fifth instruction is inserted between the fourth instruction and the third instruction which compares the first memory location and the second memory location, wherein the fifth instruction nullifies the third instruction if the first memory location and the second memory location are different.
In yet another aspect of the invention, a method for optimizing instructions in a program identifies a series of instructions in which a first instruction either stores a value in a first register into a first memory location or loads a value from the first memory location into the first register, a second instruction subsequent to the first instruction stores a value in a second register into a second memory location, and a third instruction subsequent to the second instruction loads a value from the first memory location into a third register. The third instruction is replaced with a fourth instruction copying the value in the first register into the third register if the first memory location and the second memory location are different or if the third instruction is a redundant load.