1. Field
The present disclosed embodiments relate generally to computing, and more specifically to performing advanced prefetch operations in processors.
2. Background
Computer programs are lists of instructions that, when executed, cause a computer to behave in a predetermined manner. In general, a program may contain a list of variables and a list of statements that tell the computer what to do with the variables. A programmer may write a computer program in a “high-level” programming language, which is easily understood by humans. This form of the program is called the “source code.” To execute the program on a computer, however, the source code must be converted into machine language, which is the “low-level” language that is usable by the computer.
The first step of this translation process is usually performed by a utility called a compiler, which interprets the source code into a form closer to machine language. A compiler can have additional functions besides this interpretation function. For example, a compiler can look at the source code and re-order some of the instructions in it, as well as perform other optimizations. The compiler converts the source code into a form called “object code.” Sometimes the object code is the same as machine language; sometimes it needs to be further processed before it is ready to be executed by the computer.
One optimization compilers may perform is re-ordering instructions within a computer program so that it operates more efficiently than a simple conversion of the programmer's version of the source code would have yielded.
For example, a program may operate on a variable. Commonly, variables are located in memory and must be accessed before they are available for use. In a processor, such an access of memory takes a finite amount of time. If the variable has not been obtained from memory when the program is ready to use it, a delay may be encountered while the variable is transferred from memory.
Two common types of computer instructions are load instructions (“Loads”) and store instructions (“Stores”). Loads may access memory to fetch data that is needed by the program. Stores are often considered secondary because they merely store final data to memory, such as a final computation result that is not subsequently needed by the program. Therefore, program efficiency may be improved by advancing Loads ahead of Stores.
Unfortunately, this technique causes a significant problem called “Load/Store aliasing.” A Load/Store alias occurs when a Store writes data to the same memory address that a Load reads from. FIG. 1 illustrates an example of this situation. A processor register 100 may contain a series of instructions from a computer program being executed. The programmer may have included a Load 102A just before a “Use” instruction (“Use”) 104 in the source code. The Use 104 may be a calculation utilizing data that was retrieved by the Load 102A. As explained above, a compiler may improve overall program efficiency at run time by hoisting the Load 102A higher above the Use 104 than the programmer had originally placed it in the source code, as indicated by arrow 106. One reason is that the process of accessing a computer's memory is sometimes slow, and if the Load 102A and the Use 104 are too close together, then when the computer encounters the Use 104 it may have to wait for the Load 102A to retrieve the data needed to perform the Use 104. If a compiler can place the Load 102A earlier, such as at position 102B, then the computer will be more likely to already have the retrieved data by the time it encounters the Use 104. Thus, by hoisting Loads above Use instructions, a compiler can reduce waiting time and increase program efficiency.
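The benefit of hoisting described above can be sketched with a minimal latency model. Everything here is illustrative rather than part of the disclosure: the `LOAD_LATENCY` value, the one-instruction-per-cycle assumption, and the `stall_cycles` helper are all hypothetical, chosen only to show why moving a Load earlier reduces the wait at the Use.

```python
# Hypothetical latency model: one instruction issues per cycle, and a
# LOAD takes LOAD_LATENCY cycles before its data is ready for a USE.
LOAD_LATENCY = 3  # assumed memory-access latency (illustrative value)

def stall_cycles(schedule):
    """Cycles the USE must wait for its LOAD's data to arrive."""
    load_pos = schedule.index("LOAD")
    use_pos = schedule.index("USE")
    gap = use_pos - load_pos          # issue slots separating LOAD and USE
    return max(0, LOAD_LATENCY - gap)

original = ["OP1", "OP2", "LOAD", "USE"]  # Load placed just before the Use
hoisted  = ["LOAD", "OP1", "OP2", "USE"]  # compiler hoists the Load earlier

print(stall_cycles(original))  # 2 cycles of stall
print(stall_cycles(hoisted))   # 0 cycles of stall
```

Under this toy model, the hoisted schedule hides the memory latency behind the two unrelated operations, so the Use no longer waits.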
However, if the Load 102A is hoisted too far above the Use 104, it may be hoisted above an intervening Store 110, as indicated by arrow 108. If the intervening Store 110 happens to write new data to the same memory address accessed by the Load 102, Load/Store aliasing occurs. In operation, the Load 102C will read data (such as the value “0”) from a specified memory address, then the intervening Store 110 will save new data (such as the value “1”) to that same memory address. When the Use 104 is encountered, it will receive the “0” instead of the “1,” because “0” was the value read by the Load 102C. However, the programmer may have intended the Use 104 to receive the value “1,” which is why the programmer would have placed the intervening Store 110 (which stores the value “1”) before the Load 102A and the Use 104 when writing the source code. By moving the Load 102A any higher than the intervening Store 110, then, the compiler can cause the Use 104 to receive incorrect data. Therefore, although it may generally be beneficial to hoist Loads above Stores, most compilers are limited by intervening Stores. This presents significant performance problems in high-performance microprocessors and parallelizing compilers.
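The “0” versus “1” hazard just described can be reproduced in a few lines. This is a sketch only: a Python dict stands in for memory, address “A” is an illustrative name, and the two straight-line sequences stand in for the program order the programmer wrote versus the order after the compiler hoists the Load above the intervening Store.

```python
# Program order as written: Store, then Load, then Use.
memory = {"A": 0}                 # address A initially holds 0
memory["A"] = 1                   # intervening Store writes 1
loaded = memory["A"]              # Load reads 1
use_correct = loaded              # Use receives 1, as the programmer intended

# Compiler-reordered version: the Load has been hoisted above the Store.
memory = {"A": 0}
hoisted_load = memory["A"]        # hoisted Load reads the stale value 0
memory["A"] = 1                   # intervening Store writes 1, too late
use_aliased = hoisted_load        # Use receives 0 instead of 1

print(use_correct, use_aliased)   # 1 0
```

The second sequence computes a different (wrong) result from the same program, which is exactly why a compiler cannot blindly hoist a Load past a Store to the same address.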
One method for dealing with this issue is called “data speculation.” Data speculation employs specialized instructions called the “Advanced Load” (“LD.A”) and the “Check Advanced Load” (“CHK.A”). LD.A is a Load that, when retrieving data from a memory address, inserts that memory address into a table called the “Advanced Load Address Table” (“ALAT”). The loaded data is then used speculatively by other program instructions. Meanwhile, all Stores, when storing data to a memory address, compare that address against the addresses registered in the ALAT. Any matching entries (aliases) are evicted from the ALAT. When a subsequent CHK.A detects that a value has been evicted from the ALAT, it may generate an exception.
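The LD.A/Store/CHK.A interplay described above can be modeled in software. The class below is an illustrative simulation, not the hardware interface: the method names `ld_a`, `store`, and `chk_a`, and the use of a set for the table, are all assumptions made to show the bookkeeping, with the ALAT tracking advanced-load addresses and Stores evicting any aliased entry.

```python
class ALAT:
    """Toy model of the Advanced Load Address Table (illustrative only)."""

    def __init__(self):
        self.entries = set()          # addresses of outstanding advanced loads

    def ld_a(self, memory, addr):
        """Advanced Load: read memory and register the address in the ALAT."""
        self.entries.add(addr)
        return memory[addr]

    def store(self, memory, addr, value):
        """Store: write memory and evict any matching (aliased) ALAT entry."""
        memory[addr] = value
        self.entries.discard(addr)    # eviction only if the address matches

    def chk_a(self, addr):
        """Check Advanced Load: True if the entry survived (no alias)."""
        return addr in self.entries

memory = {"A": 0, "B": 7}
alat = ALAT()

val = alat.ld_a(memory, "A")   # speculative load of address A (reads 0)
alat.store(memory, "B", 5)     # Store to a different address: no eviction
alat.store(memory, "A", 1)     # Store to A aliases the LD.A: entry evicted

print(alat.chk_a("A"))         # False: CHK.A would trigger recovery
```

In the aliased case shown, `chk_a` reports the eviction, which is the point at which the real mechanism would raise the exception and branch to recovery code.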
An exception is a condition that causes a program or microprocessor to branch to a different routine, and usually indicates an error condition. In this case, the exception generated when an ALAT eviction is detected triggers recovery from the speculative use of the data previously retrieved by the LD.A. That data turned out to be incorrect (because of the aliasing), so its use must be rectified in the exception-triggered recovery process. Such recovery requires a significant amount of work and processing time, and considerably hampers performance. Thus, generation of exceptions is not desired, and excessive numbers of exceptions may significantly counteract any gains that were achieved when the compiler reordered the instructions in the first place.