The present invention relates generally to microprocessors, and more specifically to microprocessors capable of reusing regions of software code.
Modern software programs include many instructions that are executed multiple times each time the program is executed. Typically, large programs have logical "regions" of instructions, each of which may be executed many times. When a region is one that is executed more than once, and the results produced by the region are the same for more than one execution, the region is a candidate for "reuse." The term "reuse" refers to the reusing of results from a previous execution of the region.
For example, a reuse region could be a region of software instructions that, when executed, read a first set of registers and modify a second set of registers. The data values in the first set of registers are the "inputs" to the reuse region, and the data values deposited into the second set of registers are the "results" of the reuse region. A buffer holding inputs and results can be maintained for the region. Each entry in the buffer is termed an "instance." When the region is encountered during execution of the program, the buffer is consulted and if an instance with matching input values is found, the results can be used without having to execute the software instructions in the reuse region. When reusing the results is faster than executing the software instructions in the region, performance improves. Such a buffer is described in: Daniel Connors and Wen-mei Hwu, "Compiler-Directed Dynamic Computation Reuse: Rationale and Initial Results," Proceedings of the 32nd Annual International Symposium on Microarchitecture (MICRO), November 1999.
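The buffer lookup described above can be modeled in software as a minimal sketch. The class name `ReuseBuffer`, the `run` method, and the example region are hypothetical names chosen for illustration; they are not taken from the cited paper or from any hardware design.

```python
from typing import Callable, Dict, Tuple

Values = Tuple[int, ...]


class ReuseBuffer:
    """Holds instances (inputs -> results) for one reuse region."""

    def __init__(self) -> None:
        self.instances: Dict[Values, Values] = {}
        self.executions = 0  # how many times the region body actually ran

    def run(self, inputs: Values, region: Callable[[Values], Values]) -> Values:
        # Consult the buffer first: a matching instance lets us skip the region.
        if inputs in self.instances:
            return self.instances[inputs]
        # No match: execute the region and record a new instance.
        self.executions += 1
        results = region(inputs)
        self.instances[inputs] = results
        return results


def example_region(inputs: Values) -> Values:
    """Hypothetical region: reads two registers, writes their sum and product."""
    a, b = inputs
    return (a + b, a * b)


buf = ReuseBuffer()
first = buf.run((3, 4), example_region)   # executes the region
second = buf.run((3, 4), example_region)  # reuses the recorded instance
third = buf.run((2, 5), example_region)   # different inputs, executes again
```

In this model the region body runs only twice for three encounters, which is the saving that reuse aims for when a lookup is cheaper than re-execution.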
The example of the previous paragraph works well when the results are a function of nothing but the input values. When the results are a function of more than the input values, reuse is more complicated. For example, if a memory load instruction occurs in the reuse region, the results can be a function of the input values as previously described, and can also be a function of the data value loaded from the memory. If the memory load instruction accesses a memory location that is changed by a memory update instruction outside the region, then the region is said to be "aliased."
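A small sketch may help show why an aliased region is harder to reuse. The address `0x100`, the dictionary-based memory, and the function names are assumptions made for illustration only.

```python
# Hypothetical memory; address 0x100 is read by a load inside the region.
memory = {0x100: 5}
instances = {}  # input register value -> recorded result


def region(r1):
    # The result depends on the input register AND on a loaded memory value.
    return r1 + memory[0x100]


def reuse_or_execute(r1):
    # Reuse keyed on the register input alone, ignoring memory state.
    if r1 in instances:
        return instances[r1]
    instances[r1] = region(r1)
    return instances[r1]


first = reuse_or_execute(1)   # executes the region and records the instance
memory[0x100] = 7             # memory update instruction outside the region
stale = reuse_or_execute(1)   # matching inputs, so the old result is reused
correct = region(1)           # what re-execution would have produced
```

Here `stale` and `correct` differ even though the register inputs match, because the store outside the region changed the value that the load reads.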
Aliased regions present a problem for reuse. Even when a matching instance exists in the reuse buffer, the reuse instance may not be usable because the aliased memory load may read a different value that causes the correct results to differ from the results in the instance. Connors and Hwu present an "invalidate" instruction that invalidates the reuse buffer instances for a region such that they cannot be reused. The invalidate instruction is placed after memory update instructions capable of writing to the same location that the aliased load instruction accesses, but it can be difficult to find all of the memory update instructions that may update the aliased address. Even if all of the appropriate instructions are found, this approach is conservative in part because the memory update instruction may update an address other than the aliased address, but the invalidate instruction will invalidate the region nonetheless.
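The conservatism of the invalidate approach can be illustrated with a rough model. This is not the Connors and Hwu implementation; `invalidate`, `store_then_invalidate`, and the addresses are hypothetical stand-ins for a store that the compiler could not prove distinct from the aliased address.

```python
ALIASED_ADDR = 0x100  # address read by the load inside the reuse region

memory = {0x100: 5, 0x200: 0}
instances = {}  # input register value -> recorded result for the region


def region(r1):
    return r1 + memory[ALIASED_ADDR]


def reuse_or_execute(r1):
    if r1 not in instances:
        instances[r1] = region(r1)
    return instances[r1]


def invalidate():
    """Model of an invalidate instruction: drop every instance of the region."""
    instances.clear()


def store_then_invalidate(addr, value):
    """A store that might alias, followed by the inserted invalidate."""
    memory[addr] = value
    invalidate()  # runs regardless of the address actually written


reuse_or_execute(1)                 # records an instance with result 6
store_then_invalidate(0x200, 9)     # writes a different address
remaining = len(instances)          # the still-valid instance is gone anyway
result_after = reuse_or_execute(1)  # region must be re-executed
```

The store never touched `ALIASED_ADDR`, so the discarded instance was still correct; the region is re-executed only to reproduce the same result.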
For the reasons stated above, and for other reasons stated below which will become apparent to those skilled in the art upon reading and understanding the present specification, there is a need in the art for an alternate method and apparatus for code reuse.
In one embodiment, a processing apparatus includes a first processor core configured to speculatively execute instructions based on results from an instance of a reuse region, and a second processor core configured to verify the results from the instance of the reuse region. The processing apparatus can also include a thread queue coupled between the first processor core and the second processor core, where the thread queue is configured to communicate a thread structure describing the reuse region from the first processor core to the second processor core.
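One way to picture this embodiment is the sequential sketch below. It is a simplified software model, not the claimed hardware: the fields of `ThreadStructure`, the function names, and the use of a `deque` as the thread queue are all assumptions, and what happens after a failed verification is outside what this passage states.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict

memory: Dict[int, int] = {0x100: 5}


def region(r1: int) -> int:
    """Hypothetical aliased reuse region: input register plus a loaded value."""
    return r1 + memory[0x100]


@dataclass
class ThreadStructure:
    """Assumed contents: enough to re-run the region and check the prediction."""
    region: Callable[[int], int]
    inputs: int
    predicted: int


thread_queue: Deque[ThreadStructure] = deque()


def first_core_reuse(inputs: int, instance_result: int) -> int:
    # Hand the region description to the second core, then continue
    # speculatively with the instance's results.
    thread_queue.append(ThreadStructure(region, inputs, instance_result))
    return instance_result


def second_core_verify() -> bool:
    # Re-execute the region and compare against the speculated results.
    ts = thread_queue.popleft()
    return ts.region(ts.inputs) == ts.predicted


instance_result = region(1)                    # instance recorded earlier
spec_a = first_core_reuse(1, instance_result)  # speculate on the instance
ok_a = second_core_verify()                    # memory unchanged
memory[0x100] = 7                              # aliased location is updated
spec_b = first_core_reuse(1, instance_result)  # speculate again
ok_b = second_core_verify()                    # speculation was wrong
```

In real hardware the two cores would run concurrently; the model runs them in turn only to keep the example deterministic.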
In another embodiment, a processing apparatus includes a reuse buffer configured to hold instances of reuse regions, and also includes a reuse invalidation buffer configured to have entries that point to at least one of the instances of reuse regions held in the reuse buffer.
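The relationship between the two buffers can be sketched as follows. The passage states only that invalidation entries point to instances in the reuse buffer; indexing those entries by memory address and invalidating on a store to that address are assumptions made here for illustration, as are all class and method names.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


class ReuseUnit:
    def __init__(self) -> None:
        # Reuse buffer: instance id -> (inputs, results).
        self.reuse_buffer: Dict[int, Tuple[int, int]] = {}
        # Reuse invalidation buffer (assumed layout): address -> instance ids.
        self.invalidation_buffer: Dict[int, Set[int]] = {}

    def record(self, instance_id: int, inputs: int, results: int,
               load_addrs: Iterable[int]) -> None:
        self.reuse_buffer[instance_id] = (inputs, results)
        for addr in load_addrs:
            # Each entry points back at the instances that depend on addr.
            self.invalidation_buffer.setdefault(addr, set()).add(instance_id)

    def store(self, addr: int) -> None:
        # Only instances pointed to by the entry for addr are invalidated.
        for instance_id in self.invalidation_buffer.pop(addr, set()):
            self.reuse_buffer.pop(instance_id, None)

    def lookup(self, inputs: int) -> Optional[int]:
        for stored_inputs, results in self.reuse_buffer.values():
            if stored_inputs == inputs:
                return results
        return None


unit = ReuseUnit()
unit.record(0, inputs=1, results=6, load_addrs=[0x100])
unit.record(1, inputs=2, results=12, load_addrs=[0x300])
unit.store(0x200)                 # unrelated address: nothing is invalidated
after_unrelated = unit.lookup(1)
unit.store(0x100)                 # invalidates only instance 0
after_aliased = unit.lookup(1)
untouched = unit.lookup(2)
```

Under these assumptions a store invalidates only the instances that actually depend on the written address, rather than every instance of a region.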
In another embodiment, a computer-implemented method for annotating a software program includes identifying a reuse region within the software program, determining whether the reuse region is aliased, and when the reuse region is aliased, adding a speculative reuse instruction to the reuse region.
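The annotation steps can be sketched on a toy program representation. Everything concrete here is hypothetical: the `Region` type, the `spec_reuse` mnemonic, the alias test based on known store addresses, and the choice to place the added instruction at the start of the region. Region identification is assumed to have been done already.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Region:
    name: str
    instructions: List[str]
    load_addrs: Set[int] = field(default_factory=set)  # addresses the region loads


def is_aliased(region: Region, outside_store_addrs: Set[int]) -> bool:
    # Assumed test: a load in the region reads an address that some store
    # outside the region may write.
    return bool(region.load_addrs & outside_store_addrs)


def annotate(regions: List[Region], outside_store_addrs: Set[int]) -> List[Region]:
    for region in regions:
        if is_aliased(region, outside_store_addrs):
            # Add a (hypothetical) speculative reuse instruction to the region.
            region.instructions.insert(0, "spec_reuse")
    return regions


pure = Region("pure", ["add r3, r1, r2"])
aliased = Region("aliased", ["load r2, [0x100]", "add r3, r1, r2"], {0x100})
annotate([pure, aliased], outside_store_addrs={0x100, 0x200})
```

After annotation only the aliased region carries the added instruction; the region with no aliased load is left unchanged in this sketch.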