1. Field of the Invention
The present invention relates to techniques for improving computer system performance. More specifically, the present invention relates to a method and an apparatus that provide a parallel join operation to support space and time dimensional execution of a computer program.
2. Related Art
As increasing semiconductor integration densities allow more transistors to be integrated onto a microprocessor chip, computer designers are investigating different methods of using these transistors to increase computer system performance. Some recent computer architectures exploit “instruction level parallelism,” in which a single central processing unit (CPU) issues multiple instructions in a single cycle. Given proper compiler support, instruction level parallelism has proven effective at increasing computational performance across a wide range of computational tasks. However, inter-instruction dependencies generally limit the performance gains realized from using instruction level parallelism to a factor of two or three.
Another method for increasing computational speed is “speculative execution,” in which a processor executes multiple branch paths simultaneously, or predicts a branch, so that the processor can continue executing without waiting for the result of the branch operation. By reducing dependencies on branch conditions, speculative execution can increase the total number of instructions issued.
Unfortunately, conventional speculative execution typically provides a limited performance improvement because only a small number of instructions can be speculatively executed. One reason for this limitation is that conventional speculative execution is typically performed at the basic block level, and basic blocks tend to include only a small number of instructions. Another reason is that conventional hardware structures used to perform speculative execution can only accommodate a small number of speculative instructions.
What is needed is a method and apparatus that facilitates speculative execution of program instructions at a higher level of granularity so that many more instructions can be speculatively executed.
One challenge in designing a system that supports speculative execution is to efficiently merge state created during speculative execution into the non-speculative state of the program. If this merging process takes too much time, it can nullify the performance gains derived from speculative execution.
What is needed is a method and an apparatus that efficiently merges state created during speculative execution into the non-speculative state of a program. For efficiency reasons, it is desirable to perform this merging in parallel.
One embodiment of the present invention provides a system that supports space and time dimensional program execution by performing a parallel join operation to merge state created during speculative execution into the non-speculative state of a program. The system executes a program using a head thread that operates on primary versions of memory elements and accesses a primary version of a stack. The system also executes the program using a speculative thread that speculatively executes program instructions in advance of the head thread while the head thread is executing. This speculative thread operates on space-time dimensioned versions of the memory elements and accesses a speculative version of the stack. The system performs a join operation between the head thread and the speculative thread when the head thread reaches a point in the program where the speculative thread began executing. This join operation involves using both the head thread and the speculative thread to perform a number of operations in parallel. These operations include merging the space-time dimensioned versions of the memory elements into the primary versions of the memory elements so that updates to the space-time dimensioned versions of the memory elements are incorporated into corresponding primary versions of memory elements. These operations also include merging the speculative version of the stack into the primary version of the stack.
In one embodiment of the present invention, during the join operation the head thread merges the speculative version of the stack into the primary version of the stack while the speculative thread merges the space-time dimensioned versions of the memory elements into the primary versions of the memory elements.
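The division of labor in this embodiment may be sketched, under simplifying assumptions, with two threads that run concurrently. The function parallel_join and its parameters are hypothetical. The sketch assumes that the stack is a list of frames, that memory is a dictionary keyed by element name, that spec_mem holds only the elements updated speculatively, and that merging the stack means the primary stack adopts the frames of the speculative stack.

```python
import threading


def parallel_join(primary_stack, spec_stack, primary_mem, spec_mem):
    def head_merges_stack():
        # Head thread's share of the join.
        # Assumption: the primary stack adopts the speculative frames.
        primary_stack[:] = spec_stack

    def speculative_merges_memory():
        # Speculative thread's share of the join.
        # Assumption: spec_mem holds only speculatively updated elements.
        primary_mem.update(spec_mem)
        spec_mem.clear()

    head = threading.Thread(target=head_merges_stack)
    spec = threading.Thread(target=speculative_merges_memory)
    head.start()
    spec.start()
    # The join completes only when both merges have finished.
    head.join()
    spec.join()
```

The two merges touch disjoint data in this sketch (the stack and the memory elements), so neither thread needs a lock against the other.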
In one embodiment of the present invention, if the head thread finishes merging the speculative version of the stack before the speculative thread finishes merging the space-time dimensioned versions of the memory elements, the head thread helps the speculative thread in merging the space-time dimensioned versions of the memory elements into the primary versions of the memory elements.
In one embodiment of the present invention, if the speculative thread finishes merging the space-time dimensioned versions of the memory elements before the head thread finishes merging the speculative version of the stack, the speculative thread helps the head thread in merging the speculative version of the stack into the primary version of the stack.
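One possible way to let whichever thread finishes first help the other is to divide each merge into independent work items placed on a queue. The following sketch is illustrative only. The names drain and join_with_helping are hypothetical, and the sketch assumes that each work item (copying one stack frame or merging one memory element) can be performed independently and in any order.

```python
import queue
import threading


def drain(work_queue):
    # Execute work items until the queue is empty.
    while True:
        try:
            task = work_queue.get_nowait()
        except queue.Empty:
            return
        task()


def join_with_helping(stack_tasks, memory_tasks):
    stack_q, memory_q = queue.Queue(), queue.Queue()
    for task in stack_tasks:
        stack_q.put(task)
    for task in memory_tasks:
        memory_q.put(task)

    def head_thread():
        drain(stack_q)    # own work: merge the stack
        drain(memory_q)   # then help merge the memory elements

    def speculative_thread():
        drain(memory_q)   # own work: merge the memory elements
        drain(stack_q)    # then help merge the stack

    threads = [threading.Thread(target=head_thread),
               threading.Thread(target=speculative_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because queue.Queue hands each item to exactly one caller, no work item is executed twice even when both threads are draining the same queue.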
In one embodiment of the present invention, the head thread continues executing the program as a pseudo-head thread that operates on the space-time dimensioned versions of the memory elements using the speculative version of the stack. At the same time, the speculative thread merges the space-time dimensioned versions of the memory elements into the primary versions of the memory elements.
In one embodiment of the present invention, merging the speculative version of the stack into the primary version of the stack includes inserting a stub at the bottom of the speculative version of the stack. If the pseudo-head thread encounters the stub (upon return from the method whose frame was previously copied), the pseudo-head thread copies an additional frame from the primary version of the stack to the speculative version of the stack in place of the stub, and moves the stub below the additional frame.
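The stub mechanism may be sketched as follows. The class LazyStack, the sentinel STUB, and the method return_from_method are hypothetical names. The sketch assumes that each stack is a list with the bottom frame first, that primary_frames holds the frames lying below the frame already copied, and that the stub is simply removed once no primary frames remain (a case the description above does not address).

```python
# Hypothetical sentinel standing in for the stub frame.
STUB = object()


class LazyStack:
    def __init__(self, primary_frames, copied_frames):
        # Frames that so far exist only on the primary stack, bottom first.
        self.primary = list(primary_frames)
        self.next_index = len(self.primary) - 1
        # Speculative stack: the stub sits at the bottom, below the copied frames.
        self.spec = [STUB] + list(copied_frames)

    def return_from_method(self):
        """Pop the current frame and return the caller's frame, or None."""
        self.spec.pop()
        if self.spec and self.spec[-1] is STUB:
            self._on_stub()
        return self.spec[-1] if self.spec else None

    def _on_stub(self):
        self.spec.pop()  # take the stub out of its current position
        if self.next_index >= 0:
            frame = self.primary[self.next_index]
            self.next_index -= 1
            self.spec.append(STUB)   # the stub moves below the additional frame
            self.spec.append(frame)  # the additional frame takes the stub's place
        # Assumption: with no primary frames left, the stub is not reinserted.
```

With this arrangement a frame is copied only when the pseudo-head thread actually returns into it, so frames that are never reached are never copied.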