1. Field of the Invention
The present invention relates to operating systems for computers. More specifically, the present invention relates to a method and an apparatus for handling monitor entry and exit operations in order to restrict accesses to critical sections for a speculative thread that speculatively executes program instructions in advance of a head thread during space and time dimensional program execution.
2. Related Art
As increasing semiconductor integration densities allow more transistors to be integrated onto a microprocessor chip, computer designers are investigating different methods of using these transistors to increase computer system performance. Some recent computer architectures exploit "instruction level parallelism," in which a single central processing unit (CPU) issues multiple instructions in a single cycle. Given proper compiler support, instruction level parallelism has proven effective at increasing computational performance across a wide range of computational tasks. However, inter-instruction dependencies generally limit the performance gains realized from using instruction level parallelism to a factor of two or three.
Another method for increasing computational speed is "speculative execution," in which a processor executes multiple branch paths simultaneously, or predicts a branch, so that the processor can continue executing without waiting for the result of the branch operation. By reducing dependencies on branch conditions, speculative execution can increase the total number of instructions issued.
Unfortunately, conventional speculative execution typically provides a limited performance improvement because only a small number of instructions can be speculatively executed. One reason for this limitation is that conventional speculative execution is typically performed at the basic block level, and basic blocks tend to include only a small number of instructions. Another reason is that conventional hardware structures used to perform speculative execution can only accommodate a small number of speculative instructions.
What is needed is a method and apparatus that facilitates speculative execution of program instructions at a higher level of granularity so that many more instructions can be speculatively executed.
One challenge in designing a system that supports speculative execution is to provide an efficient mechanism to restrict access to critical sections. In non-speculative systems, this is typically accomplished by using a lock, such as a monitor or a semaphore. Unfortunately, the process of acquiring and releasing locks can seriously degrade computer system performance, because acquiring or releasing a lock may cause a cache miss, and may require load buffers and/or store buffers for the processor to be flushed. Consequently, acquiring or releasing a lock may take hundreds of processor clock cycles. Also, if a single thread is executing multithreaded library routines that manipulate locks, the effort spent acquiring and releasing locks is wasted, because the locks are not required for single-threaded execution.
Hence, what is needed is a method and an apparatus for handling accesses to critical sections for a speculative thread without incurring unnecessary overhead in acquiring and releasing associated locks.
One embodiment of the present invention provides a system that facilitates entering and exiting a critical section of code for a speculative thread. The system supports a head thread that executes program instructions, and the speculative thread that speculatively executes program instructions in advance of the head thread. During an entry into the critical section by the speculative thread, the system increments a variable containing a number of virtual locks held by the speculative thread. Note that a virtual lock held by the speculative thread is associated with the critical section and is used to keep track of the fact that the speculative thread has entered the critical section. Also note that this virtual lock does not prevent the speculative thread or other threads from entering the critical section. During an exit from the critical section by the speculative thread, the system decrements the variable containing the number of virtual locks held by the speculative thread. The speculative thread eventually receives a request to perform a join operation with the head thread to merge state associated with the speculative thread into state associated with the head thread. Upon receiving this request, the speculative thread waits to perform the join operation until the variable containing the number of virtual locks held by the speculative thread equals zero.
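The counting scheme described above can be illustrated with a minimal sketch. The class name `SpeculativeThreadState` and its method names are hypothetical and chosen only for illustration; the sketch models the bookkeeping alone and does not model real threads or the join itself.

```python
class SpeculativeThreadState:
    """Bookkeeping for one speculative thread (hypothetical name)."""

    def __init__(self):
        # Number of virtual locks currently held by the speculative thread.
        self.virtual_locks_held = 0

    def monitor_enter(self):
        # Entry into a critical section is only recorded; no thread is blocked.
        self.virtual_locks_held += 1

    def monitor_exit(self):
        if self.virtual_locks_held == 0:
            raise RuntimeError("monitor exit without a matching entry")
        self.virtual_locks_held -= 1

    def can_join(self):
        # The join is deferred until the thread is outside all critical sections.
        return self.virtual_locks_held == 0


spec = SpeculativeThreadState()
spec.monitor_enter()                  # outer critical section
spec.monitor_enter()                  # nested critical section
join_ready_inside = spec.can_join()   # join request arrives inside the sections
spec.monitor_exit()
spec.monitor_exit()
join_ready_outside = spec.can_join()  # join request arrives after both exits
```

In this sketch a join request that arrives while the counter is nonzero is simply reported as not ready; a real system would presumably let the speculative thread continue until the counter reaches zero.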
In one embodiment of the present invention, the system additionally supports other head threads that execute program instructions for a parallel computational task in parallel with the head thread. In this embodiment, during the entry into the critical section by the speculative thread, the system additionally adds an entry for the virtual lock associated with the critical section into a list of virtual locks visited by the speculative thread. Upon receiving the request to perform the join operation, the system waits to perform the join operation until no virtual locks in the list of virtual locks are held by the other head threads. In a variation on this embodiment, the system waits to perform the join operation by using the head thread to acquire non-virtual locks associated with virtual locks in the list of virtual locks. In a further variation on this embodiment, the entry for the virtual lock in the list of virtual locks contains an identifier for a corresponding non-virtual lock for the critical section, wherein the non-virtual lock is used to ensure mutual exclusion amongst the head thread and the other head threads working on the parallel computational task.
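One possible reading of the visited-lock list is sketched below. The names `VisitedLocks` and `try_join` are hypothetical. As a simplifying assumption, the sketch stores the non-virtual lock object itself as the list entry, and it uses a non-blocking acquisition attempt that the caller would retry, rather than a blocking wait.

```python
import threading


class VisitedLocks:
    """Virtual-lock bookkeeping for a speculative thread (hypothetical name)."""

    def __init__(self):
        self.held_count = 0   # virtual locks currently held
        self.visited = []     # non-virtual locks for critical sections visited

    def monitor_enter(self, real_lock):
        self.held_count += 1
        if real_lock not in self.visited:
            self.visited.append(real_lock)

    def monitor_exit(self):
        self.held_count -= 1


def try_join(tracker):
    """Return True if the join may proceed now, False if it must be retried."""
    if tracker.held_count != 0:
        return False
    acquired = []
    for lock in tracker.visited:
        # The head thread tries to take each non-virtual lock; a failure means
        # another head thread is currently inside that critical section.
        if lock.acquire(blocking=False):
            acquired.append(lock)
        else:
            for held in acquired:
                held.release()
            return False
    # The merge of speculative state into head state would happen here.
    for held in acquired:
        held.release()
    tracker.visited.clear()
    return True


lock_a = threading.Lock()
tracker = VisitedLocks()
tracker.monitor_enter(lock_a)
tracker.monitor_exit()

lock_a.acquire()                  # simulate another head thread holding the lock
blocked = try_join(tracker)       # join cannot proceed yet
lock_a.release()
joined = try_join(tracker)        # lock is free, so the join may proceed
```

Releasing every lock already acquired before reporting failure keeps the sketch from holding a partial set of locks while it waits, which is a design choice of this illustration and not something the text above specifies.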
In one embodiment of the present invention, upon receiving a request from the head thread to perform a write operation to a memory element, the system performs the write operation to a primary version of the memory element. The system also checks status information associated with the memory element to determine if the memory element has been read by the speculative thread. If the memory element has been read by the speculative thread, the system causes the speculative thread to roll back so that the speculative thread can read a result of the write operation. If the memory element has not been read by the speculative thread, the system performs the write operation to a space-time dimensioned version of the memory element if the space-time dimensioned version exists.
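The head-thread write path described above can be sketched as follows. The class `MemoryElement`, its field names, and the `rollback` callback are hypothetical; in particular, the sketch assumes that `None` marks a missing space-time dimensioned version and that the status information is a single boolean flag.

```python
class MemoryElement:
    """A memory element with a primary version and an optional
    space-time dimensioned version (hypothetical representation)."""

    def __init__(self, value):
        self.primary = value
        self.std_version = None           # None means no such version exists
        self.read_by_speculative = False  # status information for the element


def head_write(elem, value, rollback):
    # The write always goes to the primary version first.
    elem.primary = value
    if elem.read_by_speculative:
        # The speculative thread used a stale value and must be rolled back.
        rollback()
    elif elem.std_version is not None:
        # Keep the space-time dimensioned version consistent with the write.
        elem.std_version = value


rollbacks = []

x = MemoryElement(1)
x.read_by_speculative = True              # speculative thread already read x
head_write(x, 2, lambda: rollbacks.append("x"))

y = MemoryElement(10)
y.std_version = 10                        # version exists, but y was not read
head_write(y, 11, lambda: rollbacks.append("y"))
```

Only the write to `x` triggers the rollback callback here, while the write to `y` is propagated to both versions.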
In one embodiment of the present invention, performing the join operation includes merging the space-time dimensioned version of the memory element into the primary version of the memory element and discarding the space-time dimensioned version of the memory element.
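A minimal sketch of this merge step follows. It assumes, for illustration only, that merging means overwriting the primary version with the value of the space-time dimensioned version, and that `None` marks an element with no such version; the names `MemoryElement` and `join` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryElement:
    primary: int
    std_version: Optional[int] = None   # space-time dimensioned version, if any


def join(elements):
    """Merge each space-time dimensioned version into its primary version,
    then discard it. Returns the number of elements that were merged."""
    merged = 0
    for elem in elements:
        if elem.std_version is not None:
            elem.primary = elem.std_version   # merge speculative state
            elem.std_version = None           # discard the speculative copy
            merged += 1
    return merged


a = MemoryElement(primary=1, std_version=5)   # modified speculatively
b = MemoryElement(primary=2)                  # untouched by the speculative thread
merged_count = join([a, b])
```

After the call, `a` carries the speculative value in its primary version and neither element retains a space-time dimensioned version.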
In one embodiment of the present invention, the memory element includes an object defined within an object-oriented programming system.
In one embodiment of the present invention, the head thread and the speculative thread perform the join operation in parallel.