1. Field of the Invention
The present invention relates to techniques for improving computer system performance. More specifically, the present invention relates to a method and apparatus that enforces dependencies between memory references passing through a load store unit.
2. Related Art
As increasing semiconductor integration densities allow more transistors to be integrated onto a microprocessor chip, computer designers are investigating different methods of using these transistors to increase computer system performance. Some recent computer architectures exploit "instruction level parallelism," in which a single central processing unit (CPU) issues multiple instructions in a single cycle. Given proper compiler support, instruction level parallelism has proven effective at increasing computational performance across a wide range of computational tasks. However, inter-instruction dependencies generally limit the performance gains realized from using instruction level parallelism to a factor of two or three.
Another method for increasing computational speed is "speculative execution," in which a processor executes multiple branch paths simultaneously, or predicts a branch, so that the processor can continue executing without waiting for the result of the branch operation. By reducing dependencies on branch conditions, speculative execution can increase the total number of instructions issued.
Unfortunately, conventional speculative execution typically provides a limited performance improvement because only a small number of instructions can be speculatively executed. One reason for this limitation is that conventional speculative execution is typically performed at the basic block level, and basic blocks tend to include only a small number of instructions. Another reason is that conventional hardware structures used to perform speculative execution can only accommodate a small number of speculative instructions.
What is needed is a method and apparatus that facilitates speculative execution of program instructions at a higher level of granularity so that many more instructions can be speculatively executed.
A significant performance drawback for high performance computer systems is the need to periodically perform "membar" operations in order to flush read and write requests from a load store unit (LSU) out to memory. A membar operation is typically performed to ensure that a particular read operation does not overtake a preceding write operation by flushing a write buffer in the LSU before the read operation takes place. A membar operation may also be performed to ensure that a particular write operation does not overtake a preceding read operation by flushing a read buffer in the LSU before the write operation takes place.
Note that using membar operations can adversely affect system performance because membar operations stall the processor while requests in the LSU are flushed. Much of this delay is unnecessary because it typically suffices to ensure that a particular read request does not overtake a particular write request. Hence, waiting until all requests are flushed out of the LSU is often unnecessary.
What is needed is a method and apparatus that enforces dependencies between memory references without incurring the delays inherent in membar operations.
One embodiment of the present invention provides a system that enforces dependencies between memory references within a load store unit (LSU) in a processor. When a write request is received in the load store unit, the write request is loaded into a store buffer in the LSU. The write request may include a "watch address" specifying that a subsequent load from the watch address cannot occur before the write request completes. Note that the watch address is not necessarily the same as the destination address of the write operation. When a read request is received in the load store unit, the read request is loaded into a load buffer. The system determines if the read request is directed to the same address as a matching watch address in the store buffer. If so, the system waits for the write request associated with the matching watch address to complete before completing the read request.
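The watch address check described above can be illustrated with a small software model. The following is a minimal sketch only, not the claimed hardware: the names `StoreEntry`, `LoadStoreUnit`, `write`, and `read` are hypothetical, memory is modeled as a dictionary, "waiting" for a write is modeled by completing it immediately, and pending writes are assumed to complete in order. Forwarding of data from the store buffer is deliberately omitted here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StoreEntry:
    dest: int                    # destination address of the write
    data: int                    # value to be written
    watch: Optional[int] = None  # optional watch address (may differ from dest)


class LoadStoreUnit:
    """Toy LSU model (hypothetical): a store buffer in front of a memory dict."""

    def __init__(self, memory: Dict[int, int]) -> None:
        self.memory = memory
        self.store_buffer: List[StoreEntry] = []  # pending writes, oldest first

    def write(self, dest: int, data: int, watch: Optional[int] = None) -> None:
        # A write request is loaded into the store buffer, not sent to memory yet.
        self.store_buffer.append(StoreEntry(dest, data, watch))

    def _complete_through(self, index: int) -> None:
        # Assumption: writes complete in order, so completing entry `index`
        # also completes every older entry. Younger entries stay pending.
        for entry in self.store_buffer[: index + 1]:
            self.memory[entry.dest] = entry.data
        del self.store_buffer[: index + 1]

    def read(self, addr: int) -> int:
        # Look for the youngest pending write whose watch address matches.
        for i in range(len(self.store_buffer) - 1, -1, -1):
            if self.store_buffer[i].watch == addr:
                # The read may not complete before that write completes.
                self._complete_through(i)
                break
        return self.memory.get(addr, 0)


memory = {0x100: 0, 0x200: 7}
lsu = LoadStoreUnit(memory)
lsu.write(0x100, 42, watch=0x200)  # watch address differs from destination
lsu.write(0x300, 9)                # unrelated write with no watch address
value = lsu.read(0x200)            # forces only the watched write to complete
```

In this model, the read from 0x200 causes the write to 0x100 to complete, while the unrelated write to 0x300 remains in the store buffer. This is the contrast with a membar operation, which would flush every pending request.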
In one embodiment of the present invention, if the read request is directed to the same address as a matching write request in the store buffer, the system completes the read request by returning a data value contained in the matching write request without going out to memory.
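As an illustration of returning data from a matching write request, the following minimal sketch models the store buffer as a list of (destination address, data) pairs. The function name `read_with_forwarding` is hypothetical, and the choice to scan youngest first so that the most recent pending write wins is an assumption not stated above.

```python
from typing import Dict, List, Tuple


def read_with_forwarding(
    store_buffer: List[Tuple[int, int]],  # (dest address, data), oldest first
    memory: Dict[int, int],
    addr: int,
) -> Tuple[int, bool]:
    """Return (value, forwarded) for a read of `addr`."""
    # Assumption: scan youngest-first so the most recent pending write wins.
    for dest, data in reversed(store_buffer):
        if dest == addr:
            return data, True  # served from the store buffer, no memory access
    return memory.get(addr, 0), False  # no pending write: go out to memory


pending = [(0x10, 1), (0x10, 2), (0x20, 3)]
memory = {0x10: 0, 0x30: 5}
hit = read_with_forwarding(pending, memory, 0x10)
miss = read_with_forwarding(pending, memory, 0x30)
```

Here the read of 0x10 is satisfied by the pending write holding the value 2, and the read of 0x30 falls through to memory.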
In one embodiment of the present invention, when the read request is directed to the same address as a matching watch address in the store buffer, the system stores an index with the read request in the load buffer. This index specifies a location of the associated write request in the store buffer.
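The stored index can be sketched as follows. This is an illustrative model under stated assumptions: the names `PendingStore`, `PendingLoad`, `enqueue_load`, and `can_issue` are hypothetical, the store buffer is assumed to use fixed slots so that an index remains valid while the write is pending, and a `done` flag stands in for completion of the write.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PendingStore:
    dest: int
    data: int
    watch: Optional[int]
    done: bool = False  # set when the write has completed


@dataclass
class PendingLoad:
    addr: int
    wait_index: Optional[int] = None  # store buffer slot this load waits on


def enqueue_load(
    load_buffer: List[PendingLoad],
    store_buffer: List[PendingStore],
    addr: int,
) -> PendingLoad:
    load = PendingLoad(addr)
    for i, store in enumerate(store_buffer):
        if not store.done and store.watch == addr:
            # Later matches overwrite earlier ones, so the youngest match is kept.
            load.wait_index = i
    load_buffer.append(load)
    return load


def can_issue(load: PendingLoad, store_buffer: List[PendingStore]) -> bool:
    # The index lets the load check one slot instead of rescanning the buffer.
    return load.wait_index is None or store_buffer[load.wait_index].done


stores = [PendingStore(0x100, 1, watch=0x200), PendingStore(0x300, 2, watch=None)]
loads: List[PendingLoad] = []
watched = enqueue_load(loads, stores, 0x200)
blocked_before = can_issue(watched, stores)
stores[0].done = True
ready_after = can_issue(watched, stores)
```

Recording the index when the read request is enqueued means the dependency can later be resolved by examining a single store buffer entry.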
In one embodiment of the present invention, the system provides an executable code write instruction that specifies the watch address.