The present invention generally relates to a method and apparatus for processing a load-lock instruction within a computer processor. More particularly, the invention relates to a system and method for processing a load-lock instruction within an out-of-order computer processor using a scoreboard mechanism.
Many processors, such as the Pentium® processor commercially available from Intel Corp., are “out-of-order” processors. An out-of-order processor speculatively executes instructions in any order as the requisite data and execution units become available. Some instructions in a computer system are dependent on other instructions through machine registers. Out-of-order processors attempt to exploit parallelism by actively looking for instructions whose input sources are available for computation, and scheduling them for execution even if other instructions that occur earlier in program flow (program order) have not been executed. This creates an opportunity for more efficient usage of machine resources and faster overall execution.
Load-lock instructions are used in multi-tasking/multi-processing systems to operate on semaphores. Semaphores are flag variables used to guard resources or data against simultaneous access by more than one agent in a multiprocessor system, because such simultaneous access can lead to indeterminate program behavior. To guarantee exclusive access to a semaphore, a load-lock instruction in conjunction with a store-unlock instruction must be executed in an atomic fashion. That is, once the load-lock instruction accesses the semaphore value, no other instruction can operate on the semaphore until the corresponding store-unlock instruction frees it. The load-lock/store-unlock instruction duo also introduces another requirement in x86 processors: all load instructions and all store instructions before the load-lock/store-unlock instruction duo in program order must be performed before the atomic operation. Likewise, all load instructions and store instructions following the load-lock/store-unlock instruction duo in program order must not be performed until after both the load-lock and store-unlock instructions are completely executed. This “fencing” semantic must not be violated in any x86 program execution.
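The atomicity requirement described above can be illustrated with a minimal software sketch. The class `LockedMemory`, its methods `load_lock` and `store_unlock`, and the helper `try_acquire` are hypothetical names introduced here for illustration only; they model the behavior in a toy form and do not represent any actual processor implementation.

```python
class LockedMemory:
    """Toy memory model (hypothetical): one agent at a time may lock an address."""

    def __init__(self):
        self.data = {}        # address -> stored value (missing means 0)
        self.lock_owner = {}  # address -> agent currently holding the lock

    def load_lock(self, agent, addr):
        """Read addr and lock it. Returns None if the address is already locked."""
        if addr in self.lock_owner:
            return None
        self.lock_owner[addr] = agent
        return self.data.get(addr, 0)

    def store_unlock(self, agent, addr, value):
        """Write addr and release the lock taken by the matching load_lock."""
        if self.lock_owner.get(addr) != agent:
            raise RuntimeError("store-unlock without a matching load-lock")
        self.data[addr] = value
        del self.lock_owner[addr]


def try_acquire(mem, agent, sem_addr):
    """Test-and-set on a semaphore: returns True if this agent obtained it."""
    old = mem.load_lock(agent, sem_addr)
    if old is None:
        return False                      # another agent is mid-operation
    mem.store_unlock(agent, sem_addr, 1)  # mark semaphore taken, free the lock
    return old == 0                       # acquired only if it was free before


mem = LockedMemory()
print(try_acquire(mem, "agent_A", 0x100))  # True: semaphore was free
print(try_acquire(mem, "agent_B", 0x100))  # False: agent_A already owns it
```

In this sketch, any agent that attempts a load-lock between another agent's load-lock and store-unlock is turned away, which is the property that keeps the read-modify-write of the semaphore indivisible.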
Speculative execution means that instructions can be fetched and executed before resolving pertinent control dependencies. Executing a “load-lock” instruction in a speculative, out-of-order manner means that the fencing semantics of the load-lock/store-unlock instruction duo can be violated if the instruction is not handled correctly. However, if the load-lock instruction can be executed speculatively, substantial performance improvements are possible, because the instruction can execute as soon as resources are available rather than only after all instructions before the load-lock instruction have been completed.
Conventional methods of handling load-lock instructions in an out-of-order machine guarantee the fencing semantics by executing the load-lock instruction only when the instruction has reached “at-retirement”. The “at-retirement” (or “at-retire”) condition is flagged when an instruction is the next to be retired in program order; that is, all prior instructions in program order have already been retired. Moreover, such conventional methods lump all lock instructions together, regardless of whether they are split across two cache lines (i.e., “split” or “non-split” lock operations) and regardless of whether or not they are to write back to a cacheable region. As a result, substantial extraneous time and resources are applied broadly to prepare for and to process any load-lock instruction. Such approaches create a large latency and tie up significant processing resources, because a load-lock instruction begins execution only once it becomes eligible for retirement.
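The latency cost of the at-retirement approach can be seen in a small scheduling sketch. The model below is an illustrative assumption, not a description of any actual processor: the names `Instr` and `schedule_at_retire` are hypothetical, every instruction is assumed to take one cycle to execute, and any number of instructions may retire in the same cycle.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    ready: int               # cycle at which operands and an execution unit are available
    load_lock: bool = False  # True for a load-lock instruction


def schedule_at_retire(program):
    """Return {name: (execute_cycle, retire_cycle)} under at-retirement gating.

    `program` is listed in program order. Ordinary instructions execute as
    soon as they are ready; a load-lock waits until all older instructions
    have retired. Retirement is in program order.
    """
    result = {}
    prev_retire = 0
    for ins in program:
        if ins.load_lock:
            execute = max(ins.ready, prev_retire)  # gated on older retirements
        else:
            execute = ins.ready                    # free to run out of order
        retire = max(execute + 1, prev_retire)     # assumed one-cycle execution
        result[ins.name] = (execute, retire)
        prev_retire = retire
    return result


program = [
    Instr("load_a", ready=10),                    # slow, e.g. waiting on memory
    Instr("add_b", ready=2),
    Instr("load_lock", ready=1, load_lock=True),
]
for name, (execute, retire) in schedule_at_retire(program).items():
    print(name, "executes at", execute, "retires at", retire)
```

Under these assumptions the load-lock has its resources available at cycle 1 but cannot begin until cycle 11, when the older instructions have retired, while the ordinary `add_b` instruction executes at cycle 2 without waiting.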