Microelectronic manufacturers are continually striving to improve the speed and performance of microprocessors and other processing devices, the performance of such devices being dependent upon many factors. One factor affecting the performance of a processing device is the scheduling and execution of instructions associated with a piece of code executing on that processor. To increase the speed at which a set of instructions can be executed—and, hence, to improve efficiency and performance—multi-threaded processors and multi-processor systems have been devised. Performance may also be enhanced using speculative and/or out-of-order execution of instructions. In out-of-order processing, a piece of code is not necessarily executed in the same sequence as its underlying source code and, in speculative processing, instructions are prefetched and branch prediction is performed to “guess” whether a branch condition will, or will not, be taken.
Typically, a processor includes an instruction decoder that decodes an instruction to create one or more micro-instructions, or micro-operations, that can be understood and executed by the processor. A micro-operation will also be referred to herein as a “μOP.” A series of μOPs associated with a piece of code may be scheduled for execution on a processor (or on a specific thread thereof), this scheduling potentially being speculative or out-of-order, as noted above. If a μOP properly executes, that μOP is retired. However, if a μOP does not, for any reason, properly execute, the μOP is again scheduled and replayed for execution. Although the set of μOPs associated with the piece of code may be executed out of order, the μOPs must generally be retired in order.
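By way of illustration only, the relationship between out-of-order execution, replay, and in-order retirement described above may be modeled in software as follows. This is a minimal sketch and not a description of any actual processor: the function `run_uops`, its parameters `fails_once` and `exec_order`, and the μOP names used in the example are all hypothetical and chosen solely for illustration.

```python
from collections import deque


def run_uops(uops, fails_once=(), exec_order=None):
    """Toy model: execute uOPs out of order, replay failures, retire in order.

    uops       -- uOP names in program order (hypothetical names).
    fails_once -- uOPs assumed to fail on their first attempt (illustrative).
    exec_order -- dispatch order chosen by the scheduler; defaults to the
                  reverse of program order so the reordering is visible.
    Returns (executed_log, retired).
    """
    pending = deque(exec_order if exec_order is not None else reversed(uops))
    must_fail = set(fails_once)
    completed = set()
    executed_log = []   # every execution attempt, in the order it happened
    retired = []        # uOPs in the order they were retired
    next_to_retire = 0  # index (in program order) of the oldest unretired uOP

    while pending:
        uop = pending.popleft()
        executed_log.append(uop)
        if uop in must_fail:
            # The uOP did not execute properly: schedule it again (replay).
            must_fail.discard(uop)
            pending.append(uop)
            continue
        completed.add(uop)
        # Retirement is strictly in program order: a completed uOP waits
        # until every older uOP has also completed.
        while next_to_retire < len(uops) and uops[next_to_retire] in completed:
            retired.append(uops[next_to_retire])
            next_to_retire += 1
    return executed_log, retired
```

For example, calling `run_uops(["load", "add", "store"], fails_once={"add"})` executes the μOPs in the order store, add, load, add (the second "add" being the replay), yet retires them in program order as load, add, store.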
For systems incorporating multi-threaded processors and/or multiple processing devices, the multiple threads and/or multiple processors may often need to share data stored within the system. Care must be taken to ensure that a thread or processor accesses the most recent and up-to-date data and also to ensure that a thread or processor does not access and modify data currently associated with another thread or processor. Further complicating this sharing of data, most modern-day processing devices include one or more on-chip cache memories. Within a multi-processor system, the multiple on-chip caches will often (and, in practice, generally do) contain multiple copies of a data item. Accordingly, when a thread or processor accesses a copy of a data item, it must be ensured that an updated or valid data value is read.
Thus, in multi-threaded processors and/or multi-processor systems, “cache coherency” must be maintained. Cache coherency refers to the synchronization of data written from, or read into, cache memory, such that any data item stored in a cache that is accessed by a thread or processor is the most recent copy of that data item. Further, any data value written from cache back into main memory should be the most current data. The accuracy and performance of speculative and out-of-order processing are highly dependent upon the consistency and synchronization of data.
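By way of illustration only, the coherency requirement may be sketched with a simple write-through, write-invalidate scheme. The text above does not name any particular coherency protocol, so this scheme is an assumption made for the example, and the class `Cache`, its methods, and the shared `bus` list are hypothetical.

```python
class Cache:
    """Toy per-processor cache using an assumed write-invalidate scheme."""

    def __init__(self, memory, bus):
        self.memory = memory  # shared main memory, modeled as a dict
        self.lines = {}       # this cache's private copies: addr -> value
        self.bus = bus        # list of all caches in the system (hypothetical)
        bus.append(self)

    def read(self, addr):
        # On a miss, fill the line from main memory.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Invalidate every other cache's copy so no stale value can be read.
        for other in self.bus:
            if other is not self:
                other.lines.pop(addr, None)
        self.lines[addr] = value
        # Write-through: main memory always holds the most current data.
        self.memory[addr] = value
```

In this sketch, if two caches have both read the same address and one of them then writes a new value, the other cache's copy is discarded, so its next read misses and returns the updated value from main memory rather than the stale copy.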
One method of maintaining cache coherency, and of ensuring that the most up-to-date value of a data item is accessed when that data item is needed by a thread or processor, is to implement a “lock.” A lock comprises a process that is performed in response to a load instruction (i.e., a μOP issued by a processor or thread requesting a specific data item from memory) to ensure synchronization between processors and/or threads. A lock is an attribute of a load instruction, and the lock is typically indicated by a tag associated with the load instruction. A load instruction that has been tagged for lock will be referred to herein as a “locked μOP.”
Generally, a lock is associated with a set of instructions, including the load instruction, an instruction to modify the data item, and a store instruction (i.e., a μOP issued by a processor to write the modified data item to memory). The lock, also referred to herein as a “lock sequence” or “lock operation,” may, for example, include acquiring ownership of the memory location that stores the data that is the subject of a tagged load instruction, performing an atomic operation on the data while preventing other processes from operating on that data, and releasing ownership of the memory location after the atomic operation is performed. An atomic operation is one that is performed sequentially and in an uninterrupted manner and, further, that is guaranteed to be completed or not completed at all (i.e., the operation is indivisible). Because execution of the set of μOPs (i.e., the load, modify, and store instructions) is atomic, the entire lock sequence is sometimes viewed as a single μOP (e.g., it appears like a single operation).
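By way of illustration only, the lock sequence described above (acquire ownership, load, modify, store, release ownership) may be modeled in software as follows. This is a software analogy under stated assumptions rather than a description of the hardware mechanism: a `threading.Lock` stands in for ownership of the memory location, and the names `MemoryLocation`, `locked_read_modify_write`, and `hammer` are hypothetical.

```python
import threading


class MemoryLocation:
    """A single memory location whose ownership is modeled by a mutex."""

    def __init__(self, value=0):
        self.value = value
        self._owner = threading.Lock()  # stand-in for ownership (assumption)


def locked_read_modify_write(loc, modify):
    """Perform load, modify, and store as one indivisible sequence."""
    with loc._owner:             # acquire ownership of the memory location
        old = loc.value          # load (the analogue of the locked uOP)
        loc.value = modify(old)  # modify the data item, then store it
    # Leaving the 'with' block releases ownership of the memory location.
    return old


def hammer(loc, n_threads, n_iters):
    """Increment loc from several threads; returns the final value."""
    def worker():
        for _ in range(n_iters):
            locked_read_modify_write(loc, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return loc.value
```

Because each load, modify, and store sequence runs while ownership is held, no increment is lost: with four threads each performing 1000 increments on a location initialized to zero, `hammer` returns 4000. Every thread must wait its turn for ownership, which also hints at why such a sequence can be costly.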
While the use of locks can ensure cache coherency and data integrity, this mechanism is not without its disadvantages. Specifically, the processing of a lock can introduce significant latency into the execution of a piece of code.