High-performance processors used in data processing systems today may be capable of “superscalar” operation and may have “pipelined” elements. Such processors typically have multiple elements which operate in parallel to process multiple instructions in a single processing cycle. Pipelining involves processing instructions in stages, so that the pipelined stages may process a number of instructions concurrently.
In a typical first stage, referred to as an “instruction fetch” stage, an instruction is fetched from memory. Then, in a “decode” stage, the instruction is decoded into different control bits, which in general designate i) a type of functional unit (e.g., execution unit) for performing the operation specified by the instruction, ii) source operands for the operation and iii) destinations for results of the operation. Next, in a “dispatch” stage, the decoded instruction is dispatched to an issue queue (ISQ) where instructions wait for data and an available execution unit. Next, in the “issue” stage, an instruction in the issue queue is issued to a unit having an execution stage. The execution stage processes the operation as specified by the instruction. Executing an operation specified by an instruction includes accepting one or more operands and producing one or more results.
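By way of illustration only, the overlap of instructions across the stages described above may be sketched as follows. The sketch assumes an idealized pipeline in which one instruction enters per cycle, every stage takes exactly one cycle, and no stalls occur; the names `STAGES` and `pipeline_trace` are hypothetical and do not correspond to any particular processor.

```python
# Idealized model: one instruction enters per cycle, one cycle per stage, no stalls.
STAGES = ["fetch", "decode", "dispatch", "issue", "execute"]


def pipeline_trace(instructions):
    """Return a list with one dict per cycle mapping each occupied stage
    to the instruction currently held in that stage."""
    trace = []
    total_cycles = len(instructions) + len(STAGES) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for depth, stage in enumerate(STAGES):
            index = cycle - depth  # instruction that reached this stage this cycle
            if 0 <= index < len(instructions):
                occupancy[stage] = instructions[index]
        trace.append(occupancy)
    return trace


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_trace(["i0", "i1", "i2"])):
        print(cycle, occupancy)
```

In this model, by the third cycle three different instructions occupy the fetch, decode and dispatch stages at once, which is the concurrency that pipelining provides.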
A “completion” stage deals with program order issues that arise from concurrent execution, wherein multiple, concurrently executed instructions may deposit results in a single register. It also handles issues arising from instructions subsequent to an interrupted instruction depositing results in their destination registers. In the completion stage an instruction waits for the point at which there is no longer a possibility of an interrupt so that depositing its results will not violate the program order, at which point the instruction is considered “complete”, as the term is used herein. Associated with a completion stage, there are buffers to hold execution results before the results are deposited into the destination register, and buffers to back up the content of registers at specified checkpoints in case an interrupt requires the register content to be reverted to its pre-checkpoint value. Either or both types of buffers can be employed in a particular implementation. At completion, the results of execution in the holding buffer will be deposited into the destination register and the backup buffer will be released.
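By way of illustration only, the interaction between a holding buffer and a backup buffer may be sketched as follows. The sketch assumes an implementation that employs both buffer types and a single outstanding checkpoint; the class `RegisterFile` and its methods are hypothetical names introduced for this example.

```python
class RegisterFile:
    """Toy register file with a result holding buffer and a checkpoint backup buffer."""

    def __init__(self, num_regs):
        self.regs = [0] * num_regs
        self.pending = []   # holding buffer: (dest_reg, value) pairs in program order
        self.backup = None  # backup buffer: register snapshot taken at a checkpoint

    def checkpoint(self):
        # Save the current register content so that an interrupt can restore it.
        self.backup = list(self.regs)

    def execute(self, dest_reg, value):
        # Execution has produced a result, but it is held rather than deposited.
        self.pending.append((dest_reg, value))

    def complete(self):
        # An interrupt is no longer possible: deposit the held results in
        # program order and release the backup buffer.
        for dest_reg, value in self.pending:
            self.regs[dest_reg] = value
        self.pending.clear()
        self.backup = None

    def interrupt(self):
        # Discard results that were never deposited and revert the registers
        # to the checkpointed content, if a checkpoint exists.
        self.pending.clear()
        if self.backup is not None:
            self.regs = list(self.backup)
            self.backup = None


if __name__ == "__main__":
    rf = RegisterFile(4)
    rf.checkpoint()
    rf.execute(1, 7)
    rf.interrupt()
    print(rf.regs)
```

Because this sketch holds every result until completion, the restore performed on an interrupt leaves the registers unchanged; the backup buffer matters in implementations that deposit results before completion.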
While instructions for the above described processor may originally be prepared for processing in some programmed, logical sequence, it should be understood that they may be processed, in some respects, in a different sequence. However, since instructions are not totally independent of one another, complications arise. That is, the processing of one instruction may depend on a result from another instruction. For example, the processing of an instruction which follows a branch instruction will depend on the branch path chosen by the branch instruction. In another example, the processing of an instruction which reads the contents of some memory element in the processing system may depend on the result of some preceding instruction which writes to that memory element.
As these examples suggest, if one instruction is dependent on a first instruction and the instructions are to be processed concurrently or the dependent instruction is to be processed before the first instruction, an assumption must be made regarding the result produced by the first instruction. The “state” of the processor, as defined at least in part by the content of registers the processor uses for execution of instructions, may change from cycle to cycle. If an assumption used for processing an instruction proves to be incorrect then, of course, the result produced by the processing of the instruction will almost certainly be incorrect, and the processor state must recover to a state with known correct results up to the instruction for which the assumption is made. An instruction for which an assumption has been made is generally referred to as an “interruptible instruction”, and the determination that an assumption is incorrect, triggering the need for the processor state to recover to a prior state, is referred to as an “interruption” or an “interrupt point”. In addition to incorrect assumptions, there are other causes of such interruptions requiring recovery of the processor state. Such an interruption is generally caused by an unusual condition arising in connection with instruction execution, by an error, or by a signal external to the processor.
In speculative parallelization systems, also known as thread-level speculation (TLS) or multi-scalar systems, a compiler, runtime system, or programmer may divide the execution of a program among multiple threads, i.e. separately managed sequences of instructions that may execute in parallel with other sequences of instructions (or “threads”), with the expectation that those threads will usually be independent, meaning that no thread will write data that other threads are reading or writing concurrently. Due to the difficulty in statically determining the memory locations that will be accessed by threads at compilation time, this expectation is not always met. The parallel threads may actually make conflicting data accesses. Such parallelization systems use speculative execution to attempt to execute such threads in parallel. It is the responsibility of the system to detect when two speculative threads make conflicting data accesses, and recover from such a mis-speculation.
Each parallel thread corresponds to a segment of the original sequential code, and the parallel threads are therefore ordered with respect to one another according to their sequence in the sequential version of code. It is the responsibility of the system to ensure that the results of a speculative thread are not committed until all prior speculative threads in this sequence are known to be free of conflicts with the committing thread. Once it has been determined that the thread does not conflict with any threads in the prior sequence, and prior threads have committed, that thread may commit.
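By way of illustration only, the ordered commit of speculative threads may be sketched as follows. The sketch assumes that all threads began executing concurrently from the same memory state, so that a thread has mis-speculated whenever it read a location written by a thread earlier in the sequential order; it further assumes that a mis-speculated thread and all later threads are squashed for re-execution. The names `SpecThread` and `commit_in_order` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SpecThread:
    order: int                                  # position in the sequential code
    reads: set = field(default_factory=set)     # locations read speculatively
    writes: set = field(default_factory=set)    # locations written speculatively


def commit_in_order(threads):
    """Commit threads in sequential order until the first conflict.

    Returns (committed, squashed), each a list of thread order numbers.
    """
    committed = []
    earlier_writes = set()
    ordered = sorted(threads, key=lambda t: t.order)
    for position, thread in enumerate(ordered):
        # Conflict: this thread read data that an earlier thread wrote, so it
        # may have observed a stale value while running in parallel.
        if thread.reads & earlier_writes:
            return committed, [t.order for t in ordered[position:]]
        committed.append(thread.order)
        earlier_writes |= thread.writes
    return committed, []


if __name__ == "__main__":
    threads = [
        SpecThread(0, reads={"x"}, writes={"a"}),
        SpecThread(1, reads={"b"}, writes={"c"}),
        SpecThread(2, reads={"a"}, writes={"d"}),
        SpecThread(3, reads={"e"}),
    ]
    print(commit_in_order(threads))
```

A thread is committed here only after every earlier thread has committed, which mirrors the ordering requirement described above; a real system may use finer-grained recovery than squashing all later threads.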
Systems that support transactional memory typically must satisfy a subset of the requirements of a system that supports speculative parallelization. Transactional memory attempts to simplify concurrent or parallel programming by allowing a group of load and store instructions to execute in an atomic manner, i.e. it is guaranteed that either (1) all instructions of the transaction complete successfully or (2) no effects of the instructions of the transaction occur, i.e. the transaction is aborted and any changes made by the execution of the instructions in the transaction are rolled back. In this way, with atomic transactions, the instructions of the transaction appear to occur all at once in a single instant between invocation and results being generated.
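By way of illustration only, the all-or-nothing behavior of a transaction may be sketched with an undo log as follows. The sketch is single-threaded and therefore shows only commit and rollback, not isolation between concurrent transactions; memory is modeled as a dictionary, and the names `Transaction` and `run_atomically` are hypothetical.

```python
_ABSENT = object()  # marks an address that held no value before the transaction


class Transaction:
    """Toy transaction that records old values so its stores can be rolled back."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = {}  # address -> value held before the first store

    def load(self, addr):
        return self.memory.get(addr, 0)

    def store(self, addr, value):
        # Log the original value only once per address.
        if addr not in self.undo_log:
            self.undo_log[addr] = self.memory.get(addr, _ABSENT)
        self.memory[addr] = value

    def commit(self):
        # The stores become permanent; the undo information is discarded.
        self.undo_log.clear()

    def abort(self):
        # Restore every modified address to its pre-transaction content.
        for addr, old in self.undo_log.items():
            if old is _ABSENT:
                self.memory.pop(addr, None)
            else:
                self.memory[addr] = old
        self.undo_log.clear()


def run_atomically(memory, body):
    """Run body(tx); commit on success, roll back if it raises. Returns success."""
    tx = Transaction(memory)
    try:
        body(tx)
    except Exception:
        tx.abort()
        return False
    tx.commit()
    return True
```

An undo log is only one possible design; an implementation may instead buffer stores and apply them at commit, in which case an abort simply discards the buffer.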
Hardware transactional memory systems may have modifications to the processors, caches, and bus protocols to support transactions or transaction blocks, i.e. groups of instructions that are to be executed atomically as one unit. Software transactional memory provides transactional memory semantics in a software runtime library with minimal hardware support.
Transactional memory systems seek high performance by speculatively executing transactions concurrently and only committing transactions that are non-conflicting. A conflict occurs when two or more concurrent transactions access the same piece of data, e.g. a word, block, object, etc., and at least one access is a write. Transactional memory systems may resolve some conflicts by stalling or aborting one or more transactions.
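By way of illustration only, the conflict condition stated above may be expressed over the read and write sets of two concurrent transactions. The sketch assumes each transaction is summarized as a pair of sets of data identifiers; the function name `conflicts` is hypothetical.

```python
def conflicts(tx_a, tx_b):
    """Return True if two concurrent transactions conflict.

    Each transaction is a (read_set, write_set) pair. A conflict exists when
    both access the same piece of data and at least one access is a write.
    """
    reads_a, writes_a = tx_a
    reads_b, writes_b = tx_b
    return bool(writes_a & (reads_b | writes_b)) or bool(writes_b & reads_a)


if __name__ == "__main__":
    reader = ({"x"}, set())
    writer = (set(), {"x"})
    print(conflicts(reader, reader))  # two readers of the same data
    print(conflicts(reader, writer))  # a reader and a writer of the same data
```

Two transactions that only read the same data do not conflict under this definition, which is why such transactions may both commit.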
Transactional blocks are typically demarcated in a program with special transaction begin and end annotations. Transactional blocks may be uniquely identified by a static identifier, e.g., the address of the first instruction in the transactional block. Dynamically, multiple threads can concurrently enter a transactional block, and each such dynamic instance of the transactional block shares the same static identifier.