Multithreading is a technique for allowing multiple threads to execute concurrently while sharing resources of a processor. These resources include the processing unit (or core) as well as the registers and other memory internal to the core (e.g. write buffers, L1 cache, etc.). When processing switches from a first thread to a second thread, the operating system will cause the state of the core's internal registers to be stored. This enables the first thread, at a later time, to resume execution where it left off prior to the switch. These context switches can occur very frequently (e.g. at least every 10 milliseconds in response to a periodic timer interrupt and possibly as frequently as every 100-1,000 instructions due to exceptions) thereby giving the appearance that the threads are executed in parallel. In this specification, the term interrupt will be used generally to refer to both interrupts and exceptions as these terms are commonly used in the art.
In a multithreaded architecture, because multiple threads may require access to the same data, it is necessary to implement techniques to ensure the validity of the data accessed and relied upon by each thread. One way to ensure data validity is to implement a locking scheme where one thread obtains exclusive access to a range of data while the thread operates on the data. Although locking schemes are effective to ensure data validity, they can significantly slow the performance of a multithreaded process. For example, if one thread requires access to a range of data locked by another thread, the requesting thread may have to block until the lock is released. Locking therefore increases the amount of serial execution in a multithreaded process.
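The locking scheme described above can be illustrated with a minimal sketch. The names `counter`, `counter_lock`, `add_many`, and `run` are hypothetical and chosen only for illustration; the sketch uses the standard `threading` module and is not drawn from any particular implementation.

```python
import threading

counter = 0                      # shared data accessed by every thread
counter_lock = threading.Lock()  # grants one thread at a time exclusive access


def add_many(times):
    """Increment the shared counter `times` times while holding the lock."""
    global counter
    for _ in range(times):
        # A thread that arrives here while another thread holds the lock
        # blocks until the lock is released, so this section runs serially.
        with counter_lock:
            counter += 1


def run(num_threads=4, times=10_000):
    """Start the worker threads, wait for them, and return the final count."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(times,))
               for _ in range(num_threads)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return counter
```

The lock keeps the shared value valid, but every increment is executed by one thread at a time, which is the serialization cost noted above.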
To address the inefficiencies caused by locking, various types of speculative execution have been proposed. Speculative execution generally refers to the execution of an instruction stream even though it is not known whether the instructions will be executed or, if they will be executed, whether the current state of execution is valid for their execution. One type of speculative execution, known as thread level speculation, is a technique where a thread is allowed to continue execution based on the assumption that data relied upon by the thread will not be changed by any other thread. In other words, thread level speculation is an optimistic approach that assumes that most, if not all, concurrently executing threads will not update the same data.
FIG. 1 illustrates an example architecture of a processor 100 in which thread level speculation can be implemented. Processor 100 is shown as including a single core; however, a multicore processor would have multiple processing cores that are each configured as shown. The internal task state data area represents the portion of an executing thread's state that will be maintained during a context switch. For example, when a context switch occurs, the values stored in the core's general registers will be stored to allow the thread to resume operation at a later time. The storing of these values will occur in any multithreaded implementation whether or not thread level speculation is implemented.
The internal task state data area can be viewed as the source of data that is written to memory during a context switch. In the x86 architecture, for example, the source of the data that is stored includes the general registers (EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI), the segment registers (ES, CS, SS, DS, FS, GS), the flags register (EFLAGS), and the instruction pointer (EIP). The values in these registers are written into a data structure known as the task state segment (TSS). In this specification, the task state data area will refer to either an internal task state data area or an external task state data area. The internal task state data area represents the values within the processor that will be stored during a context switch while the external task state data area represents the actual data structure in memory where these values are stored.
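The relationship between the internal and external task state data areas can be sketched as follows. This is a simplified model under stated assumptions: the class `Core` and the functions `save_task_state` and `restore_task_state` are hypothetical names, and a plain dictionary stands in for the data structure in memory; the actual layout of a TSS is not reproduced here.

```python
# Register names listed above for the x86 architecture.
GENERAL_REGISTERS = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")
SEGMENT_REGISTERS = ("ES", "CS", "SS", "DS", "FS", "GS")
OTHER_REGISTERS = ("EFLAGS", "EIP")
ALL_REGISTERS = GENERAL_REGISTERS + SEGMENT_REGISTERS + OTHER_REGISTERS


class Core:
    """Hypothetical core whose registers form the internal task state data area."""

    def __init__(self):
        self.registers = {name: 0 for name in ALL_REGISTERS}


def save_task_state(core):
    """Copy the internal task state into a new external task state data area."""
    return dict(core.registers)


def restore_task_state(core, external_state):
    """Load a previously saved external task state back into the core."""
    core.registers.update(external_state)
```

A thread resumes where it left off because the saved copy is independent of whatever the core does after the switch:

```python
core = Core()
core.registers["EIP"] = 0x1000
saved = save_task_state(core)      # context switch away from the thread
core.registers["EIP"] = 0x2000     # another thread runs on the core
restore_task_state(core, saved)    # switch back to the first thread
```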
To enable thread level speculation, additional buffers (i.e. the other internal buffers depicted in FIG. 1) are maintained to store the results of the speculative execution. For example, if an executing thread loads a data value that is currently being operated on by another thread that is suspended, it is unknown whether the suspended thread will eventually update the data value. The executing thread therefore continues execution based on the assumption that the suspended thread will not ultimately update the data value. However, because it is unknown whether the speculative execution will become valid, the results of the speculative execution are temporarily stored in the other internal buffers. If the suspended thread ultimately does not update the data value, the buffered results of the speculative execution can be committed. However, if the suspended thread updates the data value (i.e. if a data collision occurs), the buffered results are discarded and the executing thread can be rolled back to recommence execution using the updated value.
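The buffer, commit, and rollback behavior described above can be modeled in software with a minimal sketch. Everything here is an assumption made for illustration: the classes `SharedMemory` and `SpeculativeThread` are hypothetical, and collisions are detected with per-address version counters, which is one common software technique and not necessarily the mechanism used by the processor of FIG. 1.

```python
class SharedMemory:
    """Shared data plus a version counter per address (an assumed mechanism)."""

    def __init__(self):
        self.values = {}
        self.versions = {}

    def read(self, addr):
        """Return (value, version) for an address; unwritten addresses read as 0."""
        return self.values.get(addr, 0), self.versions.get(addr, 0)

    def write(self, addr, value):
        """Store a value and bump the version so speculating readers can notice."""
        self.values[addr] = value
        self.versions[addr] = self.versions.get(addr, 0) + 1


class SpeculativeThread:
    """Executes optimistically, holding its results in a private buffer."""

    def __init__(self, memory):
        self.memory = memory
        self.read_versions = {}  # version seen when each address was first loaded
        self.write_buffer = {}   # speculative results, not yet visible to others

    def load(self, addr):
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        value, version = self.memory.read(addr)
        self.read_versions.setdefault(addr, version)
        return value

    def store(self, addr, value):
        self.write_buffer[addr] = value

    def collided(self):
        """True if any address this thread relied upon has since been updated."""
        return any(self.memory.read(addr)[1] != version
                   for addr, version in self.read_versions.items())

    def commit(self):
        """Publish the buffered results, or discard them if a collision occurred."""
        ok = not self.collided()
        if ok:
            for addr, value in self.write_buffer.items():
                self.memory.write(addr, value)
        self.read_versions.clear()
        self.write_buffer.clear()
        return ok
```

When `commit` returns `False`, the caller is expected to re-execute the speculative work, which will then load the updated value.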
In theory, if few data collisions occur, thread level speculation can increase the performance of a multithreaded process by increasing the amount of parallel processing. In other words, thread level speculation eliminates blocking that would otherwise occur if locks were implemented.
Various implementations of thread level speculation have been proposed. (See, e.g., Martinez, J. F., & Torrellas, J., Speculative Synchronization: Applying Thread-Level Speculation to Explicitly Parallel Applications, Proceedings of the 10th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-X) (ASPLOS '02), 18-29; Rajwar, R., & Goodman, J. R., Speculative Lock Elision: Enabling Highly Concurrent Multithreaded Execution, Proceedings of the 34th International Symposium on Microarchitecture (MICRO '01), 294-305; and Rundberg, P., & Stenström, P., Speculative Lock Reordering: Optimistic Out-of-Order Execution of Critical Sections, Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS '03), 8 pp.)
Although these proposed implementations showed promising improvements in thread level parallelism in simulated environments, each implementation failed to account for the effects of interrupts. Referring to FIG. 2, when a context switch occurs (e.g. in response to an interrupt), the internal task state data area is stored to allow the preempted thread to resume processing at a later time. In contrast, the internal buffers that store the results of any speculative execution are flushed. Because the internal buffers are flushed, for these proposed implementations to be successful, each thread involved in speculative execution on shared data must execute to completion without any of the threads being preempted. In an actual implementation, it is likely that at least one of the threads would be preempted prior to completion, thereby requiring each thread to be restarted. For this reason, the results shown in these proposed implementations could not be replicated in actual implementations where interrupts would frequently occur.
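The effect of flushing the internal buffers on every context switch can be illustrated with a toy model. The function `attempts_to_complete` and its parameters are hypothetical, and the model makes simplifying assumptions that are not taken from the cited implementations: interrupts arrive at a fixed instruction interval, each interrupt flushes all buffered speculative results, and the preempted thread restarts its speculative region immediately afterward.

```python
def attempts_to_complete(region_len, interrupt_period, max_attempts=1000):
    """Return how many attempts a speculative region needs, or None if it
    never completes within max_attempts.

    region_len       -- instructions in the speculative region (assumed)
    interrupt_period -- instructions between interrupts (assumed fixed)
    """
    executed = 0  # instructions executed since the start of the model
    for attempt in range(1, max_attempts + 1):
        buffered = 0  # speculative results currently held in the buffers
        interrupted = False
        while buffered < region_len:
            executed += 1
            buffered += 1
            if buffered < region_len and executed % interrupt_period == 0:
                # Context switch: the task state is saved, but the buffered
                # speculative results are flushed, so this attempt is lost.
                interrupted = True
                break
        if not interrupted:
            return attempt  # the region finished before the next interrupt
    return None
```

Under these assumptions, a region shorter than the interval between interrupts completes on its first attempt, while a region longer than that interval never completes, because every attempt is flushed before it can finish.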