1. Field of the Invention
Embodiments of the present invention facilitate transactional execution in a computer system. More specifically, embodiments of the present invention facilitate transactional execution in a computer system that supports simultaneous speculative threading (SST).
2. Related Art
In order to execute code more efficiently, processors have been designed to support simultaneous speculative threading (SST), in which two or more hardware strands can be used to execute a single software thread. For example, in an SST processor that supports two hardware strands, the processor can use one strand (a “primary strand”) to execute instructions for the software thread as quickly as possible while the second strand (a “subordinate strand”) is idle or is performing other computational work. In order to avoid unnecessary delays, upon encountering a long-latency instruction with an unresolved data dependency (e.g., a load instruction that misses in the L1 cache and must be sent to the L2 cache), the primary strand can defer the instruction by placing the instruction into a deferred queue and can continue executing subsequent instructions. While executing the subsequent instructions, the primary strand can similarly defer instructions that have unresolved dependencies. When data ultimately returns for a deferred instruction, the subordinate strand can make one or more passes through the deferred queue to execute deferred instructions that depend on the returned data, while the primary strand can continue to execute non-deferred instructions.
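For purposes of illustration only, the deferral scheme described above can be sketched as a small software model. The sketch is a simplification under stated assumptions and does not describe any particular processor: the names Instr, primary_pass, and subordinate_pass, the misses flag, and the representation of registers as strings are all hypothetical and are introduced here solely for the example.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str             # register written by the instruction
    srcs: tuple           # registers read by the instruction
    misses: bool = False  # hypothetical flag: long-latency load that misses in the L1 cache


def primary_pass(program, ready):
    """Toy primary strand: execute what it can, defer everything else."""
    executed = []
    deferred = deque()
    for ins in program:
        if ins.misses or any(src not in ready for src in ins.srcs):
            deferred.append(ins)     # place the instruction into the deferred queue
            ready.discard(ins.dest)  # its result is unresolved for later instructions
        else:
            executed.append(ins.dest)
            ready.add(ins.dest)
    return executed, deferred


def subordinate_pass(deferred, ready, returned):
    """Toy subordinate strand: one pass through the deferred queue after data returns."""
    executed = []
    still_deferred = deque()
    for ins in deferred:
        if ins.misses:
            resolvable = ins.dest in returned  # the load's data has come back
        else:
            resolvable = all(src in ready for src in ins.srcs)
        if resolvable:
            executed.append(ins.dest)
            ready.add(ins.dest)
        else:
            still_deferred.append(ins)  # left for a later pass
    return executed, still_deferred
```

In this model, a program consisting of a missing load into r1, a dependent instruction writing r2, an independent instruction writing r3, and a final instruction reading r2 and r3 causes the primary strand to execute only the instruction writing r3 and to defer the other three. Once the data for r1 returns, a single subordinate pass can execute all three deferred instructions in order.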
Some SST systems also support transactional execution (also called “transactional memory”) in which designated sections of code are executed in a transaction. Generally, executing a section of code in a transaction involves ensuring that other threads do not interfere with memory accesses made during the transaction and that the transaction appears to be atomic from the perspective of other threads. Transactional execution is known in the art and hence is not described in more detail.
In some SST systems that support transactional execution, cache line accesses from a thread (i.e., from the strands that are being used to execute the thread) are tracked in the L2 cache using a single strand identifier. Because the L2 cache therefore cannot distinguish accesses made by the primary strand from accesses made by the subordinate strand, using the primary strand to execute a transaction while using the subordinate strand to execute deferred instructions can cause errors. These systems therefore execute the transaction using only the primary strand, while the subordinate strand is idle or is executing code that is unrelated to the transaction.
In order to execute a transaction in this way, the strands must reach a consistent state prior to the beginning of the transaction. In some systems, this involves executing an “instruction barrier,” which causes the primary strand to stall until the subordinate strand executes all pre-transactional deferred instructions and all other pre-transactional operations for the thread have been completed (e.g., buffered loads and stores have been committed to the system's architectural state). When the strands reach a consistent state, the system begins executing the transaction using the primary strand. Unfortunately, stalling the primary strand until the subordinate strand reaches a consistent state results in an inefficient use of computational resources.
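The cost of such an instruction barrier can be illustrated with a minimal software sketch. The sketch rests on assumptions that do not come from any particular system: the function name instruction_barrier is hypothetical, the deferred queue and store buffer are modeled as simple queues, and the accounting of one stall cycle per drained entry is an arbitrary simplification chosen only to make the stall visible.

```python
from collections import deque


def instruction_barrier(deferred_queue, store_buffer, memory):
    """Toy barrier: the primary strand does no work until both queues are drained.

    Returns the number of cycles the primary strand spent stalled, using the
    assumed cost of one cycle per drained entry.
    """
    stall_cycles = 0

    # The subordinate strand executes all pre-transactional deferred instructions.
    while deferred_queue:
        deferred_queue.popleft()
        stall_cycles += 1

    # Buffered stores are committed to the architectural state (modeled as a dict).
    while store_buffer:
        address, value = store_buffer.popleft()
        memory[address] = value
        stall_cycles += 1

    # Only at this point may the primary strand begin executing the transaction.
    return stall_cycles
```

Under these assumptions, the stall grows with the amount of pre-transactional work that is still outstanding, and during that time the primary strand performs no useful computation.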
Hence, what is needed is a processor that supports transactional execution and SST without stalling the primary strand until the subordinate strand reaches a consistent state.