Advances in semiconductor processing and logic design have permitted an increase in the amount of logic that may be present on integrated circuit devices. As a result, computer system configurations have evolved from a single or multiple integrated circuits in a system to multiple processing cores and multiple logical processors present on individual integrated circuits. A processor or integrated circuit typically comprises a single processor die, where the processor die may include any number of cores or logical processors.
The ever-increasing number of cores and logical processors on integrated circuits enables more software threads to be concurrently executed. However, the increase in the number of software threads that may be executed simultaneously has created problems with synchronizing data shared among the software threads. One common solution to accessing shared data in multiple core or multiple logical processor systems comprises the use of locks to guarantee mutual exclusion across multiple accesses to shared data. However, the ever-increasing ability to execute multiple software threads creates a bottleneck on the locked data, causing threads to wait for other threads to complete (serializing their execution) and reducing the benefit of having multiple threads executing concurrently. Furthermore, some read-only accesses may use a lock to guarantee mutual exclusion of the data in case a writer attempts to modify the data, which has the undesirable side effect of locking out other read-only accesses.
For example, consider a hash table holding shared data. With a lock system, a programmer may lock the entire hash table, allowing one thread to access the entire hash table. However, throughput and performance of other threads are potentially adversely affected, as they are unable to access any entries in the hash table until the lock is released. Alternatively, each entry in the hash table may be locked, leading to many lock structures in the software. In such a construct, many locks might need to be acquired to execute a particular task, which may lead to deadlocks with other threads. Either way, after extrapolating this simple example into a large scalable program, it is apparent that the complexity of lock contention, serialization, fine-grain synchronization, and deadlock avoidance becomes an extremely cumbersome burden for programmers.
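The two locking strategies described for the hash table may be sketched as follows. This is a minimal illustration only, assuming Python's standard threading module; the class names CoarseLockedTable and StripedLockedTable, the bucket count, and the transfer operation are hypothetical and are introduced solely to show why a task that touches several entries must acquire several locks in a consistent order to avoid deadlock.

```python
import threading


class CoarseLockedTable:
    """One lock guards the whole table: simple, but every access is serialized."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, amount):
        with self._lock:
            self._data[key] = self._data.get(key, 0) + amount

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


class StripedLockedTable:
    """One lock per bucket: more concurrency, but multi-entry tasks need several locks."""

    def __init__(self, num_buckets=8):
        self._locks = [threading.Lock() for _ in range(num_buckets)]
        self._buckets = [{} for _ in range(num_buckets)]

    def _index(self, key):
        return hash(key) % len(self._buckets)

    def add(self, key, amount):
        i = self._index(key)
        with self._locks[i]:
            self._buckets[i][key] = self._buckets[i].get(key, 0) + amount

    def get(self, key):
        i = self._index(key)
        with self._locks[i]:
            return self._buckets[i].get(key, 0)

    def transfer(self, src, dst, amount):
        """Hypothetical task touching two entries, so up to two locks are needed."""
        i, j = self._index(src), self._index(dst)
        # Always acquire in ascending bucket order. If one thread took i then j
        # while another took j then i, each could wait on the other forever.
        order = sorted({i, j})
        for k in order:
            self._locks[k].acquire()
        try:
            self._buckets[i][src] = self._buckets[i].get(src, 0) - amount
            self._buckets[j][dst] = self._buckets[j].get(dst, 0) + amount
        finally:
            for k in reversed(order):
                self._locks[k].release()
```

In this sketch the ordering rule lives inside a single method, but in a larger program every code path that takes more than one lock would have to follow the same rule, which is one source of the programming burden noted above.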
Another recent data synchronization technique includes the use of transactional memory (TM). Often transactional execution includes executing a grouping of a plurality of micro-operations, operations, or instructions atomically. In the example above, multiple threads may execute within the hash table concurrently, and their memory accesses are monitored/tracked. If two threads access/alter the same entry, conflict resolution may be performed to ensure data validity. One type of transactional execution includes Software Transactional Memory (STM), where tracking of memory accesses, conflict resolution, abort tasks, and other transactional tasks are performed in software, often without the support of hardware. Another type of transactional execution includes a Hardware Transactional Memory (HTM) System, where hardware is included to support access tracking, conflict resolution, and other transactional tasks.
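The access tracking and conflict resolution described for software transactional memory may be illustrated with a deliberately small sketch. It is not a description of any particular STM system; the names TVar, Transaction, and atomically are hypothetical, and the version-number validation with a single commit lock is an assumed, simplified policy in which a conflicting transaction is aborted and re-executed.

```python
import threading


class TVar:
    """A shared cell whose version is bumped on every committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


_commit_lock = threading.Lock()  # serializes snapshots and commits only


class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> (value, version) seen at first read
        self.writes = {}  # TVar -> buffered value, invisible until commit

    def read(self, tvar):
        if tvar in self.writes:          # read-your-own-write
            return self.writes[tvar]
        if tvar not in self.reads:       # track the access
            with _commit_lock:
                self.reads[tvar] = (tvar.value, tvar.version)
        return self.reads[tvar][0]

    def write(self, tvar, value):
        self.writes[tvar] = value

    def commit(self):
        with _commit_lock:
            # Conflict detection: abort if anything we read has since changed.
            for tvar, (_, version) in self.reads.items():
                if tvar.version != version:
                    return False
            # No conflict: publish all buffered writes as one atomic step.
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(body):
    """Run body(tx) as a transaction, re-executing it until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result
```

A caller might write atomically(lambda tx: tx.write(counter, tx.read(counter) + 1)) to increment a shared TVar named counter. Transactions that touch different cells commit without aborting one another, while transactions that touch the same cell are resolved by retrying the loser.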
A technique similar to transactional memory includes hardware lock elision (HLE), where a locked critical section is executed tentatively without the locks, and if the execution is successful (i.e., no conflicts), then the results are made globally visible. In other words, the critical section is executed like a transaction with the lock instructions from the critical section being elided, instead of executing an atomically defined transaction. As a result, in the example above, instead of replacing the hash table execution with a transaction, the critical section defined by the lock instructions is executed tentatively. Multiple threads similarly execute within the hash table, and their accesses are monitored/tracked. If any of the threads access/alter the same entry, conflict resolution may be performed to ensure data validity. But if no conflicts are detected, the updates to the hash table are atomically committed.
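The tentative execution of a lock-protected critical section may be approximated in software as follows. This is only an analogy under stated assumptions and does not describe how any processor implements HLE: the class ElidedTable, its critical_section method, the max_attempts parameter, and the per-entry version numbers are all hypothetical, and the brief internal lock taken at commit merely stands in for the atomic commit that hardware would provide.

```python
import threading


class ElidedTable:
    """Critical sections first run tentatively without the lock, then fall back to it."""

    def __init__(self):
        self._lock = threading.Lock()  # the lock the programmer wrote
        self._data = {}                # key -> (value, version)
        self.elided = 0                # sections committed without holding the lock
        self.locked = 0                # sections that fell back to the lock

    def _snapshot(self, key):
        return self._data.get(key, (0, 0))  # one dict read: consistent pair

    def value(self, key):
        return self._snapshot(key)[0]

    def critical_section(self, body, max_attempts=3):
        for _ in range(max_attempts):
            reads, writes = {}, {}

            def get(key):
                if key in writes:
                    return writes[key]
                if key not in reads:
                    reads[key] = self._snapshot(key)  # track the access
                return reads[key][0]

            def put(key, new_value):
                writes[key] = new_value               # buffered, not yet visible

            result = body(get, put)  # tentative execution, lock not held
            with self._lock:         # stand-in for the hardware's atomic commit
                if all(self._snapshot(k)[1] == seen[1] for k, seen in reads.items()):
                    for k, v in writes.items():
                        self._data[k] = (v, self._snapshot(k)[1] + 1)
                    self.elided += 1
                    return result
            # Conflict detected: discard the buffered updates and try again.

        with self._lock:             # fallback: really acquire the lock
            def get(key):
                return self._snapshot(key)[0]

            def put(key, new_value):
                self._data[key] = (new_value, self._snapshot(key)[1] + 1)

            result = body(get, put)
            self.locked += 1
            return result
```

The body of the critical section is unchanged whether it runs tentatively or under the lock, which mirrors the point that the lock-based code is kept and only the lock acquisition is elided when no conflict occurs.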
As can be seen, transactional execution and lock elision have the potential to provide better performance among multiple threads. However, HLE and TM are relatively new fields of study with regard to microprocessors. As a result, HLE and TM implementations in processors have not been fully explored or detailed.