1. Field of the Invention
This invention relates to computer systems, and more particularly, to automatic efficient parallelization of code combined with hardware transactional memory support.
2. Description of the Relevant Art
The performance of a computer system depends on both its hardware and its software. To increase throughput, tasks are parallelized wherever possible. To this end, compilers extract parallelizable tasks from program code, and many modern processor designs combine deep pipelines with multiple cores configured to perform simultaneous multi-threading. However, with multi-core chips and multi-threaded applications, it becomes more difficult to synchronize concurrent accesses to shared memory by multiple threads. It is correspondingly harder to ensure that the right operations take place at the right time, without interference or disruption, while maintaining high performance. As a result, applications written for multi-processing workloads currently fall short of the theoretical peak performance of the system.
Locking mechanisms on shared memory are one aspect of software design that prevents a system from reaching peak performance. In place of locking mechanisms, transactional memory improves performance by allowing, in one embodiment, a thread to complete read and write operations to shared memory without regard for the operations of other threads. Generally speaking, a transaction may comprise a sequence of operations that perform read and/or write operations to shared memory. These read and write operations may logically occur at a single instant in time. Accordingly, the whole sequence of instructions may occur in an atomic manner, such that intermediate states are not visible to other transactions.
In various embodiments, a division of work may be a software process consisting of one or more threads or a transaction consisting of one or more processes. Taking a thread as an example, with transactional memory, each thread records each of its read and write operations in a log. In one embodiment, when an entire transaction completes, validation may occur to check that other threads and transactions have not concurrently modified the memory locations the transaction accessed. In an alternative embodiment, validation may occur upon the completion of each memory access in order to verify that other transactions have not concurrently modified the accessed memory locations. Once validation succeeds, the transaction performs a commit operation. If validation is unsuccessful, the transaction aborts, causing all of its prior operations to be rolled back. The transaction is then re-executed until it succeeds.
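The log, validate, commit, and retry sequence described above can be illustrated with a minimal software sketch in Python. It is not the mechanism of any particular embodiment. It assumes that writes are buffered until commit (so a rollback amounts to discarding the write log), that validation occurs once when the transaction completes, and that a single global lock serializes the validate-and-commit step. The names TVar, Transaction, atomically, and run_demo are hypothetical and chosen only for illustration.

```python
import threading


class TVar:
    """Shared memory location; `state` holds (value, version) as one tuple."""

    def __init__(self, value):
        self.state = (value, 0)


# Assumption: one global lock serializes only the validate-and-commit step.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.read_log = {}   # TVar -> version seen at first read
        self.write_log = {}  # TVar -> buffered value, not yet visible to others

    def read(self, tvar):
        if tvar in self.write_log:           # read our own pending write
            return self.write_log[tvar]
        value, version = tvar.state          # one consistent snapshot
        self.read_log.setdefault(tvar, version)
        return value

    def write(self, tvar, value):
        self.write_log[tvar] = value         # buffered until commit

    def commit(self):
        with _commit_lock:
            # Validation: every location read must be unchanged since the read.
            for tvar, seen in self.read_log.items():
                if tvar.state[1] != seen:
                    return False             # abort; buffered writes are dropped
            # Commit: publish all buffered writes and bump their versions.
            for tvar, value in self.write_log.items():
                tvar.state = (value, tvar.state[1] + 1)
            return True


def atomically(body):
    """Re-execute `body` with a fresh transaction until it commits."""
    while True:
        tx = Transaction()
        result = body(tx)
        if tx.commit():
            return result


def run_demo(num_threads=4, increments=500):
    """Several threads increment one shared counter transactionally."""
    counter = TVar(0)

    def increment(tx):
        tx.write(counter, tx.read(counter) + 1)

    def worker():
        for _ in range(increments):
            atomically(increment)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.state[0]
```

Under these assumptions, run_demo(4, 500) returns 2000: whenever two transactions read the same version of the counter, only the first to commit passes validation, and the other is re-executed against the updated value.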
Transactional memory has recently received significant attention from researchers as a promising way to ease the development of correct, efficient, and scalable concurrent programs, which would in turn raise system throughput by enabling further parallelization of tasks. Transactional memory may be used to support explicit transactional programming styles, as well as to improve the performance and scalability of traditional lock-based programs and other synchronization mechanisms. Transactional memory may be implemented entirely in software. However, software techniques involve significant overhead and thus incur a performance penalty and scalability limits. Proposals for hardware transactional memory are very complex because they must ensure correct interaction with difficult events such as exceptions, interrupts, and context switches. Modern attempts at designing hardware transactional memory support within a processor may be simplified by guaranteeing support only for transactions within a predetermined size limit, transactions within a predetermined duration limit, transactions that do not include predetermined difficult instructions, transactions that do not exceed on-chip hardware resources, other restrictions, or a combination thereof.
Traditional lock-based synchronization mechanisms comprise lock and unlock primitives that may require hundreds of clock cycles to complete. Furthermore, it is very difficult for a user to modify existing lock-based code so that it uses hardware transactional memory support. In addition, software locks specified by a software programmer and hardware transactions specified by a compiler do not block one another from accessing shared resources such as a shared data structure.
In view of the above, efficient methods and mechanisms for automatic efficient parallelization of code combined with hardware transactional memory support are desired.