Computing devices such as laptops, tablets and/or smartphones generally include a processor, memory and one or more peripheral devices. The processor may include one or more processing units, e.g., core(s), configured to execute one or more software application(s). A process, i.e., an executing software application, may include one or more thread(s). The processor may be configured to execute one or more process(es) and/or thread(s) generally in parallel. The process(es) and/or thread(s) may share the processing unit(s) in a time-sliced fashion, managed by, for example, a scheduler included in an operating system (OS).
Multi-threaded applications take advantage of the increasing number of cores to achieve high performance. However, writing multi-threaded applications requires programmers to manage access to data shared among multiple threads. Access to shared data typically requires synchronization mechanisms. These mechanisms serialize operations on the shared data, often through the use of a critical section protected by a lock.
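By way of illustration only, a critical section protected by a lock may be sketched as follows. The sketch uses Python's standard threading module; the function name run_counter and its parameters are hypothetical and chosen for this example. Several threads increment a shared counter, and the lock serializes the read-modify-write on that counter so that no update is lost.

```python
import threading


def run_counter(num_threads=4, increments=10_000):
    """Increment one shared counter from several threads.

    The lock turns the increment into a critical section: only one
    thread at a time may read, modify and write the shared value.
    """
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(increments):
            with lock:          # enter the critical section
                counter += 1    # operation on the shared data
            # the lock is released on leaving the "with" block

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = run_counter()
print(total)
```

Because every increment is performed while holding the lock, the returned value equals the number of threads multiplied by the number of increments per thread. The cost is that the threads execute the increment one after another rather than in parallel.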
In computer science, “concurrency” describes the extent to which units of an application can be executed out-of-order or in partial order, without changing the result. Concurrency is desirable, because parallel execution of concurrent units can improve overall speed of execution in multi-processor and multi-core systems.
However, synchronization and serialization can limit concurrency. For example, if a lock protecting a shared table is held by thread A, thread B has to wait until thread A releases the lock, even if the two threads access different table entries and have no data conflict. As a result, programmers try to reduce synchronization overhead, either by reducing the use of synchronization or by using fine-grained locks, e.g., multiple locks which each protect different shared data. In the example of threads A and B, instead of using a single lock, the application may use multiple locks to synchronize access to different parts of the table. Threads A and B access different table entries and use different locks for their accesses; therefore, they do not need to wait for one another. However, developing a multi-threaded application with fine-grained locking demands expertise and requires additional debugging effort, such as to avoid deadlock. This can increase the cost of software development.
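The difference between a single lock and fine-grained locks in the example of threads A and B may be sketched as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the class names CoarseTable and StripedTable, the helper b_can_proceed, and the choice of mapping an entry to a lock by its index modulo the number of locks are all hypothetical. The calling thread plays the role of thread A and holds the lock for one entry, while a second thread plays thread B and checks, without blocking, whether it could acquire the lock for another entry.

```python
import threading


class CoarseTable:
    """Table protected by a single lock (coarse-grained locking)."""

    def __init__(self, size):
        self.entries = [0] * size
        self._lock = threading.Lock()

    def lock_for(self, index):
        return self._lock  # every entry shares the same lock


class StripedTable:
    """Table protected by several locks, each covering part of the table."""

    def __init__(self, size, num_stripes=4):
        self.entries = [0] * size
        self._locks = [threading.Lock() for _ in range(num_stripes)]

    def lock_for(self, index):
        # Assumed mapping for this sketch: entry index modulo lock count.
        return self._locks[index % len(self._locks)]


def b_can_proceed(table, index_a, index_b):
    """Return True if thread B can lock index_b while A holds index_a."""
    result = []

    def thread_b():
        lock = table.lock_for(index_b)
        acquired = lock.acquire(blocking=False)  # do not wait for the lock
        if acquired:
            table.entries[index_b] += 1
            lock.release()
        result.append(acquired)

    with table.lock_for(index_a):  # thread A holds its lock
        b = threading.Thread(target=thread_b)
        b.start()
        b.join()
    return result[0]


coarse_ok = b_can_proceed(CoarseTable(8), 0, 1)
striped_ok = b_can_proceed(StripedTable(8), 0, 1)
print(coarse_ok, striped_ok)
```

With the single lock, thread B cannot proceed even though entries 0 and 1 do not conflict. With four locks, entries 0 and 1 map to different locks and thread B proceeds immediately. Entries that map to the same lock, such as 0 and 4 in this sketch, still serialize, which is one reason the choice of lock granularity requires care.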
Even when fine-grained locks are implemented correctly, lock locality, rather than lock granularity, can become the dominant performance bottleneck.
Transactional programming allows programmers to write software and designate code regions for speculative or transactional execution. An example is INTEL® TRANSACTIONAL SYNCHRONIZATION EXTENSIONS (“TSX”). TSX allows programmers to use coarse-grained locks; the computer processor dynamically determines whether threads need to be serialized and serializes them only when required. However, to take advantage of these features, programmers still need to be trained to specify the code regions which are to use TSX, and they still need to write and deploy TSX-enabling code, often across a diverse code base executed by diverse hardware (not only hardware from a single vendor). The resulting burden on programmers slows adoption of transactional programming as a technique to reduce synchronization overhead.
Although the following Detailed Description will proceed with reference being made to illustrative embodiments, many alternatives, modifications, and variations thereof will be apparent to those skilled in the art.