1. Field of the Invention
The invention relates to a method for exploiting parallelism at the user level. Specifically, embodiments of the invention include a method for synchronizing the parallel execution of code by multiple threads on multi-core processors.
2. Background
Improvements in the processing power of modern computer systems have concentrated primarily on increasing clock speed and, consequently, the rate at which a processor can execute instructions. However, the rate of improvement in processor speed has slowed. This slowdown is attributed to limits imposed by semiconductor physics and to thermal and power issues in processor design. Much past progress came from making circuits and processor designs increasingly smaller. However, this trend appears unlikely to continue because of rising operating temperatures and power consumption. As a result, increasing attention is being given to improving the parallel processing of instructions, that is, to parallelization.
In single-core processors, parallel processing of instructions is accomplished by providing multiple execution units and by pipelining. Increasing design complexity and the resulting negative effect on clock speed have hampered the development of advanced microarchitectural techniques for extracting higher levels of parallelism. Recently, there has been an increasing trend toward hyper-threaded and multi-core processors as a power-efficient means of parallel execution. These processors support multiple hardware contexts, which enables the exploitation of thread-level parallelism.
Programmers have attempted to improve parallelization at the software level. However, most commonly used programming languages were not designed to exploit parallelism and do not give the programmer adequate tools for parallelizing software. Some extensions to these languages have been developed to expand the set of tools available to programmers. However, these software-level tools can extract parallelism only from certain types of code sequences. For example, these tools can exploit DOALL parallelism, which arises in loop structures that have no dependencies between iterations.
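The distinction between a DOALL loop and a loop with a dependency between iterations can be sketched as follows. This is an illustrative example only, not part of the claimed method; the function names are hypothetical. In the first function each iteration reads only its own input and writes only its own output, so iterations may be distributed across threads in any order. In the second, each iteration needs the result of the previous one, so a tool limited to DOALL parallelism would have to leave it sequential. Python threads are used here purely to show the structure; under CPython's global interpreter lock they do not deliver true multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def doall_square(xs, workers=4):
    """DOALL loop: iteration i touches only xs[i] and the i-th result,
    so iterations can run concurrently on any thread, in any order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, whatever the completion order.
        return list(pool.map(lambda x: x * x, xs))


def prefix_sum(xs):
    """Not DOALL: iteration i depends on the running total produced by
    iteration i - 1 (a loop-carried dependency), so it runs sequentially."""
    out = []
    running = 0
    for x in xs:
        running += x
        out.append(running)
    return out
```

For example, `doall_square([1, 2, 3, 4])` gives `[1, 4, 9, 16]`, while `prefix_sum([1, 2, 3, 4])` gives `[1, 3, 6, 10]`.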
Exploitation of parallelism, including parallelization at the software level, requires that certain critical sections of the code be executed in program order. To ensure the proper order of execution of these critical sections, synchronization mechanisms have been developed, such as semaphores and monitors at the kernel level and locks at the user level. Kernel-level synchronization incurs high overhead, which forces much synchronization to be performed at the user level. This, in turn, places the burden on the application programmer, who has only basic and limited tools.
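The requirement that critical sections execute in program order, enforced by the application programmer with basic lock primitives, can be sketched as below. This is a hypothetical illustration of the problem described above, not the synchronization method of the invention; `OrderedSection` and `ordered_log` are invented names. Each loop iteration performs its independent work in parallel, then waits on a shared lock and condition until a turn counter reaches its own iteration index before entering the critical section. Python's `threading.Condition` is used for brevity. It is an application-level lock, but it may still rely on operating-system blocking underneath, so it does not demonstrate a low-overhead user-level mechanism.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class OrderedSection:
    """Admits iterations 0, 1, 2, ... into a critical section strictly in
    program order, even if threads finish their parallel work out of order."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next = 0  # index of the iteration allowed to enter next

    def run(self, i, fn):
        with self._cond:
            # Block until it is iteration i's turn.
            self._cond.wait_for(lambda: self._next == i)
            fn()  # critical section, executed while holding the lock
            self._next += 1
            self._cond.notify_all()  # wake waiters so the next index can enter


def ordered_log(n, workers=4):
    """Runs n iterations in parallel; only the append is ordered."""
    section = OrderedSection()
    log = []

    def body(i):
        value = i * i  # order-independent work, may run concurrently
        section.run(i, lambda: log.append(value))  # must follow program order

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(body, range(n)))
    return log
```

Because the executor hands out tasks in submission order, the iteration whose turn it is has always either started or can be picked up by a free worker, so the waiting threads cannot deadlock. The cost is that every iteration serializes on a single lock, which illustrates why such basic tools leave the programmer with limited means to extract parallelism.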