The present invention relates to computer systems with multiple processors and to multicore processors, and in particular to a method and apparatus for parallelizing programs for execution on such computer systems.
Multicore microprocessors, incorporating multiple processing units, are being used to increase the processing speed of computer systems by allowing parallel execution of a program on multiple processing units. This is in contrast to techniques that increase the processing speed of computer systems by increasing the internal clock rate of an individual processing unit, or techniques that increase the exploitation of instruction-level parallelism within a processing unit.
While it is possible to write a program that is specially designed for parallel execution on a multicore processor, it is clearly desirable to provide a method of parallelizing standard sequential programs. Such a parallelizing method would simplify programming, allow the use of standard programming tools, and permit current programs to execute efficiently on multicore systems.
It is known to parallelize standard sequential programs by exploiting naturally occurring parallel structure that can be found in small groups of instructions. Increased parallelization may be obtained through speculative techniques that execute small groups of instructions that are logically sequential but that may, in practice, be executed in parallel without data dependency or control dependency conflicts. Generally, a data dependency is violated when one concurrent thread executes based on an assumption about data values that are changed by another concurrently executing thread that is earlier in the control flow. A control dependency is violated when one concurrent thread executes based on an assumption about the control flow, for example the resolution of a branch statement, that is changed by another concurrently executing thread that is earlier in the control flow.
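By way of illustration only, the two kinds of violation described above may be sketched as simple checks. This is a minimal sketch, not a description of any particular prior-art system: the `Task` record, the read and write sets, and the functions `data_violation` and `control_violation` are hypothetical names introduced here, and each small group of instructions is assumed to have its accessed locations recorded.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical record of one small group of instructions."""
    name: str
    reads: frozenset   # locations the task reads
    writes: frozenset  # locations the task writes


def data_violation(earlier: Task, later: Task) -> frozenset:
    """Return locations the later task read on an assumption
    that the logically earlier task changes."""
    return earlier.writes & later.reads


def control_violation(predicted_taken: bool, actual_taken: bool) -> bool:
    """A later task ran assuming a branch outcome; the earlier task
    resolved the branch. A mismatch is a control dependency violation."""
    return predicted_taken != actual_taken


# Three logically sequential tasks (assumed example data).
t1 = Task("t1", reads=frozenset({"a"}), writes=frozenset({"x"}))
t2 = Task("t2", reads=frozenset({"x"}), writes=frozenset({"y"}))
t3 = Task("t3", reads=frozenset({"b"}), writes=frozenset({"z"}))

conflict_12 = data_violation(t1, t2)  # t2 reads "x", which t1 writes
conflict_13 = data_violation(t1, t3)  # no shared location
```

In this sketch, `t2` could not safely run concurrently with `t1`, whereas `t3` could, because `t3` reads nothing that `t1` writes.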
These problems of data and/or control dependencies substantially limit the number of instructions that can be parallelized by these techniques. As the number of concurrently executing threads increases in an attempt to achieve “distant” parallelism, violations of data and/or control dependencies become more common. Violations of data and/or control dependencies require “squashing” of the thread in violation, a process that can erase gains in execution speed from the parallelization.
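The cost of squashing may be illustrated with a small simulation. This is a sketch under stated assumptions, not an actual speculative runtime: concurrency is modeled sequentially by running every task against the same memory snapshot, results are committed in logical order, and the names `run_speculative`, `tasks`, and `memory` are hypothetical. A task whose reads were overwritten by a logically earlier task is squashed and re-executed, so its speculative work is wasted.

```python
def run_speculative(tasks, memory):
    """Simulate speculative execution with squashing.

    tasks:  list of (reads, fn) pairs in logical order, where fn takes a
            dict of the values read and returns a dict of values to write.
    memory: dict of shared locations, updated in place.
    Returns the number of squashed (re-executed) tasks.
    """
    snapshot = dict(memory)
    # Phase 1: every task executes "concurrently" against the same snapshot.
    results = [fn({k: snapshot[k] for k in reads}) for reads, fn in tasks]

    # Phase 2: commit in logical order, squashing tasks that read stale data.
    dirty = set()   # locations written by logically earlier tasks
    squashes = 0
    for (reads, fn), writes in zip(tasks, results):
        if dirty & set(reads):
            squashes += 1  # speculative result is discarded
            writes = fn({k: memory[k] for k in reads})  # re-execute
        memory.update(writes)
        dirty.update(writes)
    return squashes


# Assumed example data: the second task depends on the first.
memory = {"a": 1, "b": 2, "c": 0, "d": 0}
tasks = [
    (["a"], lambda v: {"c": v["a"] + 1}),
    (["c"], lambda v: {"d": v["c"] * 10}),  # reads "c", written by the first task
    (["b"], lambda v: {"b": v["b"] + 5}),   # independent of the others
]
squashed = run_speculative(tasks, memory)
```

Here one of the three tasks is squashed, so only the independent third task gains anything from running speculatively. As more tasks are speculated at once, more of them tend to read locations written by logically earlier tasks, and a correspondingly larger share of the speculative work is discarded.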