The present invention relates to methods and apparatus for executing a sequential computer program “in parallel” on multiple processors and in particular to a technique in which a distilled version of the program is used to coordinate the parallel execution.
Faster computer processing can be obtained with faster processors (e.g., processors having higher clock rates, larger data words, or more powerful instruction sets) or with more processors by dividing the processing task among a number of processors. This latter technique is termed parallel processing.
Programs can be explicitly written as parallel programs (also called multithreaded programs), but this is often more difficult than writing a sequential program with the same functionality. Alternatively, sequential programs can be automatically converted into parallel programs by parallelizing compilers, but these techniques are currently limited to a small class of applications.
Two previous speculative parallel processing models are the multiscalar model and the pre-execution model. In the multiscalar model, the program to be executed is broken, to the extent possible, into independent tasks, each of which is assigned to a different processor. To the extent that the tasks are not truly independent, control information or data information must be exchanged between the tasks. When information needed by one task is generated by another task, the first task must stall and wait for the second task to produce that information. The problem of stalling can significantly limit the efficacy of the multiscalar model.
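The stalling behavior described above may be illustrated by the following minimal sketch, in which Python threads stand in for separate processors. The names `producer_task`, `consumer_task`, and `run_tasks` are hypothetical and are used for illustration only; the sketch is not an implementation of any particular multiscalar processor.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def producer_task():
    """Task that generates a value needed by another task."""
    time.sleep(0.05)  # stand-in for the work that produces the value
    return 21


def consumer_task(dependency):
    """Task that is only partly independent of the producer."""
    independent = sum(range(10))  # work that needs nothing from the producer

    # The remaining work needs the producer's value, so this task stalls here.
    start = time.perf_counter()
    value = dependency.result()
    stalled = time.perf_counter() - start

    return independent + value * 2, stalled


def run_tasks():
    """Assign the two tasks to two workers (stand-ins for two processors)."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        produced = pool.submit(producer_task)
        consumed = pool.submit(consumer_task, produced)
        return consumed.result()
```

In this sketch the time spent blocked in `dependency.result()` corresponds to the stall: the second worker is occupied but does no useful work until the first worker delivers the value.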
One approach to minimizing stalling is to allow the task needing information to speculate as to what information it will receive, picking a data value or control path with which to continue execution. When the data or control information arrives, the speculation may be verified and, if incorrect, the speculative execution may be “squashed” and the program “rewound” to the point of speculation and re-executed using the correct data. Squashing and rewinding discard work already performed; nevertheless, so long as the prediction achieves a certain accuracy, speculation provides a speed advantage.
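The speculate, verify, squash, and rewind sequence may be sketched as follows. This is a simplified illustration under stated assumptions: the class `SpeculativeTask`, its `state` dictionary, and the arithmetic in `_body` are hypothetical, and the predicted and actual values are supplied together rather than arriving at different times as they would in hardware.

```python
from dataclasses import dataclass, field


@dataclass
class SpeculativeTask:
    """Hypothetical task that consumes one value produced by another task."""
    state: dict = field(default_factory=dict)
    squashes: int = 0

    def run(self, predicted, actual):
        # Checkpoint the state at the point of speculation so it can be rewound.
        checkpoint = dict(self.state)

        # Continue executing with the predicted value instead of stalling.
        self._body(predicted)

        # The real value arrives: verify the speculation.
        if predicted != actual:
            # Squash the speculative work and rewind to the checkpoint.
            self.state = checkpoint
            self.squashes += 1
            # Re-execute from the point of speculation with the correct data.
            self._body(actual)
        return self.state["result"]

    def _body(self, value):
        # Stand-in for the work that depends on the incoming value.
        self.state["result"] = value * 2 + self.state.get("base", 0)
```

A correct prediction costs only the comparison, whereas an incorrect prediction costs the discarded work plus the re-execution, which is why the benefit depends on prediction accuracy.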
In the pre-execution model, the program is scanned ahead of its execution point on a first processor for problem areas that may slow the execution, for example, LOAD instructions accessing data outside the cache or unresolved BRANCH instructions. A second processor is assigned to these problem areas to pre-execute them. Again, speculation may be used when values required for the pre-execution are not immediately available.
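The scan-ahead step of the pre-execution model may be sketched as follows for the case of LOAD instructions that would miss the cache. The sketch rests on several assumptions made for illustration: a program is modeled as a list of hypothetical `(opcode, address)` pairs, the cache and memory are plain dictionaries, the function names and the `window` parameter are invented here, and the second processor is modeled as a function call that completes before the first processor proceeds, so timing is not represented.

```python
def find_problem_loads(program, pc, cache, window=8):
    """Scan ahead of the execution point for LOADs that would miss the cache."""
    lookahead = program[pc + 1:pc + 1 + window]
    return [addr for op, addr in lookahead if op == "LOAD" and addr not in cache]


def pre_execute(program, pc, cache, memory, window=8):
    """Stand-in for the second processor: bring the missing data in early."""
    for addr in find_problem_loads(program, pc, cache, window):
        cache[addr] = memory[addr]


def run(program, cache, memory, use_pre_execution):
    """Run the program on the 'first processor' and count its cache misses."""
    misses = 0
    acc = 0
    for pc, (op, addr) in enumerate(program):
        if use_pre_execution:
            pre_execute(program, pc, cache, memory)
        if op == "LOAD":
            if addr not in cache:
                misses += 1  # the first processor would be slowed here
                cache[addr] = memory[addr]
            acc += cache[addr]
    return acc, misses
```

For example, with a program of three LOADs to distinct addresses and an initially empty cache, the run without pre-execution misses on every LOAD, while the run with pre-execution misses only on the first LOAD, which lies at the execution point rather than ahead of it.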