1. Field of the Invention
This invention generally relates to an incremental method of distributing the instructions of an execution sequence among a plurality of processing elements for execution in parallel. More particularly it relates to such a method in which the distribution is based upon the anticipated availability times of the needed input values for each instruction as well as the anticipated availability times of each processing element for handling each instruction. This invention also relates to a computer system and method in which execution sequences of instructions are executed in two modes of execution, the first mode being used not only to execute instructions but also simultaneously to parallelize instruction sequences which have not already been parallelized, while the second mode is used to execute in parallel, on separate processing elements, instruction sequences which have been already parallelized.
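As a rough illustration of distribution based on anticipated availability times, the following Python sketch greedily places each instruction, in sequential order, on the processing element where it could start earliest. It uses the later of two times: when all of the instruction's inputs are expected to be available, and when that processing element is expected to be free. This is a generic list-scheduling sketch, not the method claimed here. The names `Instr` and `distribute`, the fixed known latencies, and the assumption of renamed registers (so that only read-after-write dependences delay an instruction) are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Instr:
    name: str
    dest: Optional[str]          # register written, or None
    srcs: Tuple[str, ...]        # registers read
    latency: int = 1             # assumed fixed, known execution time


def distribute(instrs: List[Instr], num_pes: int,
               reg_ready: Optional[Dict[str, int]] = None) -> List[Tuple[str, int, int]]:
    """Assign instructions, in sequential order, to processing elements (PEs).

    Each instruction goes to the PE on which it could start earliest, where
    start = max(anticipated availability of all its inputs,
                anticipated time that PE becomes free).
    Assumes renamed registers, so only read-after-write dependences delay a start.
    Returns (name, pe, start_time) for every instruction.
    """
    ready = dict(reg_ready or {})    # register -> time its value becomes available
    pe_free = [0] * num_pes          # PE -> time it can accept another instruction
    schedule = []
    for ins in instrs:
        inputs_at = max((ready.get(r, 0) for r in ins.srcs), default=0)
        # Earliest possible start on each PE; ties go to the lowest-numbered PE.
        pe = min(range(num_pes), key=lambda p: (max(inputs_at, pe_free[p]), p))
        start = max(inputs_at, pe_free[pe])
        finish = start + ins.latency
        pe_free[pe] = finish
        if ins.dest is not None:
            ready[ins.dest] = finish
        schedule.append((ins.name, pe, start))
    return schedule


program = [
    Instr("I1", "r1", (), 2),            # r1 <- load
    Instr("I2", "r2", (), 2),            # r2 <- load
    Instr("I3", "r3", ("r1", "r2"), 1),  # r3 <- r1 + r2
    Instr("I4", "r4", ("r5",), 1),       # r4 <- r5 + 1 (r5 available at time 0)
]
print(distribute(program, 2))
# [('I1', 0, 0), ('I2', 1, 0), ('I3', 0, 2), ('I4', 1, 2)]
```

With two processing elements, the two loads start together. The add waits until time 2 for both operands, and the independent I4 fills the second element at that same time.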
2. Description of the Prior Art
One way of executing a digital computer program faster is to execute several of its parts in parallel on separate processors. This can be done by defining a programming environment and computer system in which programs are written expressly for such parallel execution. Unfortunately, many useful programs have already been written on the assumption that their instructions will be executed sequentially. Because it is desirable to execute these sequential programs faster as well, some prior-art efforts have been directed at parallelizing such programs.
Most of the prior work in this area relies on creating a parallel specification of the program. This has been accomplished in several ways. Sophisticated compilers have been created which parallelize programs and generate code for a multi-processor system having a number of conventional processors. Some of these compilers uncover the parallelism automatically (e.g., "Advanced Compiler Optimizations for Supercomputers" by D. A. Padua and M. J. Wolfe in Comm. of ACM, Vol. 29, No. 12, December 1986). Others take cues from programmer-supplied annotations (e.g., "Programming for Parallelism" by Alan H. Karp in Computer, Vol. 20, No. 5, May 1987). Another approach is to create specialized hardware that is amenable to parallel execution, such as vector processors or very long instruction word (VLIW) architectures. Here again, a compiler translates sequential programs into code suitable for these machines, and the compilation effort is substantial. A more radical approach has been to create an inherently parallel execution mechanism, such as a dataflow machine (see "Dataflow Supercomputers" by J. B. Dennis in Computer, Vol. 13, No. 11, November 1980), together with a declarative program specification from which parallel code for that mechanism is generated automatically (see "Future Scientific Programming on Parallel Machines" by Arvind and K. Ekanadham in the Jour. of Parallel & Distributed Computing, Vol. 5, December 1988).
In all of the foregoing approaches, the task of parallelizing the computer program and determining that it is safe to execute different parts in parallel is done either at the compiler level or even earlier at the programming level (i.e., ahead of any actual productive execution of the code with data). The processors play no role in determining whether it is safe to execute different parts in parallel at execution time because this determination has been made already by either the programmer or the compiler.
Another approach presents unparallelized code to the multi-processor system itself at execution time and gives the system an active role both in splitting up the code for parallel execution and in determining whether that parallel execution is valid. This approach differs from the others in that at least some of the instructions are executed provisionally: it is not generally known ahead of execution whether the parallel execution is entirely valid. A mechanism is therefore provided for determining whether the parts executed in parallel are valid, and any invalid parts are executed again.
This approach is exemplified in patent application Ser. No. 342,494, entitled "Multiple Sequence Processor System," filed on Apr. 24, 1989 by the assignee of this patent application and now abandoned, in which instructions are divided into groups in accordance with some delimiting rule and at least two groups are then executed in parallel. One of the groups is sequentially earlier than all of the others, and its execution is assumed to be correct, while the later groups are executed only provisionally. Later groups of instructions read data from registers and memory locations just as if the earlier groups had already been executed. Controls monitor whether any data used by a later group is changed by instructions in an earlier group after the later group has used it. Stores to memory locations and registers by the later groups are held only temporarily in a separate place. If all of the data used by a later group is valid (i.e., not changed by an earlier group), the results of that later group are valid and can be committed. If not, that later group is re-executed.
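The provisional-execution scheme described above can be sketched in Python as follows. All groups first run against the same starting state, with their stores held in private buffers. The groups are then committed in sequential order, and any later group that read a location written by an earlier group is re-executed. This is a simplified model under stated assumptions, not the mechanism of the cited application. The instruction tuple form and the names `run_group` and `execute` are hypothetical, and the validity test is conservative: it flags any overlap between a group's reads and earlier groups' writes, whether or not the value actually changed.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical instruction form: (destination, source locations, operation)
Instr = Tuple[str, Tuple[str, ...], Callable[..., int]]


def run_group(group: List[Instr], state: Dict[str, int]) -> Tuple[Dict[str, int], Set[str]]:
    """Execute one group provisionally against `state`.

    Stores go to a private buffer rather than to `state`.
    Returns the buffer and the set of locations read from shared state.
    """
    buffer: Dict[str, int] = {}
    read_set: Set[str] = set()
    for dest, srcs, op in group:
        vals = []
        for s in srcs:
            if s in buffer:              # produced earlier in this same group
                vals.append(buffer[s])
            else:                        # taken from shared registers/memory
                read_set.add(s)
                vals.append(state[s])
        buffer[dest] = op(*vals)
    return buffer, read_set


def execute(groups: List[List[Instr]], state: Dict[str, int]) -> Tuple[Dict[str, int], List[int]]:
    """Run all groups against the same starting state, then commit in order.

    A group is re-executed if it read any location written by an earlier group
    (a conservative check: it ignores whether the value actually changed).
    Returns the final state and the indices of re-executed groups.
    """
    state = dict(state)
    provisional = [run_group(g, state) for g in groups]   # "parallel" phase
    written_earlier: Set[str] = set()
    reexecuted: List[int] = []
    for i, group in enumerate(groups):
        buffer, read_set = provisional[i]
        if read_set & written_earlier:   # an input was changed by an earlier group
            buffer, _ = run_group(group, state)   # state now reflects earlier groups
            reexecuted.append(i)
        state.update(buffer)             # commit this group's stores
        written_earlier.update(buffer.keys())
    return state, reexecuted


groups = [
    [("c", ("a", "b"), lambda a, b: a + b)],   # earliest group: c <- a + b
    [("d", ("c",), lambda c: c * 10)],         # reads c, which group 0 writes
    [("e", ("a",), lambda a: a - 1)],          # independent of groups 0 and 1
]
final, redone = execute(groups, {"a": 1, "b": 2, "c": 0, "d": 0})
print(final, redone)
# {'a': 1, 'b': 2, 'c': 3, 'd': 30, 'e': 0} [1]
```

The earliest group is never re-executed, because no group precedes it. Only the group that consumed a stale value of `c` is run a second time.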
U.S. Pat. No. 4,825,360 uses a similar scheme, in which instruction groups are provisionally executed in parallel and then confirmed in sequence. In this scheme, however, the chances of success are enhanced by a compilation step and by reducing (and preferably eliminating) side-effecting instructions anywhere other than as the final instruction of a group. As a consequence, it is not clear that this system can be used to parallelize conventional sequential code.
In U.S. Pat. No. 4,903,196 (Pomerene et al.), a uniprocessor parallelizes code for execution on separate asynchronous execution units, and the execution units wait for each other when necessary, so that no unit uses data that will be modified by an instruction earlier in conceptual order until that instruction has been executed. There is only one set of general purpose registers (GPRs) and only one decoder. A series of special-purpose tags is associated with each GPR and with each execution unit in the uniprocessor. The tags allow the multiple execution units to execute multiple instructions concurrently, using the same GPR sequentially or different GPRs concurrently, while preserving the logical integrity of the data supplied by the GPRs to the execution units. The tags associated with each GPR and each execution unit store a sequence trail between the individual GPRs and execution units, so that before a given execution unit is permitted to store into a particular GPR, the immediately preceding store into that GPR by a different execution unit must have been completed. The tags also ensure that all reads from a given GPR by one or more execution units are completed before a subsequent store to that GPR is allowed to occur.
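The ordering rules enforced by such tags can be modeled in Python roughly as follows. For each instruction, taken in conceptual order, the sketch records which earlier instructions it must wait for:
- the preceding store to any GPR it reads;
- the preceding store to the GPR it writes;
- all reads of that GPR since the preceding store.

A step-by-step simulation then lets every instruction whose waits are satisfied proceed. This is a dependence-tracking model of the described behavior, not the tag hardware of the cited patent. The names `build_trail` and `simulate`, the instruction tuple form, the assumption of unlimited execution units, and the one-step-per-instruction timing are all hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical form: (GPRs read, GPR written or None), in conceptual order
Instr = Tuple[Tuple[str, ...], Optional[str]]


def build_trail(program: List[Instr]) -> Dict[int, Set[int]]:
    """For each instruction, the earlier instructions it must wait for."""
    last_store: Dict[str, int] = {}          # GPR -> most recent store to it
    reads_since: Dict[str, Set[int]] = {}    # GPR -> readers since that store
    waits: Dict[int, Set[int]] = {}
    for i, (srcs, dest) in enumerate(program):
        deps: Set[int] = set()
        for r in srcs:
            if r in last_store:              # do not read before the earlier store
                deps.add(last_store[r])
            reads_since.setdefault(r, set()).add(i)
        if dest is not None:
            if dest in last_store:           # preceding store must be completed
                deps.add(last_store[dest])
            # earlier reads of this GPR must be completed before this store
            deps |= reads_since.get(dest, set()) - {i}
            last_store[dest] = i
            reads_since[dest] = set()
        waits[i] = deps
    return waits


def simulate(program: List[Instr]) -> List[List[int]]:
    """Each step, every instruction whose waits are satisfied executes.

    Assumes unlimited execution units and one step per instruction.
    """
    waits = build_trail(program)
    done: Set[int] = set()
    steps: List[List[int]] = []
    while len(done) < len(program):
        ready = sorted(i for i in waits if i not in done and waits[i] <= done)
        steps.append(ready)
        done.update(ready)
    return steps


program = [
    ((), "r1"),          # 0: r1 <- load
    (("r1",), "r2"),     # 1: r2 <- r1 + 1
    ((), "r1"),          # 2: r1 <- 5 (waits for store 0 and read 1)
    (("r4",), "r3"),     # 3: r3 <- r4 * 2 (independent)
]
print(simulate(program))
# [[0, 3], [1], [2]]
```

Because each instruction can wait only on earlier instructions, the lowest-numbered unfinished instruction is always ready, so the simulation loop always terminates.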