1. Field of the Invention
The present invention generally relates to a processor and, more particularly, to a control system for controlling parallel processing of instructions in a processor provided with a plurality of operational pipelines.
2. Description of the Prior Art
In order to maintain program consistency in a processor having a plurality of processing sections for parallel processing of instructions, it is necessary to process the instructions while preserving the data dependencies among them, as described below.
(1) Flow Dependency
In the case of a program including two instructions as follows:
fmul fr00 fr01 fr02
fadd fr03 fr00 fr04,
the result obtained by executing the instruction fmul is stored in fr00 and is used as an input to the next instruction fadd. Therefore, the instruction fadd cannot be processed until the result of the instruction fmul is output. Such a data dependency is called a flow dependency. In this case, the instructions fmul and fadd are called the preceding instruction and the succeeding instruction, respectively.
(2) Inverse Dependency
In the case of a program including two instructions as follows:
fmul fr01 fr00 fr02
fadd fr00 fr03 fr04,
the preceding instruction fmul uses the data of fr00 as input data, and thereafter the succeeding instruction fadd stores its result in fr00. Therefore, the data obtained by the succeeding instruction cannot be written into the register until the data stored in the register has been completely read out for executing the preceding instruction. This data relationship is called an inverse dependency.
(3) Output Dependency
In the case of a program including two instructions as follows:
fmul fr00 fr01 fr02
fadd fr00 fr03 fr04,
both instructions store their results in fr00, and the result obtained by executing the succeeding instruction fadd must be the one that finally remains in fr00. Therefore, the data obtained by executing the succeeding instruction cannot be written into the register before the data obtained by executing the preceding instruction is written into the register. This is called an output dependency.
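The three dependencies above can be illustrated with a minimal sketch. It assumes the operand order implied by the examples (operation, destination register, then two source registers); the function names `parse` and `classify` are hypothetical and serve illustration only.

```python
def parse(line):
    """Split 'op dst src1 src2' into (op, dst, [src1, src2])."""
    op, dst, *srcs = line.replace(",", " ").split()
    return op, dst, srcs


def classify(preceding, succeeding):
    """Return the set of data dependencies of `succeeding` on `preceding`."""
    _, p_dst, p_srcs = parse(preceding)
    _, s_dst, s_srcs = parse(succeeding)
    deps = set()
    if p_dst in s_srcs:
        # The succeeding instruction reads what the preceding one writes.
        deps.add("flow")
    if s_dst in p_srcs:
        # The succeeding instruction overwrites what the preceding one reads.
        deps.add("inverse")
    if s_dst == p_dst:
        # Both instructions write the same register.
        deps.add("output")
    return deps
```

Applied to the three instruction pairs listed above, `classify` reports a flow, an inverse, and an output dependency, respectively.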
Hereinbelow, two prior art parallel processing arrangements that handle the data dependencies noted above will be described.
One prior art arrangement is the scoreboard algorithm of Thornton used in the CDC 6600 machine of Control Data Corporation, which is described in detail in "Instruction Issue Logic in Pipelined Supercomputers" by Shlomo Weiss and James E. Smith or in "6.7 Advanced Pipelining-Dynamic Scheduling in Pipelines" in "Computer Architecture: A Quantitative Approach" by John L. Hennessy and David A. Patterson.
According to the scoreboard algorithm, the issuance of an instruction is controlled using a tag which is attached to each register and is called a scoreboard. Specifically, a tag is set for a register for which the writing of data is reserved, and the tag is reset when the writing is finished. When an instruction has a flow dependency on a register for which the tag is set, only the issuance of that instruction is held until the data is written into the register, so that the other instructions can be issued sequentially. If an instruction having an inverse dependency or an output dependency is detected, the issuance of all instructions is prohibited until the detected dependency is resolved.
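The issue control described for the scoreboard algorithm can be sketched as follows. This is a simplified model under stated assumptions, not the CDC 6600 logic itself: the function name `issue_decision`, the return strings, and the `pending_reads` set used to detect an inverse dependency are all hypothetical; the text itself names only the write tag.

```python
def issue_decision(dst, srcs, write_tags, pending_reads):
    """Decide what the issue stage does with one instruction.

    write_tags:    registers whose scoreboard tag is set (a write is reserved).
    pending_reads: registers that a held instruction still has to read
                   (hypothetical bookkeeping, not named in the text).
    """
    # Output dependency: the destination already has a reserved write.
    # Inverse dependency: the destination is still waiting to be read.
    if dst in write_tags or dst in pending_reads:
        return "stall_all"
    # Flow dependency: a source register is still waiting to be written.
    if any(src in write_tags for src in srcs):
        return "hold_this"
    return "issue"
```

For example, while fmul fr00 fr01 fr02 is in flight (so `write_tags` contains fr00), fadd fr03 fr00 fr04 is held alone, whereas fadd fr00 fr03 fr04 stops the issuance of every instruction.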
The other prior art arrangement is the algorithm of Tomasulo employed in the System/360 Model 91 of IBM Corp., which is also described in the aforementioned texts and is discussed in more detail in "An Efficient Algorithm for Exploiting Multiple Arithmetic Units" by R. M. Tomasulo.
It is characteristic of Tomasulo's algorithm that each operating unit is equipped with a waiting buffer, called a reservation station, in which an instruction waits for its input data, and that each operating unit feeds its output to a common bypass bus called a common data bus. Each waiting buffer stores tag information, such as a register name, for input data already read and for input data not yet read, and therefore even an instruction with a data dependency can be fed to the waiting buffer. The inverse dependency and the output dependency can be resolved without restricting the issuance of instructions by changing the register name held in the waiting buffer. When a flow dependency is detected, the succeeding instruction is kept waiting in the waiting buffer, as in the above scoreboard algorithm, until the preceding instruction is completely executed. However, since the result obtained by the preceding instruction is output to the common data bus, it is not necessary to wait for the result to be written into the register; processing of the succeeding instruction can be started by taking the data directly from the common data bus.
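The behavior of the waiting buffers and the common data bus can be sketched as follows. This is a minimal software model under stated assumptions, not the System/360 Model 91 hardware: the class names `Station` and `Machine`, the integer tags, and the `OPS` table are hypothetical, and timing, bus contention, and limits on the number of buffers are ignored.

```python
import operator

OPS = {"fmul": operator.mul, "fadd": operator.add}


class Station:
    """One waiting buffer entry (reservation station)."""

    def __init__(self, tag, op, operands):
        self.tag = tag
        self.op = op
        # Each operand is ("value", x) if already read,
        # or ("tag", t) if it waits for the result of station t.
        self.operands = operands

    def ready(self):
        return all(kind == "value" for kind, _ in self.operands)

    def capture(self, tag, value):
        """Take a result from the common data bus if it is awaited."""
        self.operands = [("value", value) if (k, v) == ("tag", tag) else (k, v)
                         for k, v in self.operands]


class Machine:
    def __init__(self, regs):
        self.regs = dict(regs)  # register name -> value
        self.producer = {}      # register name -> tag of the pending writer
        self.stations = []
        self.next_tag = 0

    def issue(self, op, dst, src1, src2):
        operands = []
        for src in (src1, src2):
            if src in self.producer:
                operands.append(("tag", self.producer[src]))  # flow dependency
            else:
                operands.append(("value", self.regs[src]))    # read at issue
        tag = self.next_tag
        self.next_tag += 1
        self.producer[dst] = tag  # renaming: dst now refers to this result
        station = Station(tag, op, operands)
        self.stations.append(station)
        return station

    def execute(self, station):
        if not station.ready():
            raise RuntimeError("input data not yet available")
        a, b = (v for _, v in station.operands)
        value = OPS[station.op](a, b)
        self.stations.remove(station)
        # Broadcast on the common data bus to the waiting stations.
        for other in self.stations:
            other.capture(station.tag, value)
        # Write the register only if this is still its latest producer.
        if self.producer.get_dst if False else None:
            pass
        for reg, tag in list(self.producer.items()):
            if tag == station.tag:
                self.regs[reg] = value
                del self.producer[reg]
        return value
```

In this model, an instruction with an inverse dependency can execute before its preceding instruction, because the preceding instruction already holds a copy of the old register value in its waiting buffer; an instruction with a flow dependency becomes ready only once the result it waits for has been broadcast.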
As described above, according to Thornton's algorithm, the issuance of all instructions is prohibited when an inverse dependency or an output dependency is detected, which degrades the processing efficiency. Moreover, since the succeeding instruction cannot be issued until the preceding instruction has completed the writing of its data, the processing efficiency is lowered further.
Meanwhile, according to Tomasulo's algorithm, the hardware becomes bulky because of the installation of the waiting buffers. A further problem is that the waiting buffers are difficult to control, and that output data may contend for the common data bus, which decreases the efficiency.