As the degree of instruction parallelization in a computer increases, more instructions can be executed at the same time. However, the number of instructions contained in a basic block, that is, the number of instructions from one branch to the next branch, is said to be about four, so that it is difficult to increase the degree of instruction parallelization. A system which increases the degree of instruction parallelization to use a computer effectively, referred to as a boosting system, is disclosed, for example, in IEEE International Symposium Computer Architecture Proceedings, May, 1990. In the boosting system, an instruction belonging to a basic block which follows a certain basic block is moved into the preceding basic block, and the instructions in the preceding basic block and the moved instruction (referred to as a boosted instruction hereinafter) are executed in parallel in the order provided by the instruction code. The boosted instruction can thus be executed in advance, and the result of its execution is validated or invalidated in accordance with whether the branch of the preceding basic block is taken or not.
FIGS. 19(a) and 19(b) are graphs of the data dependencies in the conventional boosting system, in which FIG. 19(a) is a graph before boosting and FIG. 19(b) is a graph after boosting. Reference numerals 100a, 100b and 100c each designate a basic block, which is a unit from one branch instruction to the next branch instruction. The later basic block 100b or 100c is executed in accordance with the result of the branch instruction of the preceding basic block 100a.
FIG. 20 is a view showing the architecture of a parallel computer performing the conventional boosting. In FIG. 20, reference numeral 1 designates an instruction memory storing instructions and reference numeral 2 designates a data memory storing data. Reference numerals 3a and 3b designate a register file and a shadow register file, respectively, which are memories for storing data temporarily and are accessed from the instruction decode stage 5 and the write back stage 8. An instruction fetch stage 4 fetches an instruction from the instruction memory 1. The instruction decode stage 5 decodes the fetched instruction and, if the instruction can be executed, sends it to an execution stage 6, that is, issues the instruction. The execution stage 6 executes operation instructions and performs address calculation for memory access. A memory access stage 7 executes a load or a store instruction. The write back stage 8 writes the result of an operation and loaded data into the register file. Reference numerals 9a and 9b designate a store buffer and a shadow store buffer, respectively, which are memories for temporarily storing addresses and data to be stored in the data memory 2.
FIG. 21 is a view showing a two-phase clock which provides the operation timing of the parallel computer. The operation of a single stage is performed in each cycle shown in FIG. 21.
The conventional boosting is performed on the basis of the following rules: (1) the instructions which can be boosted are memory access instructions and operation instructions, (2) the instruction code must clearly show whether an instruction is a boosted instruction, and (3) boosting is performed from either the later basic block of a taken branch or the later basic block of a not-taken branch, not from both.
One characteristic of the hardware for implementing the boosting is duplication of the register file and the store buffer as shown in FIG. 20. Referring to FIG. 20, the conventional hardware comprises a register file 3a, a shadow register file 3b, a store buffer 9a and a shadow store buffer 9b. A boosted instruction which is invalidated by the result of the branch, that is, an ineffective boosted instruction which should not in fact be executed, is executed as an undecided boosted instruction while the direction of the branch is not yet decided. The duplication ensures that such an instruction does not change the storage state.
A change of the storage state by an undecided boosted instruction is not written in the register file and the store buffer but is written in the shadow register file and the shadow store buffer. Data written on the shadow side is validated or invalidated when the direction of the branch is decided.
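The handling of undecided boosted writes described above may be illustrated by the following minimal sketch. It is not the circuit of FIG. 20; the class name `BoostedRegisterFile`, the method names and the register count are hypothetical and chosen only for illustration, and the store buffer and shadow store buffer would be modeled in the same manner.

```python
class BoostedRegisterFile:
    """Illustrative model of a register file with a shadow copy.

    All names here are assumptions made for this sketch; the source
    describes the behavior only in prose.
    """

    def __init__(self, num_regs=8):
        self.regs = [0] * num_regs  # decided (sequential) storage state
        self.shadow = {}            # register index -> undecided boosted value

    def write(self, reg, value, boosted=False):
        # An undecided boosted instruction writes only to the shadow side.
        if boosted:
            self.shadow[reg] = value
        else:
            self.regs[reg] = value

    def resolve_branch(self, boosted_path_taken):
        # When the branch direction is decided, shadow data is either
        # validated (copied into the register file) or invalidated.
        if boosted_path_taken:
            for reg, value in self.shadow.items():
                self.regs[reg] = value
        self.shadow.clear()


rf = BoostedRegisterFile()
rf.write(1, 10)                # ordinary write
rf.write(1, 99, boosted=True)  # undecided boosted write, held in the shadow
rf.resolve_branch(boosted_path_taken=False)  # boosted result is invalidated
```

In this sketch the register file keeps its earlier contents when the branch goes the other way, which corresponds to the storage state not being changed by an ineffective boosted instruction.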
FIG. 22 is a view showing the bypass operation of a conventional computer of a pipeline system. In FIG. 22, the same reference numbers as in FIG. 20 designate the same or corresponding parts. In addition, reference numeral 10 designates a bypass selection circuit which controls a bus 11 for bypassing the data of each stage to the execution stage 6.
FIG. 23 is a view showing a circuit in the pipeline. In FIG. 23, reference numeral 21 designates a register file and reference numeral 22 designates an address comparator. Reference numerals 23 and 24 designate first and second destination storing registers which store the destination addresses of the output data of the instructions of the execution stage and the memory access stage, respectively. Reference numeral 25 designates a selector circuit for selecting data and reference numeral 26 designates an operation executing part. Reference numerals 27 and 28 designate first and second data registers which store the operational results of the execution stage and the memory access stage, respectively.
FIG. 24 is a view showing an example of a circuit in the address comparator 22. Source addresses src1 and src2 applied from the instruction decoder and destination addresses A1 and A2 of preceding instructions are input to the address comparator. In the address comparator, src1 is compared with the destination addresses A1 and A2, and src2 is likewise compared with A1 and A2. A signal for controlling the selector circuit, which indicates whether these addresses coincide, is then output.
FIG. 25 is a view showing an example of a structure of the selector circuit 25. In FIG. 25, reference numerals 18 and 19 designate buses for supplying the selected input data to the operation executing part. More specifically, a bus (s1-bus) 18 transfers data s1-data to the operation executing part and a bus (s2-bus) 19 transfers data s2-data to the operation executing part. In the selector circuit, when the control signal indicates that addresses coincide, the data corresponding to the coincident address is selected as the data output to the s1-bus or the s2-bus. When none of the control signals for controlling selection of the data output to the s1-bus indicates coincidence, the data data1 supplied from the register file 21 is selected. Likewise, when none of the control signals for controlling selection of the data output to the s2-bus indicates coincidence, the data data2 supplied from the register file 21 is selected.
The source addresses src1 and src2, which are the addresses of two reference data from the instruction decoder, and the destination addresses A1 and A2 of the output data of the preceding instructions, which are stored in the first and second destination storing registers 23 and 24, are applied to the address comparator 22. The address comparator 22 compares these addresses and outputs the control signal to the selector circuit 25. The data data1 and data2 output from the register file 21 and the data D1 and D2 stored in the first and second data registers 27 and 28 are input to the selector circuit 25, which selects the data s1-data and s2-data to be input to the operation executing part 26 in accordance with the control signal applied from the address comparator 22. Thus, the data of each stage is bypassed to the execution stage.
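The bypass selection described with reference to FIGS. 23 to 25 may be sketched as follows for one source operand. The function names `compare_addresses` and `select_operand` are hypothetical. The source does not state which data is selected when a source address coincides with both A1 and A2; the sketch assumes that the execution stage result D1, which belongs to the more recent instruction, takes priority.

```python
def compare_addresses(src, a1, a2):
    """Model of the address comparator 22 for one source address.

    Returns a pair of coincidence signals: (src matches A1, src matches A2).
    """
    return (src == a1, src == a2)


def select_operand(src, reg_data, a1, d1, a2, d2):
    """Model of the selector circuit 25 for one bus (s1-bus or s2-bus).

    reg_data is the value read from the register file 21 for this source.
    d1 and d2 are the results held in the first and second data registers.
    Priority of d1 over d2 on a double match is an assumption of this sketch.
    """
    hit1, hit2 = compare_addresses(src, a1, a2)
    if hit1:
        return d1        # bypass the execution stage result
    if hit2:
        return d2        # bypass the memory access stage result
    return reg_data      # no coincidence: use the register file data


# src1 = r3 coincides with A1 = r3, so D1 is bypassed onto the s1-bus.
s1_data = select_operand(src=3, reg_data=100, a1=3, d1=42, a2=5, d2=7)
# src2 = r4 coincides with neither address, so data2 from the register file is used.
s2_data = select_operand(src=4, reg_data=200, a1=3, d1=42, a2=5, d2=7)
```

As the sketch shows, the selection depends only on the address comparison, with no input indicating whether a preceding instruction was boosted.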
FIG. 26 is a view showing a structure for explaining the score boarding function of a conventional computer of a pipeline system. In FIG. 26, the same reference numbers as in FIGS. 20 and 22 designate the same or corresponding parts. In addition, reference numeral 12 designates a score board, which comprises a memory for managing data in the register file and a control circuit for that memory.
As shown in FIG. 26, writing into the register file 3 is performed at the end of execution. Therefore, when a following instruction uses the data of an instruction whose execution is not yet completed, wrong data could be used. In order to avoid this, score boarding is provided for the register file of the pipeline system computer. An instruction which writes into the register file puts a mark on the score board (registration) so that a following instruction does not read that register and does not use data which is not the newest. When the writing is completed, the mark on the score board is canceled (registration cancel).
FIG. 27 is a view showing a structure of the score board. Operation of the score board will be described hereinafter. A reading address and a writing address of the register are input to the score board from the instruction decode stage. In accordance with the input writing address, a mark is put on the register into which writing is to be performed so that the data in that register is not used by a following instruction. In addition, when the data of a register is read, whether the register to be read is locked is checked, and a signal indicating whether the data is correct or incorrect is output in accordance with whether the register is registered or unregistered. Further, the writing address is received from the write back stage, and when the writing is completed, the mark on the score board is erased and the registration is canceled.
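The registration, check and registration cancel operations described above may be sketched as follows. The class name `ScoreBoard`, its method names and the use of one mark per register are assumptions made for illustration; the actual structure is that of FIG. 27.

```python
class ScoreBoard:
    """Illustrative score board holding one mark per register."""

    def __init__(self, num_regs=8):
        self.marked = [False] * num_regs

    def register(self, write_addr):
        # Decode stage: mark the register that a pending instruction will write.
        self.marked[write_addr] = True

    def data_is_correct(self, *read_addrs):
        # Decode stage: data is correct only if no register to be read is marked.
        return not any(self.marked[addr] for addr in read_addrs)

    def cancel(self, write_addr):
        # Write back stage: writing is completed, so the mark is erased.
        self.marked[write_addr] = False


sb = ScoreBoard()
sb.register(2)                             # an instruction will write r2
can_issue_early = sb.data_is_correct(2, 3) # a following instruction reads r2 and r3
sb.cancel(2)                               # write back of r2 is completed
can_issue_late = sb.data_is_correct(2, 3)
```

In this sketch a following instruction that reads a marked register is reported as having incorrect data until the write back stage cancels the registration.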
According to the structure of the conventional parallel computer described above, since boosting is performed from either a later basic block of a taken branch or a later basic block of a not-taken branch, the number of instructions which can be boosted is small, so that the degree of instruction parallelization cannot be sufficiently improved. In addition, as the number of operating parts is increased to increase the degree of instruction parallelization, the size of the hardware becomes larger. The hardware also becomes larger because the conventional parallel computer has duplicate register files, store buffers and the like to implement boosting. Consequently, the computer cannot be put on a chip.
In addition, the bypass control of a conventional computer which does not perform boosting is carried out only in accordance with the result of a comparison between the source address and the destination address as described above, so that it cannot cope with boosting.
Further, the score board of a conventional computer which does not perform boosting is structured as described above, so that it cannot cope with boosting.