The present invention relates in general to a data processor and, more particularly, to a pipeline processor which permits the overlapping of fetch and execute cycles.
Recently, the performance of computer systems has advanced remarkably. The data processing speed of an individual computer system depends largely upon the data processor which functions as the central part of the system, and higher speed data processors are still being developed by a number of computer designers and engineers.
A pipeline processor, for example, is a known high speed data processor. This pipeline processor has an instruction fetching circuit, an instruction register, an execution control circuit and an execution circuit. These circuits simultaneously perform the operations required to execute a plurality of instructions stored in a memory. In other words, in the pipeline processor, each instruction is fetched by the instruction fetching circuit in a first machine cycle, and is then stored in the instruction register in a second machine cycle. The execution control circuit decodes an OP-code of the instruction in the instruction register and, in response to the OP-code, controls the operation of the execution circuit, which includes, for example, an accumulator register, data buses, an ALU, and transfer gates. Meanwhile, the instruction fetching circuit receives from the execution control circuit a signal which controls the fetch timing of each of the plurality of instructions. Thus, when, for example, the first instruction is fetched from the memory prior to the second instruction, the second instruction is fetched from the memory in a machine cycle before the execution of the first instruction has been completed. Accordingly, the execution of the second instruction can be started in the machine cycle immediately after the execution of the first instruction has been completed.
The above-described pipeline processor executes, as described below, an instruction which subjects the contents of an accumulator register to an arithmetic or logic operation (e.g., addition, subtraction, OR operation, AND operation) at an ALU, and which stores the processed data in the accumulator register.
FIGS. 1A to 1E respectively show first and second clock signals, and first through third control signals for respectively controlling the conduction of first through third switch circuits, each of which contains a plurality of transfer gates. The first through third control signals are generated by the execution control circuit in synchronization with the first and second clock signals. Data in the accumulator register is supplied to an ALU through the first switch circuit and the first data bus during the time from T1 to T4. The data output by the ALU is held by a temporary latch during the time from T4 to T5 and supplied from the temporary latch to the second data bus through the second switch circuit during the time from T5 to T8. The data on the second data bus is supplied to the accumulator register during the time from T6 to T7. Here, the ALU processes the data input during the time from T2 to T3, and outputs the processed data until the time T6. Further, the first and second data buses are precharged every time the second clock signal shown in FIG. 1B goes to a high level. Incidentally, the period from a leading edge to the next leading edge of the first clock signal shown in FIG. 1A corresponds to one machine cycle.
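The timing relationships described above can be tabulated in a short sketch. This is only an illustration in Python: the intervals are taken from the description, but treating each one as half-open (start included, end excluded) is an assumption, and the names `ACTIVITY`, `active_at` and `earliest_accumulator_update` are hypothetical.

```python
# Activities from the description of FIGS. 1A to 1E, as (start, end) in units
# of the times T1..T8. Assumption: each interval is half-open, [start, end).
ACTIVITY = {
    "accumulator -> ALU (first switch, first data bus)": (1, 4),
    "ALU processes input": (2, 3),
    "temporary latch holds ALU output": (4, 5),
    "temporary latch -> second data bus (second switch)": (5, 8),
    "second data bus -> accumulator": (6, 7),
}


def active_at(t):
    """Return the activities in progress at time T<t>, in table order."""
    return [name for name, (start, end) in ACTIVITY.items() if start <= t < end]


def earliest_accumulator_update():
    """Return the first time at which the accumulator receives processed data."""
    return ACTIVITY["second data bus -> accumulator"][0]
```

Under these assumptions, `earliest_accumulator_update()` gives 6, i.e. the accumulator register does not receive the processed data before the time T6.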
More specifically, four consecutive machine cycles are necessary to execute an instruction, such as an increment instruction, which processes the data from the accumulator register at the ALU and updates the content of the accumulator register.
FIG. 2A shows several machine cycles of a pipeline processor, and FIGS. 2B to 2E respectively show the periods of a fetching operation, a decoding operation, a data processing operation, and an updating operation of the contents of the accumulator register, which are performed to execute a plurality of increment instructions INC1, INC2 . . . .
The instruction INC1 is fetched in machine cycle 1, and decoded in machine cycle 2. The content (A) of the accumulator register is processed to be (A)+1 in accordance with the instruction INC1 in machine cycle 3, and the accumulator register stores the processed data (A)+1 in machine cycle 4. On the other hand, the instruction INC2 is fetched in machine cycle 3, and decoded in machine cycle 4. The content (A)+1 of the updated accumulator register is incremented by 1 in accordance with the instruction INC2 in machine cycle 5, and the accumulator register stores the processed data (A)+2 in machine cycle 6. The instruction INC3 is fetched in machine cycle 5, and is decoded in machine cycle 6. (INC3, INC4 . . . are thereafter processed in the same manner as described above.)
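The cycle assignment just described can be reproduced with a small sketch. It is written in Python purely for illustration; the function names, the stage labels and the `fetch_interval` parameter are hypothetical and are not taken from the figures. It assumes only what is stated above: each increment instruction occupies four consecutive machine cycles, and a new instruction is fetched every second machine cycle.

```python
STAGES = ("fetch", "decode", "execute", "update")


def schedule(num_instructions, fetch_interval=2):
    """Map each stage to {machine cycle: instruction name}.

    fetch_interval=2 models the conventional timing described above, in which
    INC1 is fetched in machine cycle 1, INC2 in cycle 3, INC3 in cycle 5.
    """
    table = {stage: {} for stage in STAGES}
    for k in range(num_instructions):
        first_cycle = 1 + k * fetch_interval
        for offset, stage in enumerate(STAGES):
            table[stage][first_cycle + offset] = f"INC{k + 1}"
    return table


def blank_cycles(table, stage, last_cycle):
    """List the cycles, from the stage's first use up to last_cycle, in which it is idle."""
    first_busy = min(table[stage])
    return [c for c in range(first_busy, last_cycle + 1) if c not in table[stage]]


conventional = schedule(3)
```

With this model, `conventional["update"]` places INC1 in machine cycle 4 and INC2 in machine cycle 6, and `blank_cycles(conventional, "fetch", 6)` reports cycles 2, 4 and 6 as idle for the fetching operation. Calling `schedule(3, fetch_interval=1)` leaves no idle cycles between instructions, which is the timing a fetch of INC2 in machine cycle 2 would produce.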
In the conventional pipeline processor, the hatched parts shown in FIGS. 2B to 2E are blank periods in which no operation is performed. If the instruction INC2 is fetched in machine cycle 2, the blank periods can be eliminated. In this case, if the processed data (A)+1 produced by the execution of the instruction INC1 is first stored in the accumulator register in machine cycle 4, the data (A)+1 should then be input from the accumulator register to the ALU in order to execute the instruction INC2 in the same machine cycle 4. In that case, the first control signal is at a high level during the time from T5 to T8, as shown by the broken line in FIG. 1C.
When the charging state of the first data bus during the time from T5 to T8 is observed, the precharge of the first data bus is completed before the time T5. The first switch circuit between the first data bus and the accumulator register becomes conductive upon the rising of the first control signal at the time T5 to transfer the data in the accumulator register to the first data bus. In other words, the first data bus is discharged from the time T5 in accordance with the contents of the accumulator register. However, the accumulator register still holds the data (A) at this time, and the processed data (A)+1 is not stored in the accumulator register until the time T6. Therefore, the precharge of the first data bus is invalidated by the data (A) before the time T6, at which the ALU should receive the data (A)+1, and the processor consequently operates erroneously.
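The failure described above can be illustrated with a simplified software model of a precharged bus. The model is a sketch under loud assumptions which are not stated in the description: an 8-bit bus width, a polarity in which a data bit of 0 discharges its bus line, and a line which, once discharged, stays low until the next precharge. The class and function names are hypothetical.

```python
WIDTH = 8                    # assumed bus width
MASK = (1 << WIDTH) - 1


class PrechargedBus:
    """Bus whose lines are precharged high and can only be pulled low afterwards."""

    def __init__(self):
        self.lines = 0

    def precharge(self):
        self.lines = MASK    # every line charged to a high level

    def drive(self, value):
        # Assumed polarity: a 0 bit discharges its line. A discharged line
        # cannot return to a high level until the next precharge.
        self.lines &= value & MASK


def alu_input_when_switch_conducts_from_t5(a):
    """Return what the ALU sees if the first switch circuit conducts from T5."""
    bus = PrechargedBus()
    bus.precharge()                 # precharge is completed before T5
    accumulator = a & MASK
    bus.drive(accumulator)          # T5: register still holds (A)
    accumulator = (a + 1) & MASK    # T6: register is updated to (A)+1
    bus.drive(accumulator)          # T6 to T8: no new precharge has occurred
    return bus.lines
```

For (A) = 5, the intended ALU input is 6, but under these assumptions the model returns 4, because the lines already discharged by the data (A) cannot be restored when the accumulator register changes to (A)+1. The exact erroneous value depends on the assumed polarity; the description itself states only that the processor operates erroneously.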