This invention relates to CPUs, such as in minicomputers or microcomputers, and particularly to a data processor suitable for use in high speed operation.
Hitherto, various means have been devised for the high speed operation of computers. A typical one is the pipeline system. The pipeline system does not complete the processing of one instruction before starting execution of the next, but executes instructions in a bucket-relay manner: each instruction is divided into a plurality of stages, and when one instruction is about to enter its second stage, the first stage of the next instruction is started. This system is described in detail in the book "ON THE PARALLEL COMPUTER STRUCTURE", written by Shingi Tomita, published by Shokodo, pages 25 to 68. With an n-stage pipeline system, n instructions can be in execution at the same time, one at each pipeline stage, and the processing of one instruction is completed at each pipeline pitch.
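The throughput benefit of pipelining can be illustrated with a small calculation. The following is only a sketch for an idealized pipeline in which every stage takes one pipeline pitch and no stalls occur; the function names are hypothetical and are not taken from the cited book.

```python
def pipelined_cycles(num_stages: int, num_instructions: int) -> int:
    """Cycles needed on an ideal pipeline with no stalls (assumed model)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles to pass through the pipe;
    # every later instruction completes one cycle after its predecessor.
    return num_stages + num_instructions - 1


def sequential_cycles(num_stages: int, num_instructions: int) -> int:
    """Cycles needed when each instruction finishes before the next starts."""
    return num_stages * num_instructions
```

For example, under these assumptions ten instructions on a six-stage pipeline take 15 cycles, against 60 cycles when executed strictly one after another.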
It is well known that the instruction architecture of a computer has a large effect on its processing operation and performance. From the point of view of instruction architecture, computers can be grouped into the categories of CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer). The CISC processes complicated instructions by use of microinstructions, while the RISC handles only simple instructions and performs high speed computation using hardwired logic control without use of microinstructions. The hardware and the pipeline operation of the conventional CISC and RISC are summarized below.
FIG. 2 shows the general construction of the CISC-type computer. There are shown a memory interface 200, a program counter (PC) 201, an instruction cache 202, an instruction register 203, an instruction decoder 204, an address calculation control circuit 205, a control storage (CS) 206 in which microinstructions are stored, a microprogram counter (MPC) 207, a microinstruction register 208, a decoder 209, a register MDR (Memory Data Register) 210 which exchanges data with the memory, a register MAR (Memory Address Register) 211 which indicates the operand address in the memory, an address adder 212, a register file 213, and an ALU (Arithmetic Logic Unit) 214.
The operation of the computer will be described briefly. The instruction indicated by the PC 201 is read from the instruction cache 202 and supplied through a signal 217 to the instruction register 203, where it is set. The instruction decoder 204 receives the instruction through a signal 218 and sets the head address of the microinstruction through a signal 220 in the microprogram counter 207. The address calculation control circuit 205 is instructed through a signal 219 how the address is to be calculated. The address calculation control circuit 205 reads the register necessary for the address calculation and controls the address adder 212. The contents of the register necessary for the address calculation are supplied from the register file 213 through buses 226, 227 to the address adder 212. Meanwhile, a microinstruction is read from the CS 206 at every machine cycle, decoded by the decoder 209, and used to control the ALU 214 and the register file 213 by means of a control signal 224. The ALU 214 operates on data fed from the register file 213 through buses 228, 229, and stores the result back in the register file 213 through a bus 230. The memory interface 200 is the circuit used for exchanging data with the memory, such as fetching of instructions and operands.
The pipeline operation of the computer shown in FIG. 2 will be described with reference to FIGS. 3, 4 and 5. The pipeline is formed of six stages. At the IF (Instruction Fetch) stage, an instruction is read from the instruction cache 202 and set in the instruction register 203. At the D (Decode) stage, the instruction decoder 204 decodes the instruction. At the A (Address) stage, the address adder 212 calculates the address of the operand. At the OF (Operand Fetch) stage, the operand at the address pointed to by the MAR 211 is fetched through the memory interface 200 and set in the MDR 210. At the EX (Execution) stage, data is read from the register file 213 and the MDR 210 and fed to the ALU 214, where the operation is performed. At the last W (Write) stage, the calculation result is stored through the bus 230 in one register of the register file 213.
FIG. 3 shows the continuous processing of add instructions (ADD), which are taken as one example of a basic instruction. One instruction is processed at each machine cycle, and the ALU 214 and the address adder 212 operate in parallel.
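The overlapped flow of consecutive basic instructions through the six stages can be sketched as a small timing table. This is an illustrative model only, assuming that each stage takes exactly one machine cycle and that no stalls occur; the names `CISC_STAGES`, `schedule` and `render` are hypothetical, and the output is not a reproduction of FIG. 3.

```python
CISC_STAGES = ["IF", "D", "A", "OF", "EX", "W"]


def schedule(stages, num_instructions):
    """Return one row per instruction; row[c] is the stage occupied in
    machine cycle c, or "--" when the instruction is not in the pipeline."""
    total_cycles = len(stages) + num_instructions - 1
    rows = []
    for i in range(num_instructions):
        row = ["--"] * total_cycles
        # Instruction i is fetched in cycle i and advances one stage per cycle.
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows


def render(rows):
    """Format the rows as fixed-width text, one instruction per line."""
    return "\n".join(" ".join(f"{cell:<2}" for cell in row) for row in rows)
```

Calling `print(render(schedule(CISC_STAGES, 3)))` prints one line per instruction. In the fifth column the first instruction is in EX while the third is in A, which corresponds to the ALU and the address adder operating in the same machine cycle.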
FIG. 4 shows the processing of the conditional branch instruction BRAcc. The flag is produced by the TEST instruction. FIG. 4 shows the flow when the condition is met. Since the flag is produced at the EX stage, three cycles of waiting time are necessary until the jumped-to instruction is fetched, and the greater the number of stages, the greater the waiting cycle count will be, resulting in a bottleneck in performance enhancement.
FIG. 5 shows the execution flow of a complicated instruction. The instruction 1 is the complicated instruction. A complicated instruction requires a great number of memory accesses, as in a string copy, and is normally processed by extending the EX stage over many cycles. The EX stage is controlled by the microprogram, which is accessed once per machine cycle. In other words, the complicated instruction is processed by reading the microprogram a plurality of times. During this time, since one instruction occupies the EX stage, the next instruction (the instruction 2 shown in FIG. 5) is required to wait. In such a case, the ALU 214 operates at all times, and the address adder 212 idles.
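The waiting caused by an extended EX stage can be sketched numerically. The model below is an assumption made for illustration: instruction i is fetched in machine cycle i (counting from 0), every stage other than EX takes one cycle, and only one instruction may occupy the EX stage at a time. The function name and its parameters are hypothetical.

```python
def ex_start_cycles(ex_lengths, ex_stage_index=4):
    """Return the machine cycle in which each instruction enters EX.

    ex_lengths[i] is the number of cycles instruction i spends in EX
    (the number of microinstructions read for it, in this assumed model).
    ex_stage_index is the position of EX in IF, D, A, OF, EX, W.
    """
    starts = []
    ex_free_at = 0  # first cycle in which the EX stage is free again
    for i, length in enumerate(ex_lengths):
        earliest = i + ex_stage_index     # arrival at EX with no stall
        start = max(earliest, ex_free_at) # wait while EX is occupied
        starts.append(start)
        ex_free_at = start + length
    return starts


def stall_cycles(ex_lengths, ex_stage_index=4):
    """Cycles each instruction waits before EX compared with the no-stall case."""
    starts = ex_start_cycles(ex_lengths, ex_stage_index)
    return [s - (i + ex_stage_index) for i, s in enumerate(starts)]
```

With `ex_lengths = [4, 1, 1]`, that is, a first instruction that repeats EX four times followed by two basic instructions, the second instruction enters EX three cycles later than it otherwise would, and the delay carries over to the third.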
The RISC-type computer will hereinafter be described. FIG. 6 shows the general construction of the RISC-type computer. There are shown a memory interface 601, a program counter 602, an instruction cache 603, a sequencer 604, an instruction register 605, a decoder 606, a register file 607, an ALU 608, an MDR 609, and an MAR 610.
FIG. 7 shows the process flow for the basic instructions. At the IF (Instruction Fetch) stage, the instruction pointed to by the program counter 602 is read from the instruction cache 603 and set in the instruction register 605. The sequencer 604 controls the program counter 602 in response to an instruction signal 615 and a flag signal 616 from the ALU 608. At the R (Read) stage, the contents of the registers designated by the instruction are transferred through buses 618, 619 to the ALU 608. At the E (Execution) stage, the ALU 608 performs an arithmetic operation. Finally, at the W (Write) stage, the calculated result is stored in the register file 607 through a bus 620.
In the RISC-type computer, the instructions are limited to basic instructions. Arithmetic operations are performed only between registers, and the instructions that involve an operand fetch are limited to the load instruction and the store instruction. A complicated instruction can be realized by a combination of basic instructions. Without use of microinstructions, the contents of the instruction register 605 are decoded directly by the decoder 606 and used to control the ALU 608 and so on.
FIG. 7 shows the process flow for a register-to-register arithmetic operation. The pipeline is formed of four stages since the instruction is simple.
FIG. 8 shows the process flow at the time of a conditional branch. As compared with the CISC-type computer, the number of pipeline stages is small, and thus the waiting time is only one cycle. In the RISC-type computer, in addition to the inter-register operation, it is necessary to load the operand from the memory and store the operand in the memory. In the CISC-type computer, the loading of the operand from the memory can be performed in one machine cycle because of the presence of the address adder, while in the RISC-type computer shown in FIG. 6, the load requires two machine cycles because it is decomposed into an address calculation instruction and a load instruction.
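The branch waiting times stated for the two pipelines (three cycles for the six-stage CISC pipeline, one cycle for the four-stage RISC pipeline) can be reproduced with a simple timing model. The model is an assumption introduced here for illustration and is not taken from the figures: the flag-setting instruction is fetched in cycle 0, the conditional branch in cycle 1, and the jumped-to instruction can be fetched in the cycle after the flag is produced. All names are hypothetical.

```python
CISC_STAGES = ["IF", "D", "A", "OF", "EX", "W"]
RISC_STAGES = ["IF", "R", "E", "W"]


def branch_wait_cycles(stages, flag_stage):
    """Waiting cycles before the jumped-to instruction is fetched (assumed model)."""
    # The flag-setting instruction is fetched in cycle 0 and advances one
    # stage per cycle, so it produces the flag in cycle k.
    k = stages.index(flag_stage)
    target_fetch_cycle = k + 1
    # With no branch, the instruction after the branch would be fetched in cycle 2.
    no_branch_fetch_cycle = 2
    return target_fetch_cycle - no_branch_fetch_cycle
```

Under this model `branch_wait_cycles(CISC_STAGES, "EX")` gives 3 and `branch_wait_cycles(RISC_STAGES, "E")` gives 1. The wait grows by one cycle for every stage placed ahead of the flag-producing stage.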
The problems with the above-mentioned prior art will be described briefly. In the CISC-type computer, although the memory-register instruction can be executed in one machine cycle because of the presence of the address adder, the overhead at the time of branching is large because of the large number of pipeline stages. Moreover, only the EX stage is repeated when a complicated instruction is executed, and, as a result, the address adder idles.
In the RISC-type computer, the overhead at the time of branching is small because of the small number of pipeline stages. However, since there is no address adder, the memory-register operation requires two instructions: the load instruction and the inter-register operation instruction.