1. Field of the Invention
The present invention relates to a microprocessor, and more particularly to a pipeline controlled microprocessor.
2. Description of the Related Art
Pipeline controlled microprocessors of this kind have conventionally been used to improve machine instruction throughput in a given period. For example, FIG. 6 is a block diagram showing a first example of a conventional pipeline controlled microprocessor.
The conventional pipeline controlled microprocessor of the first example comprises a register file 408 having two read ports and holding the data necessary for the internal arithmetic processing of a machine instruction, an ALU 411 and a shifter 414 constituting an arithmetic circuit for performing arithmetic and logic operations using the data held in the register file 408, and a plurality of timing registers 406, 407, 412, 413, 415, 416, 417, 419 and 424 for holding the data processed in parallel in the respective stages into which the processing of the machine instruction is divided.
The microprocessor further comprises an instruction cache 401, an instruction aligner 402, a program counter 403, an incremental counter 404, an address selector 405, bypass selectors 409 and 410, arithmetic output selectors 418 and 423, a data address register 420, a data cache 421, a data aligner 422 and a write buffer 425.
The microprocessor of the first example divides the processing of a machine instruction into the following five stages: IF (Instruction Fetch), RF (Register Fetch), EX (Execute), DF (Data Fetch) and WB (Write Back), and performs the following operations in parallel in the respective stages.
In the stage of IF, the microprocessor reads a machine instruction from the instruction cache 401 through the instruction aligner 402, and holds the data in the timing register 406.
In the stage of RF, the microprocessor decodes the machine instruction thus read, determines the type of the operation, reads out from the register file 408 the data in the registers designated as operands by the instruction, and holds the data in the timing registers 412 and 413 or in the timing registers 415 to 417.
In the stage of EX, the microprocessor executes the operation determined by the machine instruction using the data read from the registers, and either holds the result in the timing register 419 or writes the result into the program counter 403 as an address, thereby performing an instruction branch.
In the stage of DF, the microprocessor writes data from the register file 408 into the data cache 421 using the operation result, reads out data from the data cache 421 and holds it in the timing register 424, or transfers the operation result held in the timing register 419 to the other timing register.
In the stage of WB, the microprocessor writes into the register file 408 the data read out from the data cache 421 or the data within the timing register 424 holding the operation result.
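The five stages above can be pictured with a minimal Python sketch of a stall-free schedule. It assumes that one instruction enters IF per cycle and that every stage takes exactly one cycle; the names `STAGES`, `ideal_schedule` and `total_cycles` are hypothetical and do not appear in the source.

```python
# Stage names taken from the description; everything else is illustrative.
STAGES = ["IF", "RF", "EX", "DF", "WB"]


def ideal_schedule(instructions):
    """Return {instruction: {stage: cycle}} for a pipeline with no stalls.

    Assumption: instruction k (1-based) enters IF in cycle k and advances
    one stage per cycle.
    """
    schedule = {}
    for issue_cycle, name in enumerate(instructions, start=1):
        schedule[name] = {
            stage: issue_cycle + offset for offset, stage in enumerate(STAGES)
        }
    return schedule


def total_cycles(schedule):
    """Cycle in which the last instruction completes its WB stage."""
    return max(times["WB"] for times in schedule.values())


if __name__ == "__main__":
    for name, times in ideal_schedule(["i1", "i2", "i3"]).items():
        print(name, times)
```

Under these assumptions, n instructions complete in n + 4 cycles, which is the throughput benefit that a stall would reduce.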
The case of executing the following machine instruction string will be explained as a concrete example.
______________________________________
xor   r1, r4        ; exclusive OR
shld3 r2, r3, r4    ; connection and left shift
add   r6, r7        ; addition
______________________________________
The first instruction `xor` executes the exclusive OR operation on a first operand and a second operand and stores the result into the second operand. The second instruction `shld3` is a three-operand instruction that connects the second operand and a third operand, shifts the connected value to the left by the shift count designated by the lower five bits of the first operand, and stores the resultant upper 32 bits into the third operand. The third instruction `add` executes the addition of the first operand and the second operand and stores the result into the second operand.
FIG. 9 is a timing chart showing the operation of the microprocessor executing the machine instruction string described above.
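The semantics of the three instructions can be sketched in Python as follows. This is only an illustration under stated assumptions: registers are modeled as a dictionary of 32-bit values, the second operand is assumed to form the upper half of the connected 64-bit value, and `add` is assumed to wrap around at 32 bits. None of these details, nor the function names, are confirmed by the source.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def xor(regs, op1, op2):
    """Exclusive OR of op1 and op2; the result is stored into op2."""
    regs[op2] = (regs[op1] ^ regs[op2]) & MASK32


def shld3(regs, op1, op2, op3):
    """Connect op2 and op3, shift left, and store the upper 32 bits into op3.

    Assumption: op2 is the upper half and op3 the lower half of the
    connected 64-bit value.
    """
    shift = regs[op1] & 0x1F  # lower five bits of the first operand
    joined = ((regs[op2] & MASK32) << 32) | (regs[op3] & MASK32)
    shifted = (joined << shift) & MASK64
    regs[op3] = shifted >> 32


def add(regs, op1, op2):
    """Add op1 and op2; the result is stored into op2 (assumed to wrap)."""
    regs[op2] = (regs[op1] + regs[op2]) & MASK32


if __name__ == "__main__":
    regs = {"r1": 0xF0F0F0F0, "r2": 4, "r3": 0x12345678,
            "r4": 0x0F0F0F0F, "r6": 1, "r7": 2}
    xor(regs, "r1", "r4")
    shld3(regs, "r2", "r3", "r4")
    add(regs, "r6", "r7")
    print({name: hex(value) for name, value in regs.items()})
```

Note that `shld3` reads three registers (r2, r3 and r4), and that r4 is also the destination of the preceding `xor`.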
At first, the microprocessor reads the first instruction `xor` from the instruction cache 401 through the instruction aligner 402, and holds the data in the timing register 406.
In the next cycle, the microprocessor decodes the first instruction `xor` thus read, reads out from the register file 408 the data in the register r1 and the register r4 designated by the instruction as the first operand and the second operand, and holds the data, through the respective bypass selectors 409 and 410, in the respective timing registers 412 and 413, which provide the ALU 411 with the data. Simultaneously, the microprocessor reads the second instruction `shld3` and holds it in the timing register 406.
In the next cycle, the microprocessor executes the exclusive OR operation of the first instruction `xor` by means of the ALU 411 and holds the operation result in the timing register 419. Simultaneously, it decodes the second instruction `shld3`, reads out from the register file 408 the data in the register r2 and the register r3 designated by the instruction as the first operand and the second operand, and holds the data, through the respective bypass selectors 409 and 410, in the respective timing registers 415 and 416, which provide the shifter 414 with the data.
However, the third operand designated by the instruction `shld3` cannot be prepared in this cycle, because the register file 408 is provided with only two read ports and two bypass selectors. Accordingly, the shift operation cannot be performed in the next cycle, and the third instruction `add`, which has been read into the timing register 406, cannot be processed in the next cycle either.
In the next cycle, the operation result of the first instruction `xor` held in the timing register 419 is further held in the other timing register 424, in order to adjust the timing of writing the data into the register r4, and is also held in the timing register 417, which provides the shifter 414 with the third input data. The third instruction `add` read into the timing register 406 is kept waiting in this cycle.
In the next cycle, the operation result of the first instruction `xor` held in the timing register 424 is written into the register r4 of the register file 408. The microprocessor executes the operation of the second instruction `shld3` and holds the operation result in the timing register 419. It further decodes the third instruction `add`, reads out from the register file 408 the data in the registers r6 and r7 designated by the instruction as the first operand and the second operand, and holds the data in the respective timing registers 412 and 413, which provide the ALU 411 with the data.
As illustrated in FIG. 9, a delay of one cycle occurs in the pipeline controlled processing of the machine instruction string, because the third operand cannot be prepared at the same time as the other operands when executing the connection and left shift operation of the second instruction `shld3`. This is called a structure hazard. The time slot killed by the structure hazard is shown in FIG. 9. Such a structure hazard similarly occurs in any instruction that, like the instruction `shld3`, requires three input operands.
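The one-cycle delay described above can be reproduced with a minimal Python sketch. It assumes, as a simplification not stated in the source, that the RF stage occupies ceil(operands / read ports) cycles, that instructions proceed strictly in order, and that bypassing of the `xor` result is ignored. The function name `ex_cycles` and the parameter `read_ports` are hypothetical.

```python
import math

# (mnemonic, number of source operands), as in the example string above.
PROGRAM = [("xor", 2), ("shld3", 3), ("add", 2)]


def ex_cycles(program, read_ports):
    """Return {mnemonic: cycle in which the instruction is in the EX stage}.

    Assumed model: instruction k (1-based) is fetched in cycle k at the
    earliest, and the RF stage is busy for ceil(operands / read_ports)
    cycles per instruction.
    """
    result = {}
    rf_free = 2  # first cycle in which the RF stage can accept an instruction
    for index, (name, operands) in enumerate(program):
        fetch_cycle = index + 1
        rf_start = max(fetch_cycle + 1, rf_free)
        rf_length = max(1, math.ceil(operands / read_ports))
        rf_end = rf_start + rf_length - 1
        result[name] = rf_end + 1  # EX follows the last RF cycle
        rf_free = rf_end + 1
    return result


if __name__ == "__main__":
    print("2 read ports:", ex_cycles(PROGRAM, 2))
    print("3 read ports:", ex_cycles(PROGRAM, 3))
```

With two read ports, this model places `shld3` in EX in cycle 5 and `add` in cycle 6, one cycle later than with a hypothetical third read port, which matches the killed time slot in the walkthrough.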
FIG. 7 is a block diagram showing a second example of the conventional pipeline controlled 32-bit microprocessor.
The conventional pipeline controlled microprocessor of the second example comprises a register file 508 having three read ports and holding the data necessary for the internal arithmetic processing of a machine instruction, an ALU 512 and a shifter 515 constituting an arithmetic circuit for performing an arithmetic and logic operation by the use of the data held in the register file 508, a plurality of timing registers 506, 507, 513, 514, 516, 517, 518, 520 and 525 for holding the parallel processed data in the respective stages into which the operation process of the machine instruction is divided, and a bypass selector 511 for selecting the data to be held in the timing register 518.
Other blocks are the same as those of the conventional microprocessor of the first example shown in FIG. 6, and hence the explanation thereof is omitted.
When the conventional microprocessor of the second example executes the same machine instruction string that was used for explaining the operation of the conventional microprocessor of the first example, the structure hazard that occurs when the microprocessor of the first example executes the instruction `shld3` does not occur. This is because the register file 508 has three read ports capable of simultaneously providing the shifter 515 with the three operands necessary for the internal arithmetic processing of the instruction `shld3`.
The register file 508 having three read ports, however, requires a chip area about one and a half times larger than the register file 408 having two read ports indicated in FIG. 6.
Hereinafter, a parallel arithmetic processing unit disclosed in the Japanese Unexamined Patent Publication (Kokai) No. Heisei 5-88893 will be briefly described as a third example of the prior art with reference to FIG. 8.
The parallel arithmetic processing unit comprises arithmetic means 601 and 602, a data storage means 603, a judging means 604, bypass means 605 and 606, and an arithmetic control means 607.
When there is a dependency between arithmetic instructions, the arithmetic control means 607 selects an arithmetic means 601 for executing the depended arithmetic instruction and an arithmetic means 602 for executing the depending arithmetic instruction according to the judgment by the judging means 604, and outputs the input data corresponding to the respective arithmetic means 601 and 602 from the data storage means 603. Simultaneously, the output data of the arithmetic means 601 executing the depended arithmetic instruction is directly output from the data storage means 603 through the bypass means 605 as input data for the arithmetic means 602 executing the depending arithmetic instruction, thereby preventing a data hazard. When there is no dependency between the arithmetic instructions, the arithmetic control means 607 selects the arithmetic means 601 or 602 corresponding to each arithmetic instruction, and outputs the input data corresponding to the arithmetic means 601 or 602 from the data storage means 603.
When the arithmetic means 601 or 602 is a shifter such as the shifter 414 of the first example shown in FIG. 6, a structure hazard occurs because the third operand must be waited for: there is no means to provide, at one time, the three input operands designated by a three-operand instruction.
When a register file holding the data necessary for the internal arithmetic processing of a machine instruction has two read ports, as explained for the 32-bit microprocessor of the first example, a machine instruction requiring two or fewer input operands can be executed efficiently without disturbing the pipeline control. However, a machine instruction requiring three input operands demands extra time for preparing the third operand, causing a structure hazard that disturbs the pipeline control and thus deteriorates the instruction throughput of the microprocessor. A similar problem occurs in the parallel arithmetic processing unit of the third example.
When a register file holding the data necessary for the internal arithmetic processing of a machine instruction has three read ports, as explained for the 32-bit microprocessor of the second example, a machine instruction requiring three input operands causes no structure hazard that disturbs the pipeline control, because three operands can be provided at a time. However, a register file having three read ports needs a chip area about one and a half times larger than that of a register file having two read ports. This increases the chip area of the microprocessor and decreases the yield per semiconductor wafer.
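The dependency judgment and bypass selection described above can be sketched roughly in Python. This is an illustrative model only: instructions are represented as dictionaries with hypothetical `dest` and `sources` keys, and the functions `depends_on` and `select_inputs` are assumptions that do not come from the cited publication.

```python
def depends_on(producer, consumer):
    """Judge whether the consumer reads the register written by the producer."""
    return producer["dest"] in consumer["sources"]


def select_inputs(producer, consumer, storage, producer_result):
    """Choose each input of the consumer.

    A source that matches the producer's destination is taken from the
    bypass path (the producer's output); every other source is read from
    the data storage.
    """
    inputs = []
    for reg in consumer["sources"]:
        if reg == producer["dest"]:
            inputs.append(producer_result)  # bypassed value
        else:
            inputs.append(storage[reg])  # value read from storage
    return inputs


if __name__ == "__main__":
    storage = {"r1": 5, "r4": 3, "r6": 10}
    producer = {"dest": "r4", "sources": ["r1", "r4"]}  # e.g. xor r1, r4
    consumer = {"dest": "r6", "sources": ["r4", "r6"]}  # e.g. add r4, r6
    result = storage["r1"] ^ storage["r4"]
    print(depends_on(producer, consumer))
    print(select_inputs(producer, consumer, storage, result))
```

The sketch covers only two-operand consumers forwarding a single result; it does not model how a third operand would be supplied.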
When one read port is added to the register file, the number of transistors increases by two per bit, so that the number of transistors needed for a general register file having a structure of 32 bits × 32 registers increases by 2048.
A logic circuit is also necessary for controlling the read port of the register file. More transistors demand more power, and an expensive package for sealing such a microprocessor also becomes necessary, which raises the price of the microprocessor product. An increase in power consumption is undesirable in view of the recent market trend toward lower power consumption. Increases in price and power consumption will weaken the competitiveness of the product in the market.
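The figure of 2048 follows directly from the numbers given above, as the short Python sketch below shows. The function name `extra_transistors` and its parameter names are hypothetical; the two transistors per bit per port and the 32 × 32 organization are taken from the text.

```python
def extra_transistors(bits_per_register, register_count,
                      transistors_per_bit=2, added_ports=1):
    """Additional transistors needed when read ports are added."""
    return bits_per_register * register_count * transistors_per_bit * added_ports


if __name__ == "__main__":
    # 32 bits x 32 registers x 2 transistors per bit for one extra read port
    print(extra_transistors(32, 32))
```

This count covers only the register cells; the logic circuit for controlling the added port is not included.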