1. Field of the Invention
The present invention relates to a data processor such as a microprocessor, and more particularly to a system for a data processor that processes an instruction set containing primitive instructions and high performance instructions, and that is capable of executing a plurality of primitive instructions in parallel and of executing both primitive instructions and high performance instructions at high speed. A primitive instruction is an instruction which can be processed by using an instruction execution unit one time, and a high performance instruction is an instruction which requires use of an instruction execution unit two or more times.
2. Description of the Related Art
Examples of conventional microprocessors capable of processing an instruction set containing both primitive instructions and high performance instructions include the 80486 described in "BYTE, November 1989, VOL.14/No.12, pp. 323-329" and the i960CA described in "NIKKEI ELECTRONICS, 1990.1.8 (No. 440), pp. 177-186".
As shown in FIG. 7, the 80486 microprocessor is constructed of an instruction fetch unit 10, instruction decoder 20, microprogram ROM 30, address calculation unit 120, and instruction execution unit 50. In operation, an instruction code 60 contained in an instruction to be decoded is sent from the instruction fetch unit 10 to the instruction decoder 20. A subsequent instruction code position indication signal 70 for indicating the position of the next instruction code to be decoded is sent from the instruction decoder 20 to the instruction fetch unit 10. A ROM address 100 which is part of the decoded result of the instruction is sent from the instruction decoder 20 to the microprogram ROM 30. An instruction execution unit control signal 110 is sent from the microprogram ROM 30 to the instruction execution unit 50. An address calculation unit control signal 130 which is another part of the decoded result of the instruction is sent from the instruction decoder 20 to the address calculation unit 120, and a memory access address 140 is sent from the address calculation unit 120 to the instruction execution unit 50.
As shown in FIG. 4, the 80486 microprocessor has five pipeline stages including a prefetch stage PF, first decode stage D1, second decode stage D2, execution stage EX, and write-back stage WB.
At the prefetch stage PF, the instruction fetch unit 10 fetches an instruction from a main memory or cache (not shown), and stores it in a prefetch queue of the instruction fetch unit 10. The instruction fetch unit 10 also supplies the next instruction code 60 to be decoded to the instruction decoder 20 in accordance with the subsequent instruction code position indication signal 70 from the instruction decoder 20.
At the first decode stage D1, the instruction code 60 supplied from the instruction fetch unit 10 to the instruction decoder 20 is decoded to generate the ROM address 100 and address calculation unit control signal 130 as the instruction-decoded results.
At the second decode stage D2, the instruction execution unit control signal 110 is read from the microprogram ROM 30 at the ROM address 100 supplied from the instruction decoder 20, and the address calculation unit 120 calculates the memory access address 140 in accordance with the address calculation unit control signal 130.
At the execution stage EX, an arithmetic logic unit (ALU) (not shown) of the instruction execution unit 50 performs an arithmetic logic operation in accordance with the instruction execution unit control signal 110, a memory access circuit of the instruction execution unit 50 accesses the memory at the memory access address 140, or the instruction execution unit 50 performs other operations. The memory access is executed immediately, using the memory access address 140 supplied from the address calculation unit 120.
At the write-back stage WB, the operation result or memory fetch data at the execution stage EX is stored in a register file (not shown).
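The overlap of the five stages described above may be sketched as follows. This is an illustrative model only, not a description of the actual 80486 timing: it assumes that one instruction enters the pipeline per cycle, that each instruction spends exactly one cycle in each stage, and that no stall occurs. The names `pipeline_chart` and `total_cycles` are hypothetical.

```python
# Illustrative model only. The stage names come from the description above;
# the one-instruction-per-cycle, stall-free timing is an assumption.
STAGES_80486 = ["PF", "D1", "D2", "EX", "WB"]


def pipeline_chart(num_instructions, stages=STAGES_80486):
    """Map each instruction index to a {cycle: stage} dictionary."""
    chart = {}
    for i in range(num_instructions):
        # Instruction i enters the first stage in cycle i and then
        # advances by one stage per cycle.
        chart[i] = {i + s: name for s, name in enumerate(stages)}
    return chart


def total_cycles(num_instructions, stages=STAGES_80486):
    """Cycles needed to complete all instructions when no stall occurs."""
    if num_instructions == 0:
        return 0
    return len(stages) + num_instructions - 1
```

Under these assumptions, `pipeline_chart(3)[1]` places the second instruction in stage D2 during cycle 3, and `total_cycles(3)` gives the number of cycles needed to complete three instructions.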
As seen from the above description, in the 80486 microprocessor, control signals are always obtained as outputs from the microprogram ROM 30, except for the control signal for address calculation.
In the i960CA, only high performance instructions are executed in response to a control signal output from a microprogram ROM. Namely, a high performance instruction is divided into primitive instructions, which are stored in the microprogram ROM. Therefore, a high performance instruction is executed in response not to an instruction code but to an output from the microprogram ROM. Specifically, as shown in FIG. 8, the i960CA microprocessor is constructed of an instruction fetch unit 10, instruction decoder 20, microprogram ROM 30, instruction decoder input selector 150, and instruction execution unit 50. An instruction code 60 is sent from the instruction fetch unit 10 to the instruction decoder input selector 150. A ROM output 90 is sent from the microprogram ROM 30 to the instruction decoder input selector 150. An instruction decoder input 160 is sent from the instruction decoder input selector 150 to the instruction decoder 20. A subsequent instruction code position indication signal 70 is sent from the instruction decoder 20 to the instruction fetch unit 10. A ROM address 100 is sent from the instruction decoder 20 to the microprogram ROM 30. An instruction decoder input select signal 170 is sent from the instruction decoder 20 to the instruction decoder input selector 150. An instruction execution unit control signal 110 is sent from the instruction decoder 20 to the instruction execution unit 50.
As shown in FIG. 5, the i960CA microprocessor has three pipeline stages including a prefetch stage PF, decode stage ID, and execution stage EX.
At the prefetch stage PF, while executing a primitive instruction, the instruction fetch unit 10 fetches the instruction code 60 and stores it in a prefetch queue, and the instruction code 60 is supplied via the instruction decoder input selector 150 to the instruction decoder 20 in accordance with the subsequent instruction code position indication signal 70 from the instruction decoder 20. On the other hand, while executing a high performance instruction, the ROM output 90 is read from the microprogram ROM 30 and supplied via the instruction decoder input selector 150 to the instruction decoder 20.
At the decode stage ID, the instruction decoder 20 decodes the instruction code 60 from the instruction fetch unit 10 or the ROM output 90 from the microprogram ROM 30, to generate an instruction execution unit control signal 110, and if the decoded instruction is a high performance instruction, it generates a ROM address 100 and changes an instruction decoder input select signal 170 to the ROM output select side.
At the execution stage EX, the instruction execution unit 50 performs an arithmetic logic operation, memory access, or other operations.
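The role of the instruction decoder input selector 150 in the stages described above may be sketched as follows. This is a minimal model under stated assumptions, not the actual i960CA logic: the ROM contents and the opcode names (`HP_OP`, `PRIM_A`, `PRIM_B`, `ADD`, `SUB`) are invented for illustration, and the point at which the selector returns to the instruction fetch unit side (here, after the last ROM entry of the sequence) is assumed, since the description above does not state it.

```python
# Hypothetical microprogram ROM: each high performance opcode maps to the
# sequence of primitive instructions into which it is divided.
# All opcode names here are invented for illustration.
MICRO_ROM = {"HP_OP": ["PRIM_A", "PRIM_B"]}


def decoder_input_stream(program, rom=MICRO_ROM):
    """Yield (source, op) pairs as they would appear at decoder input 160."""
    for op in program:
        # The instruction code 60 always reaches the decoder first.
        yield ("fetch", op)
        if op in rom:
            # Decoding a high performance instruction produces ROM address 100
            # and switches select signal 170 to the ROM output side, so the
            # following decoder inputs are taken from ROM output 90.
            for micro_op in rom[op]:
                yield ("rom", micro_op)
            # Assumption: the selector returns to the fetch side after the
            # last ROM entry of the sequence.
```

For example, `list(decoder_input_stream(["ADD", "HP_OP", "SUB"]))` yields the primitive instruction `ADD` from the fetch side, then `HP_OP` from the fetch side followed by its two ROM-supplied primitive instructions, and finally `SUB` from the fetch side.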
As described above, in the 80486 microprocessor, control signals other than those for address calculation are obtained as ROM outputs for all instructions. With such a microprocessor, all instructions require microprograms, resulting in an increase of ROM capacity and a corresponding increase of ROM area and power consumption, and posing a problem of slow operation speed.
Furthermore, since the microprogram ROM is continuously enabled, the power consumption by ROM occupies a larger portion of the total power consumption of the microprocessor. The higher the integration degree and operation frequency of a microprocessor, the more the power is consumed by the microprocessor. It is therefore necessary to suppress an increase in power consumption, and accordingly to suppress the power consumption by ROM.
Still further, in the case of the parallel execution of a plurality of instructions, e.g., two instructions, the structure of microprocessors becomes as shown in FIG. 2. With the arrangement shown in FIG. 2, each of the two instructions requires not only its own instruction decoder but also its own ROM. Two ROMs are therefore necessary for the parallel execution of two instructions. Another problem is that executing two high performance instructions using two instruction execution units requires control using two independent ROMs.
Namely, in FIG. 2, first and second ROM addresses 101 and 102, first and second address calculation unit control signals 131 and 132, and first and second memory access signals 141 and 142 are duplicated.
As also described above, the i960CA microprocessor executes a primitive instruction without using ROM, and executes a high performance instruction by using ROM. As shown in FIG. 9, an overhead of reading ROM appears at the execution start of a high performance instruction, causing a one-cycle vacancy in the pipeline. For example, for a high performance instruction requiring two or three cycles to complete its execution, the performance degradation caused by this vacancy in the pipeline cannot be neglected.
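The weight of the one-cycle vacancy relative to a short high performance instruction may be estimated as follows. This is a back-of-the-envelope sketch which assumes that the vacant cycle adds directly to the execution cycles of the instruction and is not hidden by any other activity; the function name `relative_overhead` is hypothetical.

```python
def relative_overhead(exec_cycles, vacant_cycles=1):
    """Fraction of extra time added by the vacant pipeline cycles.

    Assumption: the vacant cycles add directly to the execution cycles
    of the high performance instruction.
    """
    if exec_cycles <= 0:
        raise ValueError("exec_cycles must be positive")
    return vacant_cycles / exec_cycles
```

Under this assumption, a two-cycle instruction is lengthened by one half of its execution time (`relative_overhead(2)` is 0.5) and a three-cycle instruction by one third.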
Furthermore, in the case of the parallel execution of a plurality of instructions, if for example a high performance instruction and the subsequent instruction are supplied to decoders at the same time, a prefetch stage for the instruction subsequent to the high performance instruction is required to be newly executed. It is therefore necessary for the instruction decoder to change the subsequent instruction code position indication signal to be output to the instruction fetch unit, depending upon whether the decoded instruction is a high performance instruction or a primitive instruction. With such a system, a processor requiring a full instruction code for the discrimination between the high performance instruction and primitive instruction takes time for generating the subsequent instruction code position indication signal. As a result, the cycle time for supplying an instruction code to the instruction decoder becomes long, resulting in a performance degradation.
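The dependence of the subsequent instruction code position on the instruction type may be sketched as follows for the case of two instructions supplied to the decoders at the same time. This is one possible reading of the above description, given under explicit assumptions: positions and lengths are counted in arbitrary units, only the type of the first instruction is considered, and the function name `next_code_position` and its parameters are hypothetical.

```python
def next_code_position(position, first_len, second_len, first_is_high_performance):
    """Position of the next instruction code to supply to the decoders.

    Assumed behavior: when the first instruction of the pair is a high
    performance instruction, the second instruction must be fetched again
    later, so the position advances past the first instruction only.
    """
    if first_is_high_performance:
        return position + first_len
    # Both instructions are consumed in the same cycle.
    return position + first_len + second_len
```

Because the result depends on `first_is_high_performance`, the position cannot be produced until the instruction type is known, which is the source of the delay described above when the full instruction code is needed for the discrimination.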