1. Field of the Invention
The present invention relates to a parallel processing type processor system for performing parallel execution at instruction level, and more particularly, to trap and stall control functions in the parallel processing type processor system.
2. Description of the Background Art
In recent years, in response to the growing demand for high speed personal computers, CPUs known as superscalar processors or VLIW (Very Long Instruction Word) processors, which can perform parallel processing at the machine language instruction level, have been developed and already realized on VLSI chips. In such a parallel processing CPU, RISC instructions are used as the basic instruction set, and the processing performance is improved by fetching and executing a plurality of instructions concurrently. In particular, the superscalar processor has an architecture which can realize the conventional RISC designed for sequential processing at the instruction level while maintaining compatibility at the user program level, so that it is attracting considerable interest from computer users.
Such a processor system capable of performing parallel processing at instruction level which adopts a conventional trap control method has a schematic configuration as shown in FIG. 1.
This configuration of FIG. 1 realizes a processor system capable of performing parallel processing at instruction level which has five pipe line stages including an F stage (Fetch), a D stage (Decode), an E stage (Execution), an M stage (Memory access), and a W stage (register Write back), in which each instruction has a length of one word (32 bits).
As shown in FIG. 1, the processor system comprises: an instruction memory 1 for storing instructions; an instruction issue unit 2 for fetching four instructions of four word boundary concurrently from the instruction memory 1 at the F stage, accounting for the data dependency relationship and control dependency relationship among the four fetched instructions at the D stage, and supplying executable instructions through instruction supply lines 20, 21, 22, and 23 at the E stage; arithmetic logic units (ALU0 and ALU1) 3 and 4 for carrying out the arithmetic logic calculation and memory address calculation at the E stage according to the instructions supplied from the instruction supply lines 20 and 21, respectively; a floating point adder (FADD) 5 for carrying out the floating point addition and subtraction at the E stage according to the instruction supplied from the instruction supply line 22; a floating point multiplier (FMUL) 6 for carrying out the floating point multiplication and division at the E stage according to the instruction supplied from the instruction supply line 23; memory access units (MA0 and MA1) 7 and 8 for carrying out memory access operations with respect to a two port data memory 25 at the M stage according to the outputs of the ALU0 3 and ALU1 4, respectively; floating point exception check units (EC1 and EC2) 9 and 10 for carrying out exception check in the floating point calculations at the M stage according to the outputs of the FADD 5 and FMUL 6, respectively; and a multi-port register file 11 having twelve ports including four write ports for receiving the outputs of the MA0 7, MA1 8, EC1 9, and EC2 10 at the W stage, and eight read ports for supplying operand data to the ALU0 3, ALU1 4, FADD 5, and FMUL 6 through operand data supply lines 12 to 19 at the E stage.
In this configuration of FIG. 1, the integer calculation exception trap such as a page fault or an overflow is generated by the MA0 7 and MA1 8, while the floating point calculation exception trap is generated by the EC1 9 and EC2 10.
In order to deal with such an exception trap, the processor system is further equipped with: a trap cause register 30 for storing a cause of the trap generation; a trap address register 32 for storing an address of the instruction which caused the trap generation; and a trap control unit 33 for receiving the trap causes from the MA0 7, MA1 8, EC1 9, and EC2 10 transmitted through trap request signal lines 43 to 46, and asserting trap signals through trap signal lines 34 to 38 in response, while generating appropriate inputs for the trap cause register 30 and the trap address register 32 through signal lines 40 and 42, respectively.
The trap signal in the trap signal line 34 is transmitted to the instruction issue unit 2, ALU0 3, and ALU1 4, while the trap signals in the trap signal lines 35 to 38 are transmitted to the MA0 7, MA1 8, EC1 9, and EC2 10, respectively. In response to the trap signal from the trap control unit 33, an execution invalidation flag is activated in each element, so as to abort the processing of the instructions at the later pipe line stages, while the instruction issue unit 2 starts the instruction fetch for the prescribed trap treatment routine in which the trap cause and the trap address stored in the trap cause register 30 and the trap address register 32 are utilized.
In further detail, the trap control unit 33 has a configuration shown in FIG. 2. Namely, the trap control unit 33 further comprises: an M stage program counter 1 (MPC) 51 for storing a common portion of word addresses of the instructions currently executed at the M stage in which two least significant bits of the addresses of the instructions are omitted; M stage sub-program counters (submpc1, submpc2, submpc3, submpc4) 53, 54, 55, and 56 for storing individual portions of word addresses of the instructions currently executed at the M stage, indicating two least significant bits of the addresses of the instructions currently executed by the MA0 7, MA1 8, EC1 9, and EC2 10, respectively; and a trap data generation unit 57 which outputs the smallest entry among the M stage sub-program counters 53 to 56 as an output 47 to be combined with the entry of the MPC 51 to generate the trap address 42 to be supplied to the trap address register 32, and the trap cause transmitted through one of the trap request signal lines 43 to 46 corresponding to the M stage sub-program counter 53 to 56 having the smallest entry as the trap cause 40 to be supplied to the trap cause register 30, while generating the trap signals for the trap signal lines 34 to 38.
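The selection performed by the trap data generation unit 57 can be illustrated by the following minimal sketch. It is not taken from the figures: the function name select_trap, the list ordering (MA0, MA1, EC1, EC2), and the use of None for "no trap request" are hypothetical, and the sketch assumes that the smallest entry is taken among those units which are currently requesting a trap.

```python
from typing import List, Optional, Tuple


def select_trap(
    mpc: int,
    sub_pcs: List[int],
    trap_causes: List[Optional[str]],
) -> Optional[Tuple[int, str]]:
    """Model of the trap address / trap cause selection (hypothetical names).

    mpc         -- common portion of the word address (two LSBs omitted)
    sub_pcs     -- two-bit entries for MA0, MA1, EC1, EC2, in that order
    trap_causes -- cause per unit, or None when that unit requests no trap
    """
    # Assumption: only units that actually request a trap are candidates.
    requesting = [i for i, cause in enumerate(trap_causes) if cause is not None]
    if not requesting:
        return None

    # The earliest instruction in program order has the smallest entry.
    first = min(requesting, key=lambda i: sub_pcs[i])

    # Combine the common portion with the two least significant bits.
    trap_address = (mpc << 2) | (sub_pcs[first] & 0b11)
    return trap_address, trap_causes[first]


# Illustrative values only: MA1 and EC2 both request a trap.
result = select_trap(0x40, [0, 2, 1, 3], [None, "page_fault", None, "fp_overflow"])
print(result)  # (258, 'page_fault'), i.e. word address 0x102
```

In this sketch the trap of the MA1 is reported because its entry (2) is smaller than that of the EC2 (3), so that the trap address and the trap cause both refer to the earlier instruction.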
Here, the trap signal in the trap signal line 34 is asserted whenever a trap cause is received from any one of the trap request signal lines 43 to 46, whereas each of the trap signals in the trap signal lines 35 to 38 is asserted when a trap request is received from one of the trap request signal lines 43 to 46 and the corresponding one of the M stage sub-program counters 53 to 56 has an entry which is equal to or larger than the entry in that one of the M stage sub-program counters 53 to 56 from which the trap request is received.
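The assertion rule for the trap signals can be sketched as follows. This is only an illustrative software model of the comparison logic: the function name trap_signals and the list ordering (MA0, MA1, EC1, EC2) are hypothetical, and the sub-program counter values are invented for the example.

```python
from typing import List, Tuple


def trap_signals(sub_pcs: List[int], requests: List[bool]) -> Tuple[bool, List[bool]]:
    """Model of the conventional trap signal generation (hypothetical names).

    sub_pcs  -- two-bit entries for MA0, MA1, EC1, EC2, in that order
    requests -- True where the unit asserts its trap request signal line

    Returns the common trap signal (line 34) and the four per-unit
    trap signals (lines 35 to 38).
    """
    # Line 34: asserted whenever any unit requests a trap.
    common = any(requests)

    # Entries of the units from which a trap request is received.
    trapping = [sub_pcs[j] for j, req in enumerate(requests) if req]

    # Lines 35 to 38: a unit is aborted when its entry is equal to or
    # larger than the entry of some unit that requested the trap.
    per_unit = [any(pc >= t for t in trapping) for pc in sub_pcs]
    return common, per_unit


# Illustrative values only: the MA1 (entry 2) detects a page fault.
print(trap_signals([0, 2, 1, 3], [False, True, False, False]))
# (True, [False, True, False, True])
```

With these values the instructions in the MA1 and the EC2 are aborted, while those in the MA0 and the EC1, which precede the trapping instruction, are allowed to complete. Every per-unit signal depends on a magnitude comparison between sub-program counter entries, and this comparison must finish within the cycle in which the trap request arrives.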
FIG. 3A shows an exemplary program to be executed by the processor system of FIG. 1, and FIG. 3B shows a progress of the pipe line processing in the processor system of FIG. 1 using the conventional trap control method described above, for a case in which a page fault occurs at the "load" instruction during the execution of the program of FIG. 3A, where the shaded region indicates the aborted instructions. As shown in FIG. 3B, in the conventional trap control, when the trap is generated by the execution of the n+2-th "load" instruction during the course of the execution of the program, only those instructions whose instruction numbers are equal to or larger than n+2 are aborted.
However, in such a conventional trap control method, when the trap request is indicated through the trap request signal line 44 as the page fault is detected by the MA1 8 at the cycle C+3, the trap signals in the trap signal lines 35 to 38 cannot be determined until the entries in the M stage sub-program counters 53 to 56 have been compared with each other to determine their relative order. Consequently, there has been a problem that the cycle time must be made considerably long to accommodate such comparison operations, which in turn lowers the clock frequency.
It is further to be noted that the RISC requires a configuration having a simple data path and a simple control circuit. The data path of the superscalar processor, which comprises several RISC data paths, is not so complicated, but the control circuit of the superscalar processor can be quite complicated because of the instruction supply control and other control operations required. In particular, hardware for handling a so called exception, in which the continuation of the processing becomes impossible without support from software such as an OS, can be very complicated, and designing such hardware can be very time-consuming, so that such hardware has often been a critical path in realizing the superscalar processor.