1. Field of the Invention
The present invention relates to a processor system having a main processor with a coprocessor connected thereto.
2. Description of Related Art
A technology that uses a coprocessor specialized in a particular process in order to improve the performance of a microprocessor is known in the art. In a processor system, data is generally exchanged between a main processor and a coprocessor via a storage region accessible to both the main processor and the coprocessor. For example, a general purpose register mounted to the main processor is used as the storage region for exchanging data between the main processor and the coprocessor.
In a processor system using a coprocessor, an instruction executed in the coprocessor (hereinafter referred to as an extended instruction) is generally issued from the main processor to the coprocessor. The coprocessor retrieves data from the general purpose register mounted to the main processor, executes the extended instruction, and stores the execution result of the extended instruction in the general purpose register.
For the main processor to use the result of an extended instruction executed by the coprocessor, the main processor must access the storage region holding that result only after the coprocessor has finished writing it. Therefore, a processor system having a coprocessor requires a mechanism for coordinating the timing at which the coprocessor writes its execution result to the storage region, such as the general purpose register, with the timing at which the main processor accesses that storage region.
One such coordinating mechanism is a configuration in which the main processor and the coprocessor have the same number of pipeline stages (see Japanese Unexamined Patent Application Publication No. 9-319578). Specifically, whether the fetched instruction is an instruction executed by the main processor (hereinafter referred to as a basic instruction) or an extended instruction executed by the coprocessor, the number of clock cycles needed from fetching the instruction to completing it is kept constant. Such a configuration simplifies pipeline control, because the pipeline control, including interlock control, can be performed regardless of whether the fetched instruction is a basic or an extended instruction.
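The benefit for interlock control can be illustrated with a minimal Python sketch. It assumes a hypothetical latency table in which basic and extended instructions share the same number of cycles, and a simplified stall rule (a consumer waits until its producer's latency has elapsed); the names `Instr`, `interlock_stall`, and `INSTRUCTION_EXECUTION_CYCLES` are illustrative only and are not taken from the cited publication.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical latencies (cycles from decode to completion). Under the
# equal-pipeline-stage scheme, basic and extended instructions share one value.
INSTRUCTION_EXECUTION_CYCLES = {"basic": 2, "extended": 2}


@dataclass(frozen=True)
class Instr:
    kind: str                   # "basic" (main processor) or "extended" (coprocessor)
    dest: Optional[int]         # destination general purpose register, if any
    srcs: Tuple[int, ...] = ()  # source general purpose registers


def interlock_stall(producer: Instr, consumer: Instr, distance: int) -> int:
    """Stall cycles for `consumer`, issued `distance` cycles after `producer`.

    Simplified rule (an assumption): a consumer that reads the producer's
    destination register waits until the producer's latency has elapsed.
    """
    if producer.dest is None or producer.dest not in consumer.srcs:
        return 0
    latency = INSTRUCTION_EXECUTION_CYCLES[producer.kind]
    return max(0, latency - distance)


if __name__ == "__main__":
    add = Instr("basic", dest=3, srcs=(1, 2))
    mul = Instr("extended", dest=3, srcs=(1, 2))
    use = Instr("basic", dest=4, srcs=(3,))
    # The stall is the same whichever processor produced r3.
    print(interlock_stall(add, use, 1), interlock_stall(mul, use, 1))
```

Because the latency table has a single value for both kinds, the stall computation reduces to a register-number comparison; the interlock logic never needs to know which processor will produce the result.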
FIG. 7 shows an example of a processor system that ensures synchronization between the main processor and the coprocessor by giving them the same number of pipeline stages. The processor system 7 of FIG. 7 has a coprocessor 80 connected to a main processor 70. The components of the processor system 7 are described hereinafter in detail.
An instruction fetch unit 72 sequentially retrieves instructions from an instruction memory 71 using address information stored in a program counter 721. After an instruction is retrieved, a PC update unit 722 updates the value of the program counter 721 so that the next instruction can be retrieved.
An instruction decode unit 73 decodes the instruction retrieved by the instruction fetch unit 72. If the decoded instruction is a basic instruction, the instruction decode unit 73 issues it to a computing unit included in the main processor 70. On the other hand, if the decoded instruction is an extended instruction, the instruction decode unit 73 transfers it to the coprocessor 80 via a coprocessor I/F 78. A request signal (CPRQ) requesting execution of the extended instruction, an instruction code (CPOP), and an immediate value (CPIMM) are transferred from the coprocessor I/F 78 to the coprocessor 80.
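The routing decision of the instruction decode unit 73 can be sketched in Python as follows. The opcode encoding (`EXTENDED_BASE`) and the function and class names are hypothetical assumptions made for illustration; only the three signals CPRQ, CPOP, and CPIMM come from the text.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical encoding: opcodes at or above EXTENDED_BASE are extended instructions.
EXTENDED_BASE = 0x30


@dataclass(frozen=True)
class CoprocessorRequest:
    """Signals driven on the coprocessor I/F 78."""
    cprq: bool  # CPRQ: request to execute an extended instruction
    cpop: int   # CPOP: instruction code
    cpimm: int  # CPIMM: immediate value


def decode(opcode: int, imm: int) -> Tuple[str, Union[int, CoprocessorRequest]]:
    """Route a basic instruction to the main processor's computing units,
    or an extended instruction to the coprocessor via the coprocessor I/F."""
    if opcode < EXTENDED_BASE:
        return ("main", opcode)
    return ("coprocessor", CoprocessorRequest(cprq=True, cpop=opcode, cpimm=imm))
```

For example, `decode(0x31, 7)` returns `("coprocessor", CoprocessorRequest(cprq=True, cpop=0x31, cpimm=7))` under this assumed encoding.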
A pipeline control unit 731 included in the instruction decode unit 73 performs an interlock control for a pipeline process of the main processor 70.
The main processor 70 includes a multiplier-accumulator (MAC) 741, an Arithmetic Logic Unit (ALU) 742, and a barrel shifter (BSFT) 743. Depending on the type of process specified by the basic instruction, a computing unit is selected from the MAC 741, the ALU 742, and the BSFT 743, and an input value is fetched from a general purpose register 74 to execute the basic instruction. The execution result of the basic instruction is stored in the general purpose register 74 via a destination bus. A MUX 751 is a circuit for selecting the operand for the ALU 742 from between the immediate value (IMM) and the general purpose register 74. A selector 77 is a circuit for selecting, from the outputs of the MAC 741, the ALU 742, and the BSFT 743, the data to be output to the destination bus.
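A behavioral Python sketch of this datapath is shown below. It assumes a 32-bit register width and picks one example operation per unit (multiply-accumulate, add, left shift); these operations and the function names are hypothetical. Computing all three outputs and then indexing one of them mirrors the role of the selector 77, while `mux751` mirrors the operand selection of the MUX 751.

```python
from typing import List

MASK32 = 0xFFFFFFFF  # 32-bit register width (assumption)


def mux751(use_imm: bool, imm: int, reg_value: int) -> int:
    """MUX 751: choose the ALU's second operand (immediate or register)."""
    return imm if use_imm else reg_value


def execute_basic(gpr: List[int], unit: str, rd: int, rs1: int, rs2: int,
                  imm: int = 0, use_imm: bool = False) -> None:
    """Execute one basic instruction and write the result back to gpr[rd]."""
    a, b = gpr[rs1], gpr[rs2]
    # All three units see their inputs; the example operations are hypothetical.
    outputs = {
        "MAC": (gpr[rd] + a * b) & MASK32,              # multiply-accumulate into rd
        "ALU": (a + mux751(use_imm, imm, b)) & MASK32,  # add
        "BSFT": (a << (b & 31)) & MASK32,               # logical left shift
    }
    # Selector 77: drive the selected unit's output onto the destination bus,
    # which writes it to the general purpose register 74.
    gpr[rd] = outputs[unit]
```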
A control unit 81 included in the coprocessor 80 receives the CPRQ, the CPOP, and the CPIMM that are input via the coprocessor I/F 78. An instruction decode unit 811 decodes the extended instruction and outputs a control signal necessary for computing units included in the coprocessor 80 to execute the extended instruction. The control signal includes a processing request to the computing units, an indication of an operand register, and an immediate value.
The coprocessor 80 includes multipliers (MUL) 821 and 822, and a user-defined computing unit 823. Depending on the type of process defined by the extended instruction, either the MULs 821 and 822 or the user-defined computing unit 823 is selected, and an input value is fetched from the general purpose register 74 to execute the instruction. The execution result of the extended instruction is stored in the general purpose register 74 via the destination bus. The MUL 821 is the first half of a 32-bit×16-bit multiplier and generates partial products or the like by the Booth algorithm. The MUL 822 is the second half of the 32-bit×16-bit multiplier and adds the partial products. A 32-bit×16-bit multiplication instruction is thus executed in two stages, by the MUL 821 and the MUL 822.
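The split between the MUL 821 and the MUL 822 can be illustrated with a radix-4 Booth multiplier in Python. This is a sketch under assumptions: the text says only "Booth algorithm", so the radix-4 variant and two's-complement (signed) operands are assumed here, and a plain sum stands in for whatever adder structure the MUL 822 actually uses. Recoding the 16-bit multiplier yields 8 partial products, each equal to a Booth digit in {-2, -1, 0, 1, 2} times the multiplicand, shifted by 2 bits per digit.

```python
from typing import List


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mul821_partial_products(x: int, y: int) -> List[int]:
    """First stage: radix-4 Booth recoding of the 16-bit multiplier y.

    Returns 8 partial products d_i * x * 4**i with d_i in {-2, -1, 0, 1, 2}.
    """
    x = to_signed(x, 32)
    y16 = y & 0xFFFF
    products = []
    prev = 0  # the implicit bit y[-1] = 0
    for i in range(8):
        b0 = (y16 >> (2 * i)) & 1      # y[2i]
        b1 = (y16 >> (2 * i + 1)) & 1  # y[2i+1]
        digit = -2 * b1 + b0 + prev    # Booth digit from y[2i+1], y[2i], y[2i-1]
        products.append((digit * x) << (2 * i))
        prev = b1
    return products


def mul822_add(products: List[int]) -> int:
    """Second stage: add the partial products (a plain sum stands in for the adder)."""
    return sum(products)
```

Chaining the two functions, `mul822_add(mul821_partial_products(x, y))`, equals the signed product `to_signed(x, 32) * to_signed(y, 16)`; the stage boundary between them is where the EX1/EX2 split described below would fall.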
A frequency flag 79 indicates whether the clock frequency of the processor system 7 is high or low, that is, whether the EX stage of the main processor 70 and the coprocessor 80 is to be divided. For example, in a case where the EX stage needs to be divided at clock frequencies of 200 MHz or higher, the flag is set to on if the clock frequency is higher than or equal to 200 MHz, and set to off if the clock frequency is lower than 200 MHz.
When the frequency flag 79 is set to on, that is, when the clock frequency is high, multiplexers 752 to 754 in the main processor 70 output the signals input from flip-flops (FF) 761 to 763. On the other hand, when the frequency flag 79 is set to off, that is, when the clock frequency is low, the multiplexers 752 to 754 output the signals input from the MAC 741, the ALU 742, or the BSFT 743.
Similarly, when the frequency flag 79 is set to on, that is, when the clock frequency is high, multiplexers 841 to 843 in the coprocessor 80 output the signals input from flip-flops (FF) 831 to 833. On the other hand, when the frequency flag 79 is set to off, that is, when the clock frequency is low, the multiplexers 841 to 843 output the signals input from the MUL 821 or the user-defined computing unit 823.
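The effect of the frequency flag on one flip-flop/multiplexer pair can be sketched at cycle level in Python. The 200 MHz threshold is the example from the text; the names `frequency_flag` and `StageBoundary` are hypothetical. When the flag is off, the multiplexer passes the computing unit's output straight through; when it is on, the multiplexer selects the flip-flop, so the value appears one clock later, which is what divides the EX stage into two cycles.

```python
from typing import Any, Optional

THRESHOLD_MHZ = 200  # example threshold from the text


def frequency_flag(clock_mhz: float) -> bool:
    """Frequency flag 79: on (divide the EX stage) at or above the threshold."""
    return clock_mhz >= THRESHOLD_MHZ


class StageBoundary:
    """A computing-unit output followed by a flip-flop (such as FF 761) and a
    multiplexer (such as MUX 752). Cycle-level sketch; names are hypothetical."""

    def __init__(self) -> None:
        self.ff: Optional[Any] = None  # value captured at the previous clock edge

    def clock(self, unit_output: Any, flag_on: bool) -> Any:
        """Return this cycle's multiplexer output, then capture unit_output in the FF."""
        out = self.ff if flag_on else unit_output
        self.ff = unit_output
        return out
```

With `flag_on=True`, a value presented in one call to `clock` is returned by the next call; with `flag_on=False`, it is returned immediately.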
The pipeline control unit 731 included in the main processor 70 monitors the frequency flag 79 to determine the number of clock cycles required for the EX stage of the main processor 70. If the frequency flag 79 is set to on, the EX stage is divided into EX1 and EX2 stages, which are executed over 2 clock cycles.
Similarly, the control unit 81 included in the coprocessor 80 monitors the frequency flag 79 to determine the number of clock cycles required for the EX stage of the coprocessor 80. If the frequency flag 79 is set to on, the EX stage is divided into EX1 and EX2 stages, which are executed over 2 clock cycles.
The concept of the pipeline processing of the processor system 7 is described hereinafter in detail with reference to the timing diagrams of FIGS. 8A and 8B. FIG. 8A shows the case where the clock frequency is low, and each of the IF, DEC, and EX stages is executed in 1 clock cycle. In the IF stage, the process of the instruction fetch unit 72 is executed, and in the DEC stage, the process of the instruction decode unit 73 is executed. In the EX stage, for a basic instruction, operations by the computing units (MAC 741, ALU 742, and BSFT 743) included in the main processor 70 and writing of the execution result to the general purpose register 74 are performed. For an extended instruction, decoding of the extended instruction by the instruction decode unit 811, processing by the computing units in the coprocessor 80 (MULs 821 and 822 or the user-defined computing unit 823), and writing of the execution result to the general purpose register 74 are performed in the EX stage.
On the other hand, FIG. 8B shows the case where the clock frequency is high, and the EX stage is divided into EX1 and EX2 stages that are executed over 2 clock cycles.
For a basic instruction, the processes by the computing units (MAC 741, ALU 742, or BSFT 743) of the main processor 70 are performed in the EX1 stage, and the writing of the execution result to the general purpose register 74 is performed in the EX2 stage. For an extended instruction, the operation by the MUL 821 or the user-defined computing unit 823 is performed in the EX1 stage, and the writing of the execution result to the general purpose register 74 is performed in the EX2 stage.
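The stage layout of FIGS. 8A and 8B can be reproduced conceptually with a short Python sketch that prints a timing diagram. It assumes one instruction issued per cycle and no stalls, and the function names are hypothetical. The instruction kind is deliberately never consulted, because under this scheme basic and extended instructions pass through the same stages.

```python
from typing import List


def stages(flag_on: bool) -> List[str]:
    """Stage sequence shared by basic and extended instructions."""
    # Flag off (FIG. 8A): IF, DEC, EX.  Flag on (FIG. 8B): EX split into EX1, EX2.
    return ["IF", "DEC", "EX1", "EX2"] if flag_on else ["IF", "DEC", "EX"]


def timing_diagram(instrs: List[str], flag_on: bool) -> List[List[str]]:
    """One row per instruction: the name, then the stage occupied in each cycle.

    Assumes one instruction is issued per cycle with no stalls. The instruction
    kind is deliberately not consulted: both processors use the same stages.
    """
    seq = stages(flag_on)
    width = len(instrs) - 1 + len(seq)
    rows = []
    for i, name in enumerate(instrs):
        row = [""] * width
        for s, stage in enumerate(seq):
            row[i + s] = stage
        rows.append([name] + row)
    return rows


if __name__ == "__main__":
    for flag_on in (False, True):
        print("flag on" if flag_on else "flag off")
        for row in timing_diagram(["basic", "extended", "basic"], flag_on):
            print(" ".join(f"{cell:<9}" for cell in row))
```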
As described above, the processor system 7 divides the EX stage of the coprocessor 80 when the clock frequency is high, and correspondingly divides the EX stage of the main processor 70 into two stages. This allows the number of clock cycles required from decoding to completion of execution of a basic instruction in the main processor 70 to be changed to match the number of clock cycles required from decoding to completion of execution of an extended instruction in the coprocessor 80. The number of clock cycles required from decoding to completion of execution of an instruction is hereinafter referred to as the number of instruction execution cycles.
The processor system 7 configured as above gives the main processor 70 and the coprocessor 80 the same number of pipeline stages and the same number of instruction execution cycles, thereby ensuring synchronization between the main processor and the coprocessor.
In a processor system that ensures synchronization between the main processor and the coprocessor by giving them the same number of instruction execution cycles, if the number of clock cycles required to execute an instruction in the coprocessor increases, the number of instruction execution cycles of the main processor must be increased to match that of the coprocessor, regardless of the performance of the main processor.
An increase in the number of instruction execution cycles, that is, an increase in the number of pipeline stages, increases the branch penalty. Therefore, in the conventional processor system, when the number of instruction execution cycles of the coprocessor increases, as in high-speed operation, the branch penalty increases not only in the coprocessor but also in the main processor.
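Why an extra stage costs an extra cycle on every taken branch can be shown with a one-function Python sketch. It rests on an assumption the text does not state: that the branch outcome is known only in the final EX stage and that all instructions fetched before then are discarded. The name `branch_penalty` is hypothetical.

```python
from typing import List


def branch_penalty(stages: List[str], resolve_stage: str) -> int:
    """Cycles lost on a taken branch.

    Assumes IF is the first stage and every instruction fetched after the branch
    and before it resolves in `resolve_stage` is discarded: a branch fetched in
    cycle 0 resolves in cycle k = stages.index(resolve_stage), so k wrong-path
    fetches are thrown away.
    """
    return stages.index(resolve_stage)
```

Under this assumption, the low-speed pipeline `["IF", "DEC", "EX"]` resolving in `"EX"` loses 2 cycles, while the divided pipeline `["IF", "DEC", "EX1", "EX2"]` resolving in `"EX2"` loses 3, and the main processor pays this extra cycle even though its own computing units may not need the second EX stage.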