1. Field of the Invention
The present invention relates to an arithmetic unit, and particularly to an arithmetic unit adapted to a processor for audio communication or digital signal processing.
2. Description of the Related Art
Processors for voice communication and digital signal processing generally use pipelining to execute a plurality of instructions simultaneously in an overlapping manner (as described, for example, in "COMPUTER ARCHITECTURE A QUANTITATIVE APPROACH SECOND EDITION", Chapter 3, written by John L. Hennessy and David A. Patterson).
In pipelining, arithmetic ability can be improved by subdividing the pipeline into a larger number of pipeline stages (processing units) so as to increase the number of instructions to be executed simultaneously. Further, because the number of logic stages per pipeline stage can be reduced, the operating rate can be improved.
Pipelining in a conventional arithmetic unit having eight pipeline stages will be described below with reference to FIGS. 1 and 2.
The arithmetic unit has a program counter 1 for indicating an access address of an instruction memory 2, the instruction memory 2 for storing instruction data, an instruction-memory-data storage means 3 for storing instruction data which is given from the instruction memory 2, and a first instruction decoder 4 for decoding instruction data which is given from the instruction-memory-data storage means 3 and outputting a data memory address and a result of temporary decoding of an instruction (hereinafter referred to as "an instruction-temporary-decoded-result").
The arithmetic unit further has a data-memory-address storage means 5 for holding the data memory address given from the first instruction decoder 4, a data memory 6 which is accessed on the basis of the data memory address given from the data-memory-address storage means 5 and in which an output data is stored, and a data-memory-data storage means 7 for holding the output data given from the data memory 6.
The arithmetic unit further has a storage means 8 for holding the instruction-temporary-decoded-result given from the first instruction decoder 4, a second instruction decoder 9 for decoding the instruction-temporary-decoded-result given from the storage means 8, a storage means 10 for holding the decoded result of the second instruction decoder 9, an arithmetic portion 11 for performing an arithmetic operation for the output data given from the data-memory-data storage means 7 on the basis of contents indicated by the decoded result given from the storage means 10, and a storage means 12 for holding the arithmetic result given from the arithmetic portion 11.
The operations in respective pipeline stages, shown in FIG. 1, of the arithmetic unit configured as described above will be described below with reference to FIG. 2 which is a pipeline configuration view.
Pre-IF Stage and Post-IF Stage
At time T1, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i). At time T2, the instruction data stored in the address which is given from the program counter 1 to the instruction memory 2 at time T1 is outputted from the instruction memory 2 to the instruction-memory-data storage means 3 and held in the instruction-memory-data storage means 3 (post-IF stage for instruction i). Here, pipeline latches may be provided in the inside of the instruction memory 2 or a wave pipeline configuration may be used in the instruction memory 2 in order to separate an IF stage into a pre-IF stage and a post-IF stage. Incidentally, in the wave pipeline configuration, a plurality of waves of data are made to propagate simultaneously between storage devices by performing clocking with a period shorter than the propagation delay of the combinational circuit between them (JP-A-7-93149).
Incidentally, at time T2, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i+1).
Pre-DEC1 Stage and Post-DEC1 Stage
At time T3, the instruction data held in the instruction-memory-data storage means 3 at time T2 is decoded by the first instruction decoder 4 to thereby generate a data memory address and an instruction-temporary-decoded-result to be used in the next DEC2 stage (pre-DEC1 stage for instruction i). At time T4, the data memory address and the instruction-temporary-decoded-result generated at time T3 are held in the data-memory-address storage means 5 and the storage means 8 respectively (post-DEC1 stage for instruction i). Here, pipeline latches may be provided in the inside of the first instruction decoder 4 or a wave pipeline configuration may be used in the first instruction decoder 4 in order to separate a DEC1 stage into a pre-DEC1 stage and a post-DEC1 stage.
Incidentally, at time T3, the instruction data stored in the address which is given from the program counter 1 to the instruction memory 2 at time T2 is outputted from the instruction memory 2 to the instruction-memory-data storage means 3 and held in the instruction-memory-data storage means 3 (post-IF stage for instruction i+1) and, at the same time, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i+2).
Further, at time T4, the instruction data held in the instruction-memory-data storage means 3 at time T3 is decoded by the first instruction decoder 4 to thereby generate a data memory address and an instruction-temporary-decoded-result to be used in the next DEC2 stage (pre-DEC1 stage for instruction i+1). The instruction data stored in the address which is given from the program counter 1 to the instruction memory 2 at time T3 is outputted from the instruction memory 2 to the instruction-memory-data storage means 3 and held in the instruction-memory-data storage means 3 (post-IF stage for instruction i+2) and, at the same time, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i+3).
Pre-DEC2 Stage and Post-DEC2 Stage
At time T5, the data memory 6 is accessed on the basis of the data memory address which is outputted from the data-memory-address storage means 5 and, at the same time, the instruction-temporary-decoded-result held in the storage means 8 at time T4 is decoded by the second instruction decoder 9 into a signal format which is necessary for the next EX stage (pre-DEC2 stage for instruction i). At time T6, the output data from the data memory 6 is held in the data-memory-data storage means 7 and, at the same time, the data decoded by the second instruction decoder 9 is held in the storage means 10 (post-DEC2 stage for instruction i). Here, pipeline latches may be provided in the inside of the data memory 6 and the second instruction decoder 9 or a wave pipeline configuration may be used in the data memory 6 and the second instruction decoder 9 in order to separate a DEC2 stage into a pre-DEC2 stage and a post-DEC2 stage.
Incidentally, at time T5, the data memory address and the instruction-temporary-decoded-result generated by the first instruction decoder 4 at time T4 are held in the data-memory-address storage means 5 and the storage means 8 respectively (post-DEC1 stage for instruction i+1). The instruction data held in the instruction-memory-data storage means 3 at time T4 is decoded by the first instruction decoder 4 to thereby generate a data memory address and an instruction-temporary-decoded-result to be used in the next DEC2 stage (pre-DEC1 stage for instruction i+2). The instruction data stored in the address which is given from the program counter 1 to the instruction memory 2 at time T4 is outputted from the instruction memory 2 to the instruction-memory-data storage means 3 and held in the instruction-memory-data storage means 3 (post-IF stage for instruction i+3) and, at the same time, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i+4).
Further, at time T6, the data memory 6 is accessed on the basis of the data memory address which is outputted from the data-memory-address storage means 5 and, at the same time, the instruction-temporary-decoded-result held in the storage means 8 at time T5 is decoded by the second instruction decoder 9 into a signal format which is necessary for the next EX stage (pre-DEC2 stage for instruction i+1). The data memory address and the instruction-temporary-decoded-result generated by the first instruction decoder 4 at time T5 are held in the data-memory-address storage means 5 and the storage means 8 respectively (post-DEC1 stage for instruction i+2). The instruction data held in the instruction-memory-data storage means 3 at time T5 is decoded by the first instruction decoder 4 to thereby generate a data memory address and an instruction-temporary-decoded-result to be used for the next DEC2 stage (pre-DEC1 stage for instruction i+3). The instruction data stored in the address which is given from the program counter 1 to the instruction memory 2 at time T5 is outputted from the instruction memory 2 to the instruction-memory-data storage means 3 and held in the instruction-memory-data storage means 3 (post-IF stage for instruction i+4) and, at the same time, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i+5).
Pre-EX Stage and Post-EX Stage
At time T7, the arithmetic portion 11 performs an arithmetic operation for the output data from the data-memory-data storage means 7 on the basis of contents designated by the output signal of the storage means 10 (pre-EX stage for instruction i). At time T8, the arithmetic result in the arithmetic portion 11 is held in the storage means 12 (post-EX stage for instruction i). Here, pipeline latches may be provided in the inside of the arithmetic portion 11 or a wave pipeline configuration may be used in the arithmetic portion 11 in order to separate an EX stage into a pre-EX stage and a post-EX stage.
Incidentally, at time T7, the output data from the data memory 6 is held in the data-memory-data storage means 7 and, at the same time, the data decoded by the second instruction decoder 9 at time T6 is held in the storage means 10 (post-DEC2 stage for instruction i+1). The data memory 6 is accessed on the basis of the data memory address outputted from the data-memory-address storage means 5 and, at the same time, the instruction-temporary-decoded-result held in the storage means 8 at time T6 is decoded by the second instruction decoder 9 into a signal format which is necessary for the next EX stage (pre-DEC2 stage for instruction i+2). The data memory address and the instruction-temporary-decoded-result generated by the first instruction decoder 4 at time T6 are held in the data-memory-address storage means 5 and the storage means 8 respectively (post-DEC1 stage for instruction i+3). The instruction data held in the instruction-memory-data storage means 3 at time T6 is decoded by the first instruction decoder 4 to thereby generate a data memory address and an instruction-temporary-decoded-result to be used in the next DEC2 stage (pre-DEC1 stage for instruction i+4). The instruction data stored in the address which is given from the program counter 1 to the instruction memory 2 at time T6 is outputted from the instruction memory 2 to the instruction-memory-data storage means 3 and held in the instruction-memory-data storage means 3 (post-IF stage for instruction i+5) and, at the same time, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i+6).
Further, at time T8, the arithmetic portion 11 performs an arithmetic operation for the output data from the data-memory-data storage means 7 on the basis of contents designated by the output signal of the storage means 10 (pre-EX stage for instruction i+1). The output data from the data memory 6 is held in the data-memory-data storage means 7 and, at the same time, the data decoded by the second instruction decoder 9 is held in the storage means 10 (post-DEC2 stage for instruction i+2). The data memory 6 is accessed on the basis of the data memory address outputted from the data-memory-address storage means 5 and, at the same time, the instruction-temporary-decoded-result held in the storage means 8 at time T7 is decoded by the second instruction decoder 9 into a signal format which is necessary for the next EX stage (pre-DEC2 stage for instruction i+3). The data memory address and the instruction-temporary-decoded-result generated by the first instruction decoder 4 at time T7 are held in the data-memory-address storage means 5 and the storage means 8 respectively (post-DEC1 stage for instruction i+4). The instruction data held in the instruction-memory-data storage means 3 at time T7 is decoded by the first instruction decoder 4 to thereby generate a data memory address and an instruction-temporary-decoded-result to be used in the next DEC2 stage (pre-DEC1 stage for instruction i+5). The instruction data stored in the address which is given from the program counter 1 to the instruction memory 2 at time T7 is outputted from the instruction memory 2 to the instruction-memory-data storage means 3 and held in the instruction-memory-data storage means 3 (post-IF stage for instruction i+6) and, at the same time, an address is outputted from the program counter 1 to the instruction memory 2 and then the value of the program counter 1 is incremented by one (pre-IF stage for instruction i+7).
As described above, at time T8 in FIG. 2, a post-EX stage for instruction i, a pre-EX stage for instruction i+1, a post-DEC2 stage for instruction i+2, a pre-DEC2 stage for instruction i+3, a post-DEC1 stage for instruction i+4, a pre-DEC1 stage for instruction i+5, a post-IF stage for instruction i+6 and a pre-IF stage for instruction i+7 are executed simultaneously. As a result, up to eight-fold performance can be provided compared with the case where the respective stages for the instructions are executed one by one.
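The stage occupancy described above can be reproduced with a minimal sketch. It assumes that instruction i enters the pre-IF stage at time T1, that one new instruction is issued in every clock period, and that no stalling occurs. The function names occupancy, cycles_pipelined and cycles_sequential are hypothetical and are introduced only for illustration.

```python
# Stage names in pipeline order; each stage is assumed to take one clock period.
STAGES = [
    "pre-IF", "post-IF",
    "pre-DEC1", "post-DEC1",
    "pre-DEC2", "post-DEC2",
    "pre-EX", "post-EX",
]


def occupancy(t, first_issue=1):
    """Map each busy stage at time T<t> to the offset k of instruction i+k.

    Assumption: instruction i enters pre-IF at T<first_issue>, one instruction
    is issued per clock period, and no stall occurs.
    """
    table = {}
    for position, name in enumerate(STAGES):
        k = t - first_issue - position  # instruction i+k occupies this stage
        if k >= 0:
            table[name] = k
    return table


def cycles_pipelined(n):
    """Clock periods needed to complete n instructions with full overlap."""
    return n + len(STAGES) - 1


def cycles_sequential(n):
    """Clock periods needed when all stages of each instruction run one by one."""
    return n * len(STAGES)


if __name__ == "__main__":
    for name, k in occupancy(8).items():
        print(f"{name:9s} instruction i+{k}")
    n = 1000
    print(cycles_sequential(n) / cycles_pipelined(n))
```

Under these assumptions the ratio of sequential to pipelined clock periods approaches eight as the number of instructions grows, and it stays below eight for any finite number because of the periods needed to fill the pipeline.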
The processor having the aforementioned conventional pipeline configuration, however, has the following problems.
(1) When a branch instruction is executed, cycles in which invalid "nop" instructions are issued (stalling) occur until a branch destination is found.
When a branch instruction is executed, the address indicated by the program counter 1 is not determined until the branch destination is found. The branch destination is found only after the post-EX stage for the branch instruction is completed. As a result, stalling continues until the address indicated by the program counter 1 is determined, so that pipelines cannot be used effectively.
When, for example, the execution of a branch instruction is started at time T2 in FIG. 3, the value of the program counter 1 is not determined until the post-EX stage for the branch instruction is completed at time T9. Accordingly, the branch destination instruction j cannot be issued in the period of time from the time T3 to the time T9, so that pipelines cannot be used effectively.
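The lost issue slots in this example can be counted with a minimal sketch. It assumes an eight-stage pipeline with one stage per clock period and assumes that no useful instruction can be issued in any period up to and including the one in which the branch instruction completes its post-EX stage. The function name branch_stall_slots is hypothetical.

```python
PIPELINE_DEPTH = 8  # pre-IF through post-EX, one stage per clock period


def branch_stall_slots(branch_issue_time, depth=PIPELINE_DEPTH):
    """Return the issue slots (time indices) lost to an unresolved branch.

    Assumption: the branch destination becomes known only when the branch
    completes its post-EX stage, so every slot after the branch is issued,
    up to and including that period, carries a nop.
    """
    post_ex_time = branch_issue_time + depth - 1
    return list(range(branch_issue_time + 1, post_ex_time + 1))


if __name__ == "__main__":
    slots = branch_stall_slots(2)
    print(slots)       # [3, 4, 5, 6, 7, 8, 9]
    print(len(slots))  # 7
```

For a branch instruction started at time T2 this gives the seven slots T3 to T9, which corresponds to the period described above.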
(2) When instructions using previous arithmetic results are inputted continuously, stalling occurs.
Assume now that instruction i in FIG. 2 is an instruction whose arithmetic result is determined only after the post-EX stage for the instruction is completed. In the eight pipeline stages, the post-EX stage for the instruction i is not completed before the pre-EX stage for the next instruction i+1 is started; both stages are executed at time T8. Accordingly, the arithmetic result of the instruction i cannot be used in the instruction i+1. In order to use the arithmetic result of the instruction i in the instruction i+1, stalling must be inserted before the execution of the instruction i+1, so that pipelines cannot be used effectively.
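The timing conflict can be illustrated with a minimal sketch. It rests on two assumptions that the description above does not state: the arithmetic result becomes usable in the clock period following the post-EX stage of the producing instruction, and the consuming instruction needs the result at its pre-EX stage. The function name stall_cycles and the stage positions are hypothetical and serve only as an illustration.

```python
# Zero-based positions of the two EX sub-stages in the eight-stage pipeline.
STAGE_INDEX = {"pre-EX": 6, "post-EX": 7}


def stall_cycles(producer_issue, consumer_issue):
    """Return how many clock periods the consumer must be delayed.

    Assumptions (not taken from the description): the result is usable in
    the period after the producer's post-EX stage, and the consumer reads
    the result at its pre-EX stage.
    """
    result_ready = producer_issue + STAGE_INDEX["post-EX"] + 1
    operand_needed = consumer_issue + STAGE_INDEX["pre-EX"]
    return max(0, result_ready - operand_needed)


if __name__ == "__main__":
    # Instruction i issued at T1, instruction i+1 issued at T2.
    print(stall_cycles(1, 2))
    # An instruction issued two periods after the producer.
    print(stall_cycles(1, 3))
```

Under these assumptions a directly following dependent instruction needs at least one stalled period, whereas an instruction issued two periods later needs none. A different forwarding or write-back arrangement would change the count.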