1. Field of the Invention
The present invention relates to a data processing apparatus that performs pipeline processing in a plurality of divided stages, and is directed, for example, to a data processing apparatus that can be mounted inside a processor.
2. Related Background Art
With the development of multimedia and communication technologies, there has been strong demand for higher processor performance. Techniques for enhancing processor performance include raising the operation clock frequency and performing arithmetic processing in parallel.
However, when a plurality of operation units are disposed inside the processor and arithmetic processing is executed in parallel, the circuit scale is enlarged, and wiring delay may prevent the processing from being completed in time.
On the other hand, in recent processors, each instruction is in many cases divided into a plurality of stages and subjected to pipeline processing in order to accelerate instruction execution. FIG. 1 is a block diagram showing a schematic constitution of a pipeline processor inside the processor, and FIG. 2 is a diagram showing a processing flow.
As shown in FIG. 1, each instruction is divided into five stages A to E and executed in order.
As shown in FIG. 1, each stage is provided with a flip-flop 11 for synchronizing input data, a logic circuit 12, and a multiplexer 13. The output of the multiplexer 13 is input to the flip-flop 11 of the next stage.
As shown in FIG. 2, subjecting each instruction to pipeline processing enhances processor performance. In order to enhance performance further, however, a plurality of pipeline processing portions are sometimes disposed inside the processor.
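By way of illustration only, the benefit of dividing each instruction into the five stages A to E can be sketched in Python as follows. The sketch assumes an ideal pipeline in which every stage takes one clock cycle and no stalls occur; the function names and the instruction count are hypothetical and do not appear in the drawings.

```python
# Illustrative model of an ideal five-stage pipeline (assumption: one clock
# cycle per stage, no stalls). Names here are hypothetical.
STAGES = ["A", "B", "C", "D", "E"]  # the five stages named in the text


def pipeline_schedule(num_instructions, stages=STAGES):
    """Map each clock cycle to the (instruction index, stage) pairs active in it."""
    schedule = {}
    for i in range(num_instructions):
        for s, stage in enumerate(stages):
            cycle = i + s  # instruction i occupies stage s during cycle i + s
            schedule.setdefault(cycle, []).append((i, stage))
    return schedule


def total_cycles_pipelined(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline every cycle."""
    if num_instructions <= 0:
        return 0
    return num_stages + num_instructions - 1


def total_cycles_sequential(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when each instruction finishes before the next one starts."""
    return num_stages * max(num_instructions, 0)


if __name__ == "__main__":
    n = 4
    for cycle, work in sorted(pipeline_schedule(n).items()):
        print(cycle, " ".join(f"I{i}:{stage}" for i, stage in work))
    print(total_cycles_pipelined(n), total_cycles_sequential(n))
```

Under these assumptions, four instructions complete in 8 cycles when pipelined, against 20 cycles when executed strictly one after another.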
FIG. 3 is a block diagram showing an example in which a plurality of pipeline processing portions are disposed inside the processor. An instruction read from an instruction cache (IC) 21 of FIG. 3 is dispatched via an instruction register (IR) 22 to an empty pipeline processing portion among six pipeline processing portions (ALU) 24, and is then executed by that pipeline processing portion. Data read out from a register file (RF) 23 in accordance with the instruction is operated on by the pipeline processing portion 24, and the execution result of the instruction is written back to a register file (RF) 25.
FIG. 4 is a block diagram showing a detailed constitution in the vicinity of an input of the pipeline processing portion 24. As shown in FIG. 4, a multiplexer 26 and a flip-flop 27 are disposed between the register file 23 and the pipeline processing portion 24. Since the pipeline processing portions 24 perform the processing in parallel, a control signal Control is supplied to each multiplexer 26 via a common control line, and each pipeline processing portion 24 performs arithmetic processing based on the control signal Control.
However, when a plurality of pipeline processing portions are controlled with one control line, the fanout load of the control signal increases as the number of pipeline processing portions grows and the wiring length of the control line becomes longer. In recent processors, the operation clock frequency is very high. Therefore, there is a possibility that the processing in each stage cannot be completed in time because of the delay of the control signal.
In order to reduce the fanout load of the control signal, it is preferable to reduce the wiring length of the control line. However, to enhance processor performance, the number of pipeline processing portions has to be increased, and the wiring length of the control line necessarily increases.
As another technique for reducing the fanout load of the control signal, the control signal may be buffered in a tree and supplied to each pipeline processing portion, or a plurality of copies of the control signal may be generated beforehand.
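As a rough illustration of the tree buffering mentioned above, the following Python sketch computes how many drivers are needed at each level of a buffer tree so that no driver exceeds a given fanout. The function name, the fanout limits, and the choice of six sinks (matching the six pipeline processing portions of FIG. 3) are assumptions made for illustration; actual fanout limits depend on the circuit technology and are not stated in the text.

```python
import math


def buffer_tree(num_sinks, max_fanout):
    """Return the number of drivers at each tree level, root first.

    Hypothetical helper: num_sinks is the number of loads on the control
    signal (e.g. the multiplexers 26), and max_fanout is the assumed limit
    on the loads any single driver may feed.
    """
    if num_sinks < 1 or max_fanout < 2:
        raise ValueError("need at least one sink and a fanout limit of 2 or more")
    levels = []
    loads = num_sinks
    # Insert buffer levels until a single driver can feed the remaining loads.
    while loads > max_fanout:
        drivers = math.ceil(loads / max_fanout)
        levels.append(drivers)
        loads = drivers
    levels.append(1)  # the root driver, i.e. the source of the control signal
    return levels[::-1]


if __name__ == "__main__":
    for fanout in (2, 3, 6):
        levels = buffer_tree(6, fanout)
        print(fanout, levels, len(levels))
```

The length of the returned list is the number of driver stages the control signal passes through. The sketch makes the trade-off visible: a lower fanout per driver requires more buffer levels, and each level adds its own delay to the control signal.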
Furthermore, in recent years, it has become common in the development of processors and ASICs to design an LSI by arbitrarily combining various function blocks prepared in advance. When this design technique is employed, the combination of function blocks cannot be completely specified beforehand. Therefore, it is preferable to preset the fanout load of each signal with an allowance. However, it has heretofore been difficult to set the fanout load of a signal having critical timing to a value such that erroneous operation is prevented.