The integrated circuit industry is constantly trying to manufacture ICs applicable to high-speed pipeline data processing. The rate of instruction execution depends upon the speed of each of the operational blocks in a computer. In order to increase the speed of the operational blocks, a technique known in the art as pipeline data processing has been proposed. Pipeline data processing is a technique in which a data processor is divided into a plurality of processing stages so that the execution of several instructions overlaps in time.
One such pipeline data processing scheme is described with reference to FIGS. 10a and 10b. FIG. 10a shows the organization of a conventional IC for pipeline data processing. FIG. 10b is a data flow diagram of the conventional pipeline data processing. FIG. 11 depicts, by way of example, a logic element used for the conventional pipeline data processing.
FIG. 10a shows the following: a clock generator; latches (i.e. latch-1, latch-2, and latch-3); and logic elements 1 and 2. An external clock CLK is applied to the clock generator, and then the clock generator generates a clock signal ph1. The clock signal ph1 is delivered to the latch-1, to the latch-2, and to the latch-3 so that these three latches are timed for synchronization. In synchronism with the clock signal ph1, the latch-1 takes data (i.e. data1) and outputs data (i.e. data2). In synchronism with the clock signal ph1, the latch-2 takes data (i.e. data3) and outputs data (i.e. data4). In synchronism with the clock signal ph1, the latch-3 takes data (i.e. data5) and outputs data (i.e. data6). The logic element 1 receives data2, processes it, and outputs data3 as a result of such processing. The logic element 2 receives data4, processes it, and outputs data5 as a result of such processing.
FIG. 11 shows the organization of the logic element 1. The logic element 1 has full adders FA1, FA2, FA3, FA4, and FA5. A1 to A5, B1 to B5, and C1 are equivalent to data2. S1 to S5 are equivalent to data3. The full adder FA1 receives C1, A1, and B1 and outputs S1 (i.e. a sum output) while delivering a carry output 221 to the next full adder FA2. The full adder FA2 receives A2 and B2, in addition to the carry output 221, and outputs S2 (i.e. a sum output) while delivering a carry output 222 to the next full adder FA3. The full adder FA3 receives A3 and B3, in addition to the carry output 222, and outputs S3 (i.e. a sum output) while delivering a carry output 223 to the next full adder FA4. The full adder FA4 receives A4 and B4, in addition to the carry output 223, and outputs S4 (i.e. a sum output) while delivering a carry output 224 to the next full adder FA5. The full adder FA5 receives A5 and B5, in addition to the carry output 224, and outputs S5 (i.e. a sum output).
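The chain of full adders described for FIG. 11 can be illustrated by a minimal behavioral sketch. The function names full_adder and ripple_carry_add are hypothetical and are chosen only for illustration; the bit lists are assumed to be ordered from stage 1 (least significant) to stage 5, and the carry leaving the last stage is returned separately because the description does not mention a carry output for FA5.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum_bit, carry_out)."""
    sum_bit = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return sum_bit, carry_out


def ripple_carry_add(a_bits, b_bits, c1=0):
    """Chain one full adder per bit position, as FA1..FA5 are chained.

    a_bits, b_bits: lists of 0/1 values, stage 1 (least significant) first.
    c1: carry input of the first stage.
    Returns (sum_bits, last_carry); last_carry is the carry leaving the
    final stage, which is an addition of this sketch (assumption).
    """
    carry = c1
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        # Each stage consumes the carry produced by the previous stage.
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


# Example: 5 + 3 = 8 with five stages (bits listed stage 1 first).
s_bits, last_carry = ripple_carry_add([1, 0, 1, 0, 0], [1, 1, 0, 0, 0], c1=0)
```

In this example a carry generated in the first stage propagates through the following stages before the upper sum bits settle, which is the carry propagation path that the delay discussion relies on.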
In the case of the logic element 1 of FIG. 11, its maximum delay time is the time between the application of A1 and the appearance of S5 (i.e. the time taken for traversing a carry propagation path passing through all the full adders FA1 to FA5), while on the other hand its minimum delay time is the time between the variation of A1 and the appearance of S1 (i.e. the time taken for traversing a path passing through only one of the full adders FA1 to FA5).
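The maximum and minimum delay times of such a chain can be estimated with a simple model. The following sketch assumes, purely for illustration, that every full adder has the same input-to-carry delay t_carry and the same input-to-sum delay t_sum; these parameter names and the numeric values are hypothetical and are not taken from FIG. 11.

```python
def adder_delay_bounds(n_stages, t_carry, t_sum):
    """Estimate (max_delay, min_delay) of an n-stage ripple-carry chain.

    Assumed model: the longest path runs through the carry outputs of the
    first n_stages - 1 adders and then through the sum output of the last
    adder; the shortest path runs through the sum output of one adder only.
    """
    max_delay = (n_stages - 1) * t_carry + t_sum
    min_delay = t_sum
    return max_delay, min_delay


# Illustrative values in nanoseconds (assumptions, not measured data).
max_delay, min_delay = adder_delay_bounds(n_stages=5, t_carry=1.0, t_sum=1.5)
```

Under this model the maximum delay grows with the number of stages while the minimum delay does not, so the gap between the two widens as the logic element becomes larger.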
The operation of the above-described prior art is explained by making reference to FIGS. 10a and 10b.
The flow of data is described by focusing on the n-th data of data1. At the (n+1)-th cycle, the n-th data is fed to the logic element 1 as data2. At the end of the (n+1)-th cycle, the output of the logic element 1 for the n-th data becomes definite. At the (n+2)-th cycle, that output is fed to the logic element 2, and at the end of the (n+2)-th cycle, the logic element 2 outputs data which is defined as data5. Such defined data is output by the latch-3 at the (n+3)-th cycle.
Without the execution of pipeline processing, the sum of the delay time of the logic element 1 and the delay time of the logic element 2 must fall within one clock cycle. This inevitably increases the length of a clock cycle, resulting in a slowdown of operation. In contrast, with the execution of pipeline processing, new data can be accepted every clock cycle. All that is required is that each of the delay times of the logic elements 1 and 2 falls within one clock cycle. Because of this, high-speed data processing is achievable.
In FIG. 10b, an interval during which the data output of the logic element 1 is definite is indicated by I. An interval within which the data output of the logic element 1 is not definite is indicated by II. The interval I has a sub-interval, indicated by III, during which the output still remains definite even after the (n+1)-th data is applied to an input of the logic element 1. In other words, the interval III corresponds to the minimum delay time of the logic element 1.
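The cycle-by-cycle flow through the three latches can be sketched behaviorally as follows. The function simulate_pipeline and the two example logic functions are hypothetical; the sketch assumes ideal latches that all capture their inputs on the same clock edge and logic elements whose outputs settle within one cycle.

```python
def simulate_pipeline(inputs, logic1, logic2):
    """Return the value of data6 observed in each cycle.

    inputs[n] is the value present on data1 during cycle n. Index k of the
    returned list is the output of latch-3 during cycle k; None means the
    latch has not yet captured valid data.
    """
    latch1 = latch2 = latch3 = None
    data6_trace = [latch3]                 # cycle 0: nothing captured yet
    # Two extra idle cycles let the last input reach latch-3.
    for value in list(inputs) + [None, None]:
        # Clock edge: every latch captures the value present before the edge.
        next3 = logic2(latch2) if latch2 is not None else None
        next2 = logic1(latch1) if latch1 is not None else None
        latch1, latch2, latch3 = value, next2, next3
        data6_trace.append(latch3)
    return data6_trace


# Placeholder logic elements (assumptions for illustration only).
trace = simulate_pipeline([1, 2, 3], lambda x: x + 10, lambda x: x * 2)
```

In the resulting trace, the result for the input presented in cycle n appears at index n+3, and a new result follows in every subsequent cycle, which corresponds to the overlapped execution described above.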
If pipeline processing is performed by the above-described configuration with the logic elements 1 and 2 having different maximum delay times, the processing time (clock cycle) is limited by the greater of the maximum delay times of the logic elements 1 and 2. This impedes high-speed data processing.
Theoretically, the above problem could be dealt with by equalizing the maximum delay times of the logic elements. This requires adequate recombination of the logic elements 1 and 2 so as to optimize them. Such optimization is a difficult problem. Further, if logic elements that are not optimized are used, it is necessary to take a number of elapsed data in order to obtain logic results during the logical processing. This requires large-scale latches.
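The constraint on the clock cycle can be made concrete with a short calculation. In the following sketch the function names and the delay values are hypothetical, and latch setup and propagation times are ignored for simplicity (assumption).

```python
def clock_period_unpipelined(max_delays):
    """Without pipelining, all logic elements must settle in one cycle."""
    return sum(max_delays)


def clock_period_pipelined(max_delays):
    """With pipelining, the slowest logic element sets the clock cycle."""
    return max(max_delays)


# Illustrative maximum delay times in nanoseconds (assumed values).
unbalanced = [5, 3]   # logic element 1 is slower than logic element 2
balanced = [4, 4]     # same total delay, equalized between the elements

t_unpipelined = clock_period_unpipelined(unbalanced)
t_unbalanced = clock_period_pipelined(unbalanced)
t_balanced = clock_period_pipelined(balanced)
```

With these assumed values, pipelining shortens the clock cycle from 8 ns to 5 ns, but the cycle could be shortened further to 4 ns only if the maximum delay times of the two logic elements were equal.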