Some prior art designs for data processing circuits in LSI and VLSI have simply continued the relatively random constructions used in smaller scale integration and hybrid circuits. Such approaches tend to produce circuits consisting of blocks of random logic, all of which have to be individually designed. On going to VLSI these methods become impractical for a number of reasons. First, the characteristics of possibly hundreds of thousands of transistors may have to be tailored individually. Secondly, totally random layouts may take many man years of effort to construct and subsequently check. Thirdly, the circuits are difficult and sometimes impossible to test and may require extra testing circuits to be incorporated into the design. Furthermore, circuit elements which cannot be positioned close together on a chip must be joined by long conducting interconnections, which increase stray capacitance. This reduces the potential switching speed of a circuit and concomitantly increases the need for large drivers.
For LSI and VLSI an additional problem with timing delays is introduced by transmission line effects. These arise when the total resistance of interconnections is relatively high, making the transmission line component of any time delay dominant. In prior art LSI such transmission line effects are observed in polysilicon wires. The reason can be seen by considering that, as the cross sections of integrated circuit components are scaled down for VLSI, the resistance per unit length of conductor paths increases quadratically with scaling factor. As a result, transmission line effects may be observed even in high conductivity connections such as aluminium wires on VLSI chips. A major drawback of such effects is that they cause data propagation delays proportional to the square of the wire length. The maximum speed of operation of a circuit is therefore limited.
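The quadratic dependence of delay on wire length can be illustrated with a minimal sketch that models a resistive wire as a ladder of small series resistors and shunt capacitors and sums the first-order (Elmore) delay contributions. The function name, the segment count and the per-unit-length values used in the usage note are assumptions chosen purely for illustration; they do not come from the text above.

```python
def ladder_elmore_delay(r_per_len, c_per_len, length, segments=1000):
    """Approximate delay of a resistive wire modelled as an RC ladder.

    r_per_len and c_per_len are resistance and capacitance per unit length
    (hypothetical values supplied by the caller). The wire is cut into
    `segments` equal pieces; capacitor i charges through i series resistors,
    so its contribution to the Elmore delay is i * r_seg * c_seg.
    """
    r_seg = r_per_len * length / segments
    c_seg = c_per_len * length / segments
    return sum(i * r_seg * c_seg for i in range(1, segments + 1))


def scaled_resistance_per_len(r_per_len, scale):
    """Resistance per unit length after both cross-section dimensions
    are shrunk by `scale` (area falls by scale**2, so resistance rises)."""
    return r_per_len * scale ** 2
```

Under this model, calling `ladder_elmore_delay(r, c, 2 * L)` gives four times the value of `ladder_elmore_delay(r, c, L)`, and for a large segment count the result approaches `0.5 * r * c * L**2`.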
Overall, synchronization of signals across a circuit therefore demands in-depth analysis and leads to complex designs to compensate for the delays involved.
At a higher level, rationalization of data processor architecture has been attempted by use of modular designs employing regular arrays of sub-processors, each of which performs a specified function at data word level. The individual sub-processors then work in concert with each other to produce an overall data processing function. However, the design advantages of the systematic modular approach at sub-processor level have not been carried through into the sub-processors themselves.
Prior art techniques have included "ripple-through" multipliers, i.e. multipliers in which a multiplication step involves the completion of the multiplication of a digit of the multiplier by the multiplicand, and in which a following step does not begin until the digit of the multiplier involved in the previous step has rippled through and all the processing circuits have interacted with each other to complete that step. These have employed arrays of multiplying cells for multiplying the bits of the multiplier and multiplicand. Ripple-through designs, however, incur time penalties due to the necessity to ripple through each digit of the multiplier and its associated carry-over digits at each stage of the multiplication. Attempts to alleviate these time penalties have led to the introduction of "pipelining" into the cell arrays, i.e. the splitting up of the arrays into sections by introducing latches between some of the cells. Each stage of the multiplication is thereby split into sub-stages, and it is necessary only for each sub-stage to be completed before the next is allowed to begin. Multiplier digits may, therefore, be fed in and product digits output at a rate faster than is allowed by the ripple-through design.
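The throughput argument can be sketched with a toy timing model. It assumes that every cell has the same delay and that each latch adds a fixed overhead; both assumptions, together with the function and parameter names, are illustrative only and are not taken from any particular prior art design.

```python
def min_clock_period(n_cells, cell_delay, cells_per_stage=None, latch_delay=0.0):
    """Shortest interval between successive inputs in a chain of cells.

    Without latches (cells_per_stage is None) a new input must wait until
    the previous one has rippled through all n_cells cells. With a latch
    after every `cells_per_stage` cells, only one sub-stage has to settle
    before the next input is accepted, at the cost of the latch overhead.
    """
    if cells_per_stage is None:
        return n_cells * cell_delay
    return min(cells_per_stage, n_cells) * cell_delay + latch_delay


# Hypothetical figures: 16 cells of unit delay, latches every 4 cells.
ripple_period = min_clock_period(16, 1.0)
pipelined_period = min_clock_period(16, 1.0, cells_per_stage=4, latch_delay=0.5)
```

In this model the pipelined chain accepts a new input after one sub-stage delay plus the latch overhead, rather than after the full ripple-through time. The latency of any single result is not reduced; only the rate at which inputs can be fed in improves.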
Prior art pipelined arrays have included Guild arrays, shift-and-add arrays and carry-save arrays. However, in all these prior art arrays, for each bit of the multiplier, it is necessary at some stage in the multiplication to broadcast the bit to all bits of the multiplicand in a single time interval. For a multiplication involving a multiplicand with more than a small number of bits, such broadcasting leads to delays which severely limit the maximum clock frequency attainable, as the total capacitance loading on the appropriate line (gate capacitance plus interconnect capacitance) may be sizeable. Large current (and area) drivers can be used to improve propagation speeds on such a line, but they themselves impose a delay--the driver amplifier delay--on the relevant signal. Moreover, the physical size of the driver required often forces a compromise on circuit speed.
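The broadcasting requirement can be made concrete with a minimal software sketch of an unsigned shift-and-add multiplication built from one-bit AND operations. It is a behavioural illustration only, not a model of any specific prior art array, and the function name and the fan-out counter are assumptions introduced here.

```python
def shift_and_add(multiplicand, multiplier, width):
    """Unsigned shift-and-add multiply using bit-level partial products.

    Returns the product together with the largest number of multiplicand
    bits that had to receive a single multiplier bit in one step, i.e.
    the fan-out of the broadcast line.
    """
    product = 0
    max_fanout = 0
    for j in range(width):  # one step per multiplier bit
        b_j = (multiplier >> j) & 1
        # b_j is broadcast: every multiplicand bit needs it in this step.
        partial_bits = [((multiplicand >> i) & 1) & b_j for i in range(width)]
        max_fanout = max(max_fanout, len(partial_bits))
        partial = sum(bit << i for i, bit in enumerate(partial_bits))
        product += partial << j
    return product, max_fanout
```

The fan-out returned equals the multiplicand width, which is why the load on the broadcast line, and hence the size of the driver needed, grows with the number of multiplicand bits.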
An object of the invention is to provide a systematic approach at bit level to integrated circuit design which allows fast circuit operation.