FIELD OF THE INVENTION
The present invention relates to a process for performing numerical computations in an electronic circuit. The invention relates likewise to an arithmetic unit for implementing this process.
A main aim of progress in integrated circuit technology is to increase processing speeds. A sector for which this progress is of particular relevance is numerical data processing, especially by microcontrollers or microprocessors and signal processors (DSP, for "Digital Signal Processor"). These types of circuit always include an Arithmetic and Logic Unit (ALU), which often constitutes the critical path for the data and thus determines the maximum frequency at which the circuit can work.
Many current circuits include an ALU which performs the logic or arithmetic computations between two 32-bit operands, and generates a result on 32 bits. However, some computations can be performed on 16-bit operands (computations not needing maximum accuracy, management of logic operands on 16 bits, etc.). The ALU then works on only 16 bits, and half of it therefore remains inactive; only half the computational power of the ALU is then used.
One of the most constraining operations in terms of computation time is arithmetic addition/subtraction, and more particularly the carry propagation in such an operation.
The most conventional way of effecting a binary addition on n bits between two operands A ($A_{n-1} A_{n-2} \ldots A_1 A_0$) and B ($B_{n-1} B_{n-2} \ldots B_1 B_0$) consists in propagating the carry linearly, from one bit to the next bit, using elementary adders. The elementary adders perform, for each bit, the following two operations:
$$S_i = A_i \oplus B_i \oplus C_{i-1}$$
$$C_i = (A_i \cdot B_i) + ((A_i \oplus B_i) \cdot C_{i-1})$$
with:
$A_i$, $B_i$: i-th bit of operands A and B;
$S_i$: i-th bit of the result of the addition (S = A + B);
$C_i$: carry generated by the elementary addition in position i;
"$\oplus$": logical exclusive OR;
"$\cdot$": logical AND; and
"+": logical OR.

either to regard the selection cell as transparent and thus to effect operations on 32 bits,
or to regard a selection cell as a delimiter which separates the adder into two independent sub-adders which effect additions over 16 bits, which sets the incoming carry in the sub-adder over the 16 most significant bits, and which outputs the outgoing carry from the sub-adder over the 16 least significant bits.

the regularity of the layout of the cells which constitute the carry generator in the optimized BRENT and KUNG implementation is preserved (no appending of multiplexing cells, for example, which would break this regularity);
the increase in surface area remains minimal (1/32 or 3.12%);
the same optimization algorithm is used to split the 33-bit carry generator into blocks of unequal dimensions;
the loss in propagation time for a computation over 33 bits (leading to a result over 32 bits) is negligible: of the order of 1.9% (it would be 1/32, namely 3.12%, when going from a 32-bit to a 33-bit linear carry propagation adder, without counting the time for crossing the additional multiplexer).
By this method, the time required to effect one binary addition on N bits can be estimated by $T(N) = N \times T_1$, where $T_1$ represents the time required by each elementary adder to compute the carry $C_i$ as a function of $C_{i-1}$, $A_i$, $B_i$. It is seen that this time is determined entirely by the serial mode of propagation of the carry.
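The linear carry propagation described above can be sketched in a few lines of Python. This is a bit-level behavioural model only, not a circuit description; the function name `ripple_carry_add` and its parameters are assumptions made for illustration. The loop makes the serial dependency visible: each iteration needs the carry produced by the previous one, which is why the delay grows as N times the delay of one elementary adder.

```python
def ripple_carry_add(a, b, n, carry_in=0):
    """Add two n-bit integers bit by bit with linear carry propagation.

    Returns (sum, carry_out), where sum is truncated to n bits.
    The name and signature are illustrative, not taken from the source.
    """
    carry = carry_in
    total = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        # Si = Ai xor Bi xor Ci-1
        s_i = a_i ^ b_i ^ carry
        # Ci = (Ai . Bi) + ((Ai xor Bi) . Ci-1)
        carry = (a_i & b_i) | ((a_i ^ b_i) & carry)
        total |= s_i << i
    return total, carry
```

For example, `ripple_carry_add(0xFFFF, 1, 16)` propagates a carry through all 16 positions and returns a zero sum with an outgoing carry of 1.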
However, the "carry propagation" function can be effected more quickly if serial propagation is replaced with parallel propagation. There are various ways of effecting this parallelization; nowadays, the main methods used in circuits made in CMOS technology are:
carry selection, and
discretization of the propagation by use of concatenation operators.
The carry selection ("Carry Select") method consists in subdividing the operation over N bits by using, for example, three adder sub-blocks each working on N/2 bits (N is assumed here to be even). The first adder sub-block performs the addition in linear carry propagation mode over the N/2 least significant bits of the operands while the other two adder sub-blocks perform, in parallel, the addition over the N/2 most significant bits of the operands, the second sub-block assuming that the outgoing carry from the first sub-block equals 0 while the third sub-block assumes that this carry equals 1. Then, the carry actually leaving the first adder sub-block controls a multiplexer which selects, from the two results produced by the second and third adder sub-blocks, the one which corresponds to the actual carry. Since some of the computations are effected in parallel, the computation time for an N-bit addition by the carry selection method can be estimated by:
$$T'(N) = \frac{N}{2} \times T'_1 + T'_0$$
where $T'_1$ is equal to the $T_1$ of the elementary adder with linear propagation and $T'_0$ is the time required for the selection between the two possible results according to the intermediate carry. If $T'_0$ is neglected, $T'(N)$ (Carry Select) is of the order of half of $T(N)$ (linear propagation). Note that the "Carry Select" principle can be generalized in order to optimize propagation time: it is possible to split an addition over N bits into several additions over differing numbers of bits. For example, an addition over 32 bits can be effected with five sub-blocks of ascending size: 3, 5, 6, 8, 10.
Typically, with such a "Carry Select" adder, an addition over 32 bits is performed 1.4 times faster than with the equivalent linear propagation adder.
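The carry selection principle can be illustrated with the following behavioural sketch. It is a model under stated assumptions, not the circuit itself: the helper names `ripple` and `carry_select_add` are hypothetical, each sub-block is modelled with ordinary integer addition rather than gate by gate, and the smallest sub-block is assumed to sit at the least significant end. In hardware the two candidate results of each upper sub-block are computed in parallel; here they are simply computed one after the other.

```python
def ripple(a, b, n, cin):
    """Behavioural model of an n-bit linear propagation sub-block.

    Returns (sum, carry_out). Hypothetical helper for illustration.
    """
    total = a + b + cin
    return total & ((1 << n) - 1), total >> n


def carry_select_add(a, b, block_sizes=(3, 5, 6, 8, 10)):
    """Carry-select addition over sum(block_sizes) bits.

    block_sizes lists sub-block widths from least to most significant
    (an assumption; the default is the 32-bit split quoted in the text).
    """
    result, carry, pos = 0, 0, 0
    for k, size in enumerate(block_sizes):
        mask = (1 << size) - 1
        a_blk = (a >> pos) & mask
        b_blk = (b >> pos) & mask
        if k == 0:
            # First sub-block: plain linear propagation, carry-in 0.
            s, carry = ripple(a_blk, b_blk, size, 0)
        else:
            # Both hypotheses on the incoming carry are evaluated...
            s0, c0 = ripple(a_blk, b_blk, size, 0)
            s1, c1 = ripple(a_blk, b_blk, size, 1)
            # ...and the actual carry drives the selection multiplexer.
            s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << pos
        pos += size
    return result, carry
```

With the default split, `carry_select_add(0xFFFFFFFF, 1)` returns a zero sum and an outgoing carry of 1, as a 32-bit linear propagation adder would.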
An even more powerful method of effecting an addition over N bits consists in using the basic principles described by BRENT and KUNG ("A Regular Layout for Parallel Adders", IEEE Transactions on Computers, Vol. C-31, No. 3, March 1982): binary addition over N bits can be transformed into a parallel computation using the concatenation operator "o" defined by:
$$(g_l, p_l) \circ (g_r, p_r) = (g_l + p_l \cdot g_r,\ p_l \cdot p_r)$$
Consider, for each bit in position i, the terms:
"generate": $g_i = A_i \cdot B_i$; and
"propagate": $p_i = A_i \oplus B_i$.
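The outgoing carry for each bit is then:
$$C_i = G_i \quad \text{for } i = 0, \ldots, N-1$$
with
$$(G_i, P_i) = \begin{cases} (g_0, p_0) & \text{if } i = 0 \\ (g_i, p_i) \circ (G_{i-1}, P_{i-1}) & \text{if } 1 \le i \le N-1 \end{cases}$$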
The incoming carry Cin being determined, each sum bit is obtained:
$$S_0 = p_0 \oplus C_{in}$$
and
$$S_i = p_i \oplus C_{i-1} \quad \text{for } 0 < i < N$$
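The relations above can be checked with a short sketch. It assumes the usual Brent and Kung recurrence, in which the pair for position i is the pair (gi, pi) combined by "o" with the pair for position i-1, and it assumes an incoming carry of 0 so that each carry is exactly the "generate" component. The names `o` and `prefix_add` are illustrative. The pairs are accumulated serially here for clarity; the benefit of the method comes from evaluating them in parallel, which this sketch does not attempt.

```python
def o(left, right):
    """Concatenation operator: (gl, pl) o (gr, pr) = (gl + pl.gr, pl.pr)."""
    gl, pl = left
    gr, pr = right
    return (gl | (pl & gr), pl & pr)


def prefix_add(a, b, n):
    """n-bit addition (carry-in assumed 0) using Ci = Gi.

    Returns (sum, carry_out). Illustrative sketch, evaluated serially.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]  # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]  # propagate

    acc = (g[0], p[0])
    carries = [acc[0]]                 # C0 = G0
    for i in range(1, n):
        acc = o((g[i], p[i]), acc)     # assumed recurrence on (Gi, Pi)
        carries.append(acc[0])         # Ci = Gi

    total = p[0]                       # S0 = p0 xor Cin, with Cin = 0
    for i in range(1, n):
        total |= (p[i] ^ carries[i - 1]) << i  # Si = pi xor Ci-1
    return total, carries[n - 1]
```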
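The operator "o" being associative (but not commutative), it is possible to write, for any m such that $0 < m \le i$:
$$(G_i, P_i) = (G_{i,m}, P_{i,m}) \circ (G_{m-1,0}, P_{m-1,0})$$
where
$$(G_{i,m}, P_{i,m}) = (g_i, p_i) \circ (g_{i-1}, p_{i-1}) \circ \cdots \circ (g_m, p_m)$$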
The computation of $(G_i, P_i)$ is therefore performed on the basis of two terms $(G_{i,m}, P_{i,m})$ and $(G_{m-1,0}, P_{m-1,0})$ which are functionally of the same type: they are a function of i-m+1 (respectively m) consecutive input bits and require i-m (respectively m-1) applications of the operator "o". Accordingly, they can be computed by the same type of circuit.
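The freedom to choose the split point m follows from associativity alone, and can be checked numerically. The sketch below is illustrative only: `span` and `split_is_consistent` are hypothetical names, and `span` folds the pairs from the most significant position down to the least significant one. It verifies that every split point gives the same pair for every position i, which is the property that allows sub-blocks of unequal size.

```python
from functools import reduce


def o(left, right):
    """Concatenation operator: (gl, pl) o (gr, pr) = (gl + pl.gr, pl.pr)."""
    gl, pl = left
    gr, pr = right
    return (gl | (pl & gr), pl & pr)


def span(g, p, hi, lo):
    """Pair for bits hi..lo: (g_hi, p_hi) o ... o (g_lo, p_lo)."""
    terms = [(g[k], p[k]) for k in range(hi, lo - 1, -1)]
    return reduce(o, terms)


def split_is_consistent(a, b, n):
    """Check that the pair for bits i..0 is the same for every split m."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    return all(
        span(g, p, i, 0) == o(span(g, p, i, m), span(g, p, m - 1, 0))
        for i in range(1, n)
        for m in range(1, i + 1)
    )
```

The order of the operands still matters: `o((0, 0), (1, 1))` gives `(0, 0)` whereas `o((1, 1), (0, 0))` gives `(1, 0)`, so the operator is not commutative.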
It is therefore possible to improve the conventional BRENT and KUNG decomposition in which each carry generation block is divided recursively into two sub-blocks of equal dimension m=N/2. The improvement is obtained via a recursive division into sub-blocks of unequal dimension and via the use of amplifying circuits, which minimizes the carry propagation time. The basis of this optimization principle was published in 1986 by B. W. Y. WEI, C. THOMPSON, Y.-F. CHEN in the article "Time-Optimal Design of a CMOS Adder" (IEEE publication of 1986 No. CH23317/86/0000/0186$01.00 following the conference ACSC-85: Asilomar Circuit System and Computer, page 186). The implementation of this algorithm leads to addition times which do not depend linearly on the number of bits (for example, an addition over 32 bits is only 1.15 times slower than an addition over 16 bits).
Typically, with such an adder, an addition over 32 bits is performed 2.3 times faster than with the equivalent linear carry propagation adder.
When a single linear carry propagation operator of an arithmetic and logic unit is required to perform selectively either one operation over 32 bits or two parallel operations over 16 bits, it can be made in the form of a 32-bit operator by mounting a selection multiplexer between the elementary adders of respective ranks 15 and 16.
Control of the selection multiplexer then makes it possible to select the incoming carry of the half-operator corresponding to the 16 most significant bits. This incoming carry is either the outgoing carry from the half-operator corresponding to the 16 least significant bits (supplied to one input of the selection multiplexer), in which case a "normal" addition over 32 bits is performed; or an external carry bit supplied to another input of the selection multiplexer, in which case two independent parallel additions over 16 bits are performed.
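The multiplexer arrangement just described can be modelled as follows. This is a behavioural sketch under stated assumptions: the names `ripple_bits` and `dual_mode_add`, the `split_mode` flag and the two carry inputs `cin_low` and `cin_high` are all introduced here for illustration and do not come from the source.

```python
def ripple_bits(a, b, n, carry):
    """n-bit linear carry propagation; returns (sum, carry_out)."""
    total = 0
    for i in range(n):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        total |= (a_i ^ b_i ^ carry) << i
        carry = (a_i & b_i) | ((a_i ^ b_i) & carry)
    return total, carry


def dual_mode_add(a, b, split_mode, cin_low=0, cin_high=0):
    """32-bit adder with a selection multiplexer between ranks 15 and 16.

    split_mode=False: one addition over 32 bits.
    split_mode=True:  two independent additions over 16 bits.
    Returns (result, carry_out_low, carry_out_high).
    """
    low, c_low = ripple_bits(a & 0xFFFF, b & 0xFFFF, 16, cin_low)
    # Selection multiplexer: external carry bit or carry from the low half.
    c_mux = cin_high if split_mode else c_low
    high, c_high = ripple_bits((a >> 16) & 0xFFFF, (b >> 16) & 0xFFFF, 16, c_mux)
    return (high << 16) | low, c_low, c_high
```

Adding `0x0001FFFF` and `0x00000001` gives `0x00020000` in 32-bit mode, because the carry out of rank 15 reaches the upper half, and `0x00010000` in split mode, where that carry is only reported as the outgoing carry of the lower half.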
This simple solution can also be applied to a carry selection adder, with the following restriction: in order to easily insert a mode selection multiplexer (32 bits or 2 × 16 bits), the bit of rank 15 must be situated at the output of one of the sub-blocks of this type of operator. It is then easy to adapt the control of this multiplexer so that the incoming carry of the half-operator corresponding to the 16 most significant bits is:
either the outgoing carry from the other half-operator: an addition over 32 bits is then effected;
or an external carry bit: two independent parallel additions over 16 bits are then effected.
This solution however imposes a constraint in regard to the optimal choice of the size of the sub-blocks.
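The constraint on sub-block sizes can be made concrete with a small check. The sketch below is illustrative, with hypothetical helper names, and assumes sub-block sizes are listed from the least significant end. A mode selection multiplexer can be inserted after rank 15 only if some sub-block ends exactly there, that is, if the running total of the sizes reaches 16.

```python
from itertools import accumulate


def block_boundaries(block_sizes):
    """Cumulative bit counts at which each sub-block ends."""
    return list(accumulate(block_sizes))


def allows_split_at(block_sizes, rank=16):
    """True if an internal sub-block boundary falls after `rank` bits."""
    return rank in block_boundaries(block_sizes)[:-1]
```

For the five sub-blocks of sizes 3, 5, 6, 8, 10 mentioned earlier, the boundaries fall after 3, 8, 14, 22 and 32 bits, so `allows_split_at((3, 5, 6, 8, 10))` is `False`: that particular split would have to be changed to accommodate the multiplexer, whereas an equal split `(16, 16)` satisfies the constraint.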
However, in an optimized BRENT and KUNG type ALU, such a way of selecting the working modes of the operator cannot be applied without losing the benefit of implementing this algorithm: this would amount to mounting two BRENT and KUNG type adders in series which, when working in 32-bit mode, would have a carry propagation time markedly greater than that of a single 32-bit BRENT and KUNG type adder (typically 1.5 times greater).
There is therefore a need for a process making it possible, by applying an optimal carry propagation algorithm, in particular the optimized BRENT and KUNG algorithm, to effect with the same operator either an operation over n+m bits or two parallel operations over n bits and over m bits respectively, with n=m=16 in a typical example.
The purpose of the present invention is to meet this need.