This application claims priority to S.N. 98402457.0, filed in Europe on Oct. 6, 1998 and S.N. 98402455.4, filed in Europe on Oct. 6, 1998.
The present invention relates to zero anticipation in the field of computing systems. In particular, the invention relates to a zero anticipation mechanism, to a processing engine and a computing system including such a mechanism and to a method of zero anticipation in such apparatus.
Where reference is made to a computing system, it should be understood that this term is intended to relate generally to systems and apparatus which perform computations, including computers and electronic apparatus including processing engines for performing computations, and to the processing engines themselves. Many different types of processing engines are known, including the central processing units of mainframe systems, microprocessors, micro-controllers, digital signal processors and so on.
The performance of a computing system is vitally affected by the speed and accuracy with which arithmetic operations are performed in the processing engine. This is because many of the instructions executed by a processing engine of such a computing system require arithmetic operations. Arithmetic circuitry is often the most complex circuitry in the instruction execution unit of a processing engine in terms of the number of gates and logic levels. In relative terms, therefore, arithmetic operations tend to be slow and prone to error. One important aspect of the result of arithmetic operations is the determination of condition codes.
A condition code will be set by the processing engine to reflect the outcome of an arithmetic operation. This code assists the processing engine in making operational decisions which depend upon arithmetic results. A typical processing engine, such as a microprocessor or a digital signal processor for example, has an arithmetic logic unit (ALU) which performs mathematical operations on two or more "N"-bit operands, where "N" represents the total number of bits per operand. It will be convenient in the following to refer to the "i"th bit, where "i" is an index variable whose value is between 0 and N−1 inclusive.
One type of computation result on which a decision, such as, for example, a branch decision, might be made is where the operation result is a zero condition. For example, a branch might be made if the result of a computation is zero, whereas program execution might otherwise continue at the next command. Alternatively, the converse may be true.
Typically, the decision as to whether to take a branch or not will rely on the resolution of the computation result (e.g., whether the result is zero or not). As, however, this may take some time and, also, the branch operation itself will take some time, this can have a not insignificant effect on overall system performance.
Condition codes are also important in some non-arithmetic operations, for example a conditional data operation where the destination of a result will depend upon the resolution of the condition. One example could be a data load instruction involving the generation of data complements. Once again, the time taken to resolve the condition and then to effect the operation dependent thereon can have a not insignificant effect on performance.
The condition code may, for example, be employed to indicate that the result of an operation is greater than zero (GT), less than zero (LT), or equal to zero (EQ). LT is the easiest outcome to detect, because it simply involves examining the sign bit of the result. In general, GT and EQ are more difficult outcomes to detect, because the sign bit of the result is set positive when the result is either zero or a positive quantity. Therefore, when the sign bit is positive, examining the sign bit alone does not indicate whether the result is zero or a positive number. However, for the result of a specific instruction on specific data, EQ and GT are mutually exclusive. Thus, determining one is sufficient for the determination of the other, once LT has been excluded.
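By way of illustration only, the relationship between the sign bit and the three condition codes can be sketched in software as follows. The sketch assumes a two's-complement result of a given width; the function name `condition_code` and the default width of 16 bits are hypothetical and are not taken from the description above.

```python
def condition_code(result: int, n_bits: int = 16) -> str:
    """Classify an n_bits-wide two's-complement result as LT, EQ or GT.

    Assumption: the result is held in two's-complement form, so that
    bit n_bits - 1 is the sign bit.
    """
    mask = (1 << n_bits) - 1
    result &= mask                          # keep only the N result bits
    sign = (result >> (n_bits - 1)) & 1     # LT needs the sign bit alone
    if sign:
        return "LT"
    # Sign bit clear: the result is zero or positive, so every remaining
    # bit must be examined to separate EQ from GT.
    return "EQ" if result == 0 else "GT"


if __name__ == "__main__":
    for value in (-3, 0, 7):
        print(value, condition_code(value))
```

Only the LT branch is decided from a single bit; the EQ and GT branches both depend on a test over all N bits, which is the test that a zero anticipation mechanism seeks to accelerate.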
In adder operation, the traditional method of determining when the result is zero is to NOR all of the output bits of the adder circuit, using a binary tree to reduce the output bits to a single zero-indication output as quickly as possible. However, as many architectures require a 32-bit, or wider, data path for fixed point units, they also require adders of 32 bits in width. NORing all of the output bits may require two or more additional stages of logic, depending on the technology used for implementation. For example, to reduce 32 bits would take five stages with two-input NOR gates (2^5 = 32) and three stages with four-input NOR gates (4^3 = 64). As higher clock rates are demanded, the addition of logic stages to an adder circuit can result in the condition code becoming critical, thereby forcing completion of its computation into the next machine cycle.
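The stage counts quoted above can be reproduced with a short calculation. The following is a minimal sketch in which the function name `reduction_stages` is hypothetical; it counts only the levels of a reduction tree of a given fan-in and does not model gate polarity or technology-dependent delay.

```python
def reduction_stages(n_bits: int, fan_in: int) -> int:
    """Count the tree levels needed to reduce n_bits signals to one output."""
    stages = 0
    signals = n_bits
    while signals > 1:
        # Each level groups up to fan_in signals into one (ceiling division).
        signals = -(-signals // fan_in)
        stages += 1
    return stages


if __name__ == "__main__":
    print(reduction_stages(32, 2))  # 5 levels of two-input gates
    print(reduction_stages(32, 4))  # 3 levels of four-input gates
```

Each of these levels is added after the adder output is available, which is why the zero detection lengthens the critical path of the conventional arrangement.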
Several solutions have been proposed for determining when a result is zero. For example, U.S. Pat. No. 4,924,422, issued May 8, 1990 to IBM Corporation, describes a method and apparatus for determining when a result is zero. This patent determines when two operands are equivalent directly from the operands, without the use of an adder. In one embodiment, conditions for the sum being equal to zero are determined from half sum to carry and transmit operators derived from the input operands. These operators are used in some known types of adders and thus may be provided from a parallel adder to the condition prediction circuitry. In another embodiment, the equations for a carry-save adder are modified to provide a circuit specifically designed for the determination of the condition when the sum of the operands is equal to zero. This sum-is-equal-to-zero circuit reduces the gate delay and gate count, allowing the processor central processing unit to determine the condition prior to the actual sum of two operands. This allows the processing engine to react to the condition more quickly, thus increasing overall operating speed.
U.S. Pat. No. 4,815,019, issued Mar. 21, 1989 to Texas Instruments, Inc., describes a method and apparatus for determining when an ALU result is zero. This patent describes a fast ALU=0 circuit that is used with a carry-select look ahead ALU. Preliminary ALU=0 signals are derived for each section of the ALU prior to a carry in signal being received by that section. When the carry in signal is received, a final comparison is made with the least significant bit of the section and the final ALU=0 signal is generated. The ALU=0 computation is completed one gate delay after the ALU computation is completed. The circuit for computing whether the result of an ALU computation is zero determines whether certain bits are zero before the ALU computation is complete. When the final ALU computation is available, only a very small number of bits need be considered to determine whether the result is zero. This determination is made with the insertion of only one additional gate delay after the ALU computation is complete.
U.S. Pat. No. 5,508,950, issued Apr. 16, 1996 to Texas Instruments, Inc., describes a circuit and method for detecting when an ALU result is zero. This patent describes a circuit and method for detecting if a sum of a first multi-bit number A of N bits and a second multi-bit number B of N bits equals a third multi-bit number C of N bits prior to availability of the sum of A and B. A propagate signal, a generate signal and a kill signal are generated for each bit in the proposed sum. A zero signal is formed from these signals. The particular manner of forming the zero signal for each bit depends upon the state of the third multi-bit number C for the corresponding bit and the prior bit. The zero signal is an exclusive OR of the corresponding propagate signal P_n and a kill signal K_(n−1) of a prior bit if the current bit and the prior bit of C are "00". The zero signal is an exclusive NOR of the corresponding propagate signal P_n and a generate signal G_(n−1) of a prior bit if the current bit and the prior bit of C are "01". The zero signal is an exclusive NOR of the corresponding propagate signal P_n and a kill signal K_(n−1) of a prior bit if the current bit and the prior bit of C are "10". The zero signal is an exclusive OR of the corresponding propagate signal P_n and a generate signal G_(n−1) of a prior bit if the current bit and the prior bit of C are "11". The sum of A and B equals C if all the zero signals are active "1". The propagate signal, generate signal and kill signal of the various bits can be used to form the sum. This technique provides the equality signal before the carry can ripple through the adder logic.
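The four-case rule summarized above can be modelled in software as in the following sketch. It is an illustration written for this description and not the circuit of the cited patent. Several points are assumptions: the propagate signal is taken as the exclusive OR of the operand bits, the sum is compared modulo 2^N, and the carry into bit 0 is taken to be zero, modelled by treating the non-existent prior bit as a kill (K = 1, G = 0, C = 0). The function name `sum_equals` is hypothetical.

```python
def sum_equals(a: int, b: int, c: int, n_bits: int) -> bool:
    """Return True if (a + b) mod 2**n_bits == c, without forming the sum."""
    zero_signals = []
    # Assumed boundary condition below bit 0: no carry in (kill, C bit 0).
    g_prev, k_prev, c_prev = 0, 1, 0
    for n in range(n_bits):
        a_n, b_n, c_n = (a >> n) & 1, (b >> n) & 1, (c >> n) & 1
        p_n = a_n ^ b_n               # propagate (XOR form, an assumption)
        g_n = a_n & b_n               # generate
        k_n = (a_n | b_n) ^ 1         # kill: both operand bits are zero
        if (c_n, c_prev) == (0, 0):
            z_n = p_n ^ k_prev        # exclusive OR with prior kill
        elif (c_n, c_prev) == (0, 1):
            z_n = (p_n ^ g_prev) ^ 1  # exclusive NOR with prior generate
        elif (c_n, c_prev) == (1, 0):
            z_n = (p_n ^ k_prev) ^ 1  # exclusive NOR with prior kill
        else:
            z_n = p_n ^ g_prev        # exclusive OR with prior generate
        zero_signals.append(z_n)
        g_prev, k_prev, c_prev = g_n, k_n, c_n
    # Equality holds only if every per-bit zero signal is active.
    return all(zero_signals)


if __name__ == "__main__":
    print(sum_equals(0b0101, 0b0011, 0b1000, 4))
```

Each zero signal depends only on bits n and n−1 of A, B and C, so in hardware all N signals can be formed in parallel; the loop above is sequential only for readability.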
Accordingly, an aim of the present invention is to provide an improved mechanism and method for determining a zero condition, whereby the operational speed of a processing engine of a computing system may be increased.
Particular and preferred aspects of the invention are set out in the accompanying independent and dependent claims. Combinations of features from the dependent claims may be combined with features of the independent claims as appropriate and not merely as explicitly set out in the claims.
In accordance with a first aspect of the invention, there is provided a zero anticipation mechanism for an arithmetic unit of a processing engine. The zero anticipation mechanism comprises an array of cells interconnected to produce an ordered sequence of intermediate anticipation signals. The array of cells includes cells connectable to receive intermediate result signals from the arithmetic unit, cells for forwarding an intermediate anticipation signal supplied thereto and cells for generating a combination of a first intermediate anticipation signal and a second intermediate anticipation signal supplied thereto.
An embodiment of the invention can predict when the result of an arithmetic unit is zero before the result is available. The array of cells derives an algebraic or logical combination of intermediate result signals from an arithmetic unit to predict, or anticipate, a zero result no later than the time at which the result is available.
In an embodiment of the invention, the array of cells can be arranged in an array with the cells selectively interconnected in first and second directions. The cells can have one or more of: a first input for connection to the output of an adjacent cell in the first direction; a first output for connection to an input of an adjacent cell in the first direction; a second input for connection to the output of an adjacent cell in the second direction; and a second output for connection to an input of an adjacent cell in the second direction.
Different cells in the array have different combinations of inputs and outputs according to their position in the array. A cell with a first input and a first output is operable to transmit a signal received at the first input to the first output without adding delay. A cell for forwarding an intermediate anticipation signal and having a second input and a second output is operable to buffer a signal received at the second input for a predetermined time prior to forwarding the signal from the second output. A cell for combining intermediate anticipation signals supplied thereto and having a first input, a second input and at least one of a first output and a second output is operable to combine an intermediate anticipation signal received at the first input with an intermediate anticipation signal received at the second input and to output the combined intermediate anticipation signal from at least one output. The combination can be a logical combination of the input signals. The logical combination can be defined by an operator u, such that:
(g_1, p_1, z_1) u (g_r, p_r, z_r) = (g_1 + (p_1·g_r), p_1·p_r, (z_1·z_r·(~g_r)) + p_1·z_r·g_r),
where g_1 and g_r are first and second generate terms, p_1 and p_r are first and second propagate terms, and z_1 and z_r are first and second zero anticipation terms.
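The operator u can be exercised in software as in the following sketch, which is offered as an illustration under stated assumptions and not as the cell implementation. The operator itself follows the expression above, reading · as logical AND, + as logical OR and ~ as logical complement. The bit-level starting terms are not given in this section and are assumed here to be g = A(i)·B(i), p = A(i) XOR B(i) and z = ~p. It is further assumed that the first operand of u covers the higher order bit positions, that the carry into bit 0 is zero, and that the result is taken modulo 2^N. The names `combine`, `bit_term` and `anticipate_zero` are hypothetical.

```python
from typing import Tuple

# (generate, propagate, zero anticipation) for a group of adjacent bits
Term = Tuple[bool, bool, bool]


def combine(first: Term, second: Term) -> Term:
    """Apply the operator u; 'first' is assumed to be the higher order group."""
    g_1, p_1, z_1 = first
    g_r, p_r, z_r = second
    return (
        g_1 or (p_1 and g_r),
        p_1 and p_r,
        (z_1 and z_r and not g_r) or (p_1 and z_r and g_r),
    )


def bit_term(a_bit: int, b_bit: int) -> Term:
    """Assumed bit-level terms; these seeds are not taken from the text."""
    g = bool(a_bit & b_bit)
    p = bool(a_bit ^ b_bit)
    z = not p  # sum bit is zero when there is no carry into this bit
    return (g, p, z)


def anticipate_zero(a: int, b: int, n_bits: int) -> bool:
    """Anticipate whether (a + b) mod 2**n_bits is zero, without adding."""
    terms = [bit_term((a >> i) & 1, (b >> i) & 1) for i in range(n_bits)]
    acc = terms[0]
    for term in terms[1:]:
        # The accumulated lower order group is the second operand of u.
        acc = combine(term, acc)
    return acc[2]


if __name__ == "__main__":
    print(anticipate_zero(0b0101, 0b1011, 4))
```

The sketch folds the terms serially from the lowest bit position upward for clarity. Under the assumptions stated, the same terms could equally be combined pairwise in a tree, which is the arrangement that allows the zero anticipation term to be formed alongside the carry terms.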
In an embodiment of the invention, a zero anticipation mechanism for an arithmetic unit providing an arithmetic result having N bits ordered from a lowest bit position to a highest bit position includes a plurality of sub-arrays. Each sub-array is associated with a respective group of adjacent bit positions of the arithmetic result and generates an intermediate anticipation result signal. The intermediate anticipation result signal of a sub-array is forwarded directly to all sub-arrays associated with higher order bit positions.
A global, or final, output of a sub-array associated with a highest order bit position forms a zero anticipation signal.
An intermediate zero anticipation signal may also be provided from a sub-array associated with an intermediate bit position. This can provide an intermediate zero anticipation result in respect of bits up to and including the intermediate bit position.
An embodiment of the invention can be implemented for an arithmetic unit operable to perform an arithmetic operation on operands A and B, where each operand comprises an ordered sequence of N bits, A(i) and B(i), respectively, for i=0 to N−1, which includes a carry look-ahead adder for generating the result of the arithmetic operation. In such an implementation, the array of cells can be formed by cells of the carry look-ahead adder.
The arithmetic unit can include a carry-save adder responsive to operands A and B for producing an ordered sequence of intermediate result signals by carry-free combination of operand A with operand B. The zero anticipation mechanism can be connectable to the carry-save adder circuit for generating a zero anticipation signal based on an algebraic combination of carry-in signals from the carry-save adder with the ordered sequence of intermediate result signals for anticipating a zero magnitude result, with the zero anticipation output being generated not later than the result.
In accordance with another aspect of the invention, there is provided a processing engine including an arithmetic unit for providing an arithmetic result and a zero anticipation mechanism as set out above.
Examples of possible types of arithmetic unit are a multiply and accumulate unit, a floating point unit and, more generally, any arithmetic and logic unit.
The processing engine could, for example, be a digital signal processor.
In accordance with a further aspect of the invention, there is provided an integrated circuit comprising a processing engine as set out above.
In accordance with yet another aspect of the invention, there is provided a method of anticipating a zero result of an arithmetic unit of a processing engine. The method comprises:
providing input data to the arithmetic unit;
generating intermediate result signals for the arithmetic unit; and
producing an ordered sequence of intermediate anticipation signals by receiving the intermediate result signals from the arithmetic unit to form intermediate anticipation signals, and selectively forwarding and combining intermediate anticipation signals in accordance with a predetermined algorithm for generating a zero anticipation signal.