An important application of leading zero detection is the normalization of the mantissa that results from the true subtraction of two floating-point numbers, particularly when the normalized mantissas of the operands have associated exponents that differ by no more than one. Clearly, the closer the magnitudes of the operands, the greater the number of leading zeros in the resulting difference of the normalized operand mantissas. The floating-point result is normalized by left shifting the mantissa. For example, if the un-normalized result of the subtraction is 0.0001XXX . . . X, the normalized result would be 0.1XXX . . . X000 or 1.XXX . . . X0000, depending on the floating-point convention that is used. In the former case, the mantissa is left-shifted by 3 bit positions, causing three zeros to be appended and the associated exponent to be decremented by 3; in the latter case, with the binary point to the right of the leading one (or hidden "1"), the normalization operation requires a shift of 4 bit positions and a decrement of the associated exponent by 4. Regardless of the convention used, a means for detecting the leading one (or the number of leading zeros) is required to control the post-subtraction normalization unit.
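The two conventions in the example above can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware described here; the function name `normalize`, the 8-bit default width, and the `hidden_one` flag are assumptions introduced for the example.

```python
def normalize(mantissa: int, exponent: int, width: int = 8, hidden_one: bool = False):
    """Left-shift an un-normalized mantissa and adjust its exponent.

    `mantissa` holds the `width` fraction bits to the right of the binary point.
    Returns (normalized fraction bits, adjusted exponent, shift amount).
    """
    if mantissa == 0:
        return 0, exponent, 0  # zero difference: nothing to normalize
    leading_zeros = width - mantissa.bit_length()
    # The 0.1XXX convention shifts by the leading zero count; the 1.XXX
    # convention shifts one position further so the leading one crosses
    # the binary point and becomes the hidden "1".
    shift = leading_zeros + 1 if hidden_one else leading_zeros
    mask = (1 << width) - 1
    return (mantissa << shift) & mask, exponent - shift, shift


# 0.00011011 -> 0.11011000 (shift 3) or 1.10110000 (shift 4)
print(normalize(0b00011011, 0))
print(normalize(0b00011011, 0, hidden_one=True))
```

In both cases the exponent is decremented by exactly the number of positions shifted, which is why the leading zero count must be known before the shifter can be controlled.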
FIG. 1 is a block diagram of a typical state-of-the-art left-shift normalization unit. The normalization unit consists of a leading zero encoder (LZE) unit 20 that controls left-shift normalizing unit 60. Input data is supplied from subtractor 10, which produces at its output an un-normalized mantissa corresponding to the absolute difference of the operands. (The signum information is carried as a separate sign bit.) This n-bit result is applied to both LZE 20 and shifter 60.
Left shift unit 60 typically comprises barrel shifter means 30 and 40 for rotating the input bit string and a zero mask means 50 for appending zeros. FIG. 2 is a truth table for a 16-bit input left-shift normalizing unit 60 in which "Z" represents a zero introduced by zero-mask unit 50. The 4-bit shift code and the 1-bit zeros signal are generated by LZE unit 20. The binary coded shift code is applied to barrel shifter means 30 and 40, while the zeros signal (active if the difference is zero) is applied to zero-mask unit 50, forcing zeros (Z) in all bit positions.
Two barrel shifter units (30 and 40) are used in normalizer 60 because of the circuit complexity that would result if the total shift were accomplished in a single stage. This may be best understood by reference to FIG. 3, a block diagram of a 3-bit left shifter (shifts of 0 to 3 bit positions) operating on an input string of 16 bits. The two-bit binary encoded shift input signal is applied to gate decoder 35, which decodes the 4-state input signal and activates one out of four output lines 47. Each of the four output lines is applied to 16 shift cell units 45 as an input to one of four corresponding two-input AND-gates 37. The other input to each of the AND-gates 37 is provided from the input resultant data bit string, [A.sub.15 -A.sub.0 ], as shown. For example, line 0, corresponding to a zero shift, is connected to the lower set of AND-gates 37 so that when line 0 is active, outputs 15-0 correspond to [A.sub.15 -A.sub.0 ]. Similarly, line 1, corresponding to a one-bit left shift, is applied to the set of gates having [A.sub.14 -A.sub.-1 ] connected as inputs, causing the input string [A.sub.14 -A.sub.-1 ] to appear at outputs 15-0. Activating line 2 or 3 would cause input string [A.sub.13 -A.sub.-2 ] or [A.sub.12 -A.sub.-3 ] to appear at outputs 15-0. Because decoder 35 selects one out of four output lines, 4-input OR-gates 39 provide the multiplexing required to feed the selected bits to the output terminals.
If input bits [A.sub.-1 -A.sub.-3 ] are connected to [A.sub.15 -A.sub.13 ] respectively, the shift operation produces an end-around rotation or barrel shift of the input string [A.sub.15 -A.sub.0 ]; if [A.sub.-1 -A.sub.-3 ] are forced inactive, zeros would be appended to the shifted string.
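The selection performed by the shift cells can be modeled bit by bit. The following Python sketch is a behavioral illustration only; the function name `shift_left_16` and the `rotate` flag (standing for the choice of wiring [A.sub.-1 -A.sub.-3 ] to [A.sub.15 -A.sub.13 ] or forcing them inactive) are assumptions made for the example.

```python
def shift_left_16(a: int, shift: int, rotate: bool) -> int:
    """Model a 16-bit shifter that shifts left by 0 to 3 positions.

    For decoder line `shift`, output position `pos` receives input
    bit A[pos - shift], as in the one-of-four selection per shift cell.
    """
    width = 16
    assert 0 <= shift <= 3
    bits = [(a >> i) & 1 for i in range(width)]  # bits[i] is A_i

    def source(i: int) -> int:
        if i >= 0:
            return bits[i]
        # A_-1..A_-3 are either wired to A_15..A_13 (rotation)
        # or forced inactive (zeros appended).
        return bits[i + width] if rotate else 0

    out = 0
    for pos in range(width):
        out |= source(pos - shift) << pos
    return out


print(hex(shift_left_16(0x8001, 1, rotate=True)))   # end-around rotation
print(hex(shift_left_16(0x8001, 1, rotate=False)))  # zeros appended
```

The same selection logic serves both behaviors; only the wiring of the three "negative index" inputs differs.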
Because floating-point processors accommodate mantissas of 64 bits (or more), extending the structure of FIG. 3 to 64-bit shifts would require 64 shift cell units 45, each having 64 AND-gates (a total of 4096 AND-gates), together with the accompanying interconnect and control complexity. Consequently, the shift process is usually done in two stages as shown in FIG. 1.
The first stage shifter 30 may typically accommodate shifts of 0-7 bits, while the second stage 40 has 8 shift cells and provides shifts of 0, 8, 16, 24, . . . 56. A cell structure similar to that described above is used, except that each AND-gate accommodates 8 bundles of 8 input data bits. For a discussion of these techniques, see "Introduction to Arithmetic for Digital Systems Designers," Waser, S., and Flynn, M. J., Holt, Rinehart and Winston, 1982, pp. 106-123.
Referring back to FIG. 1, it should be noted that shift units 30 and 40 are designed to accept a binary coded shift instruction from LZE 20. As shown, the k-bit shift instruction is split into an l-bit field and an m-bit field, where l corresponds to the group of lower order bits and m corresponds to the higher order bits. Decoders 31 and 41 of units 30 and 40, respectively, decode this information to activate 1 out of 2.sup.l or 1 out of 2.sup.m gate control lines. Similar decoding of the k bits occurs in zero mask unit 50.
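The splitting of the shift instruction into two fields can be illustrated with a short Python sketch. It assumes a 64-bit mantissa with k=6, l=3 and m=3, and it models the combined effect of rotation plus zero masking as a plain zero-filling shift; the function name `two_stage_shift` and its parameters are hypothetical.

```python
def two_stage_shift(value: int, shift_code: int, width: int = 64, l_bits: int = 3) -> int:
    """Apply a k-bit shift code in two stages: fine, then coarse."""
    mask = (1 << width) - 1
    fine = shift_code & ((1 << l_bits) - 1)   # l lower-order bits: 0..7
    coarse = shift_code >> l_bits             # m higher-order bits: 0..7
    stage1 = (value << fine) & mask                 # first stage: 0-7 positions
    stage2 = (stage1 << (coarse << l_bits)) & mask  # second stage: 0, 8, ..., 56
    return stage2


# A shift of 37 positions is performed as 5 (fine) followed by 32 (coarse).
print(hex(two_stage_shift(0x1, 37)))
```

Each stage needs only a 1-out-of-8 decoder, which is the point of splitting the shift in this way.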
FIG. 4 is a truth table for a 32-bit input LZE 20 unit. The input data bits are numbered [31-0] along the top. The output binary shift count of leading zeros is enumerated vertically along the right side. Also, an additional input control bit, E.sub.i, is shown at the extreme left. E.sub.i is an enable control bit that, together with E.sub.0, is used to cascade standard modular LZE networks to accommodate longer bit strings.
Each horizontal line of the table shows the position of the leading 1 and the resulting leading zero count. The symbol X following the leading one indicates that the value may be arbitrarily 0 or 1. Also, note that if all input bits are zero, E.sub.0 is made active while the shift count is made zero (z=0), indicating a zero valued input. If E.sub.i =0, the output is disabled.
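The behavior tabulated in FIG. 4 can be summarized as a small behavioral model. The Python sketch below is an illustration under stated assumptions: the name `lze32` is hypothetical, a disabled output is represented by `None`, and E.sub.0 is assumed to be inactive while the unit is disabled.

```python
def lze32(a: int, e_i: int = 1):
    """Return (leading zero count, E_o) for a 32-bit input word."""
    if not e_i:
        return None, 0          # output disabled (assumed: E_o inactive)
    a &= 0xFFFFFFFF
    if a == 0:
        return 0, 1             # all-zero input: count forced to 0, E_o active
    return 32 - a.bit_length(), 0


print(lze32(0x00010000))  # leading one at bit 16
print(lze32(0))           # zero valued input
```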
FIG. 5 shows a programmable logic array (PLA) implementation of a 32-bit input LZE 20. Data input bits [A.sub.31 -A.sub.0 ] and their complements are applied to the multi-input AND-gates 71 as indicated by the x-marks on the horizontal lines feeding the gates. The complements are generated by inverting buffers 75. E.sub.i is an enabling input that is used to cascade modular LZE units. The AND-gate outputs P.sub.0 -P.sub.31 are individually and exclusively activated (1 out of 32) in accordance with the following boolean expressions: ##EQU1##
These expressions, as implemented in FIG. 5, ensure that a particular AND-gate 71 output, P.sub.k, is active only if the network is enabled (E.sub.i =1) and A.sub.k is active while all A.sub.n, n>k, are inactive.
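The condition just stated can be written compactly as a single product term. The form below is a reconstruction from the verbal description only; the arrangement of the expressions as originally printed may differ.

```latex
P_k \;=\; E_i \cdot A_k \cdot \prod_{n=k+1}^{31} \overline{A_n},
\qquad k = 0, 1, \ldots, 31
```

For k=31 the product over n is empty, so the term reduces to P.sub.31 = E.sub.i A.sub.31. Because each term requires every higher order bit to be inactive, at most one P.sub.k can be active at a time.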
OR-gates 73 logically combine selected outputs of AND-gates 71, as indicated by the x-marks on the OR-gate input lines. The output bits [C.sub.4 -C.sub.0 ] are supplied by tri-state buffers 77, which are controlled by the output of OR-gate 79. If the network is enabled (E.sub.i =1) and all input bits [A.sub.31 -A.sub.0 ] are low, AND-gate output Z1 is active, indicating that all input bits are zero. Also, AND-gate output Z2 is active if E.sub.i =0. Either of these two conditions will float output buffers 77. This latter feature is useful for cascading two (or more) LZE units as shown in FIG. 6.
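The AND-plane and OR-plane structure described above can be mimicked in software. The Python sketch below is a behavioral illustration, not a transcription of FIG. 5: the function name `pla_lze32` is hypothetical, and the OR-plane wiring (each output bit C.sub.j collecting the product terms whose leading zero count, 31-k, has bit j set) is an assumption derived from the encoding the outputs must produce.

```python
def pla_lze32(a: int, e_i: int = 1):
    """Return (product terms P[0..31], encoded count, outputs_floating)."""
    bits = [(a >> n) & 1 for n in range(32)]  # bits[n] is A_n

    # AND plane: P_k is active only when enabled, A_k is 1,
    # and every higher-order bit A_n (n > k) is 0.
    p = [
        e_i & bits[k] & int(all(bits[n] == 0 for n in range(k + 1, 32)))
        for k in range(32)
    ]

    # OR plane (assumed wiring): C_j ORs the terms whose count 31-k has bit j set.
    c = [int(any(p[k] for k in range(32) if ((31 - k) >> j) & 1)) for j in range(5)]
    count = sum(c[j] << j for j in range(5))

    z1 = e_i & int(not any(bits))  # enabled and all inputs low
    z2 = 1 - e_i                   # network disabled
    return p, count, bool(z1 | z2)


p, count, floating = pla_lze32(0x00010000)
print(p.index(1), count, floating)
```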
Two LZE 70 units may be connected to accommodate a 64-bit input data string. LZE #1 accepts bits [A.sub.63 -A.sub.32 ] while LZE #2 accepts bits [A.sub.31 -A.sub.0 ]. Enable output E.sub.0 from LZE #1 enables LZE #2 when all higher order bits are zero and also asserts output bit C.sub.5, indicating that none of the 32 higher order bits is active high. The encoded output bits [C.sub.4 -C.sub.0 ] of LZE #2, wired-OR with the corresponding disabled outputs of LZE #1, provide the five lower order bits. LZE #2 provides E.sub.0 active to indicate that all 64 bits are low.
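The cascade can be checked with a small behavioral model. In the Python sketch below, the names `lze32_unit` and `lze64` are hypothetical, floating (disabled) outputs are modeled as zeros so that the wired-OR becomes a bitwise OR, and E.sub.0 is assumed inactive while a unit is disabled.

```python
def lze32_unit(a: int, e_i: int):
    """One 32-bit LZE module: returns (C4..C0 as an int, E_o)."""
    if not e_i:
        return 0, 0                 # disabled: outputs float (modeled as 0)
    a &= 0xFFFFFFFF
    if a == 0:
        return 0, 1                 # all inputs low: outputs float, E_o active
    return 32 - a.bit_length(), 0


def lze64(a: int):
    """Cascade two modules: returns (6-bit count C5..C0, all-zero flag)."""
    hi = (a >> 32) & 0xFFFFFFFF     # bits 63..32 go to LZE #1
    lo = a & 0xFFFFFFFF             # bits 31..0 go to LZE #2
    c_hi, eo1 = lze32_unit(hi, 1)
    c_lo, eo2 = lze32_unit(lo, eo1)  # E_o of LZE #1 enables LZE #2
    c5 = eo1                         # asserted when the upper 32 bits are all zero
    return (c5 << 5) | (c_hi | c_lo), eo2


print(lze64(1 << 40))
print(lze64(1 << 3))
```

In this model an all-zero 64-bit input yields C.sub.5 asserted together with the all-zero flag from LZE #2, so the flag, not the count, identifies a zero result.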
Some earlier implementations of LZE units used 8-bit priority encoder ICs. FIG. 7 shows the logic diagram of an 8-bit priority encoder such as the Motorola 74LS148 or the logically equivalent National Semiconductor CMOS 74HC148. This chip has 8 input lines, A.sub.7 -A.sub.0, and produces the encoded outputs C.sub.2 -C.sub.0. Also, input enable E.sub.i, output enable E.sub.0, and G.sub.s, the complemented zero indicator, are provided. FIG. 8, the corresponding truth table, clearly shows that these 8-bit priority encoders are the complemented equivalent of the LZE described above.
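The sense in which the priority encoder is a complemented equivalent can be illustrated as follows. The Python sketch assumes the usual active-low convention of the '148 family (inputs, encoded outputs, E.sub.i and G.sub.s asserted low); the function name `encoder_148` and the representation of pin levels as integers are assumptions, and the polarities should be checked against FIG. 8.

```python
def encoder_148(inputs_n: int, ei_n: int = 0):
    """Behavioral model of an 8-to-3 priority encoder with active-low pins.

    inputs_n: raw levels of input pins 7..0 packed in an int (0 = asserted).
    Returns raw levels (C2..C0 as an int, G_s, E_o).
    """
    if ei_n:
        return 0b111, 1, 1          # disabled: all outputs high
    asserted = (~inputs_n) & 0xFF
    if asserted == 0:
        return 0b111, 1, 0          # no input asserted: E_o goes low
    k = asserted.bit_length() - 1   # highest-priority asserted input
    return (~k) & 0b111, 0, 1       # encoded output is the complement of k


# Feeding complemented data makes the raw output equal the leading zero count.
data = 0b00010110
print(encoder_148(~data & 0xFF))
```

With the data complemented at the inputs, the raw output level 7-k equals the number of leading zeros of the 8-bit data word.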
Cascading of these units may be accomplished as indicated in FIG. 9 and extended in multiples of 8-bits.