The technology of floating point execution units (processors) has undergone tremendous improvement in the preceding ten years. This evolution coincides with the growth of reduced instruction set computing (RISC) data processing architectures. An example of a contemporary floating point processor architecture appears in the RISC System/6000 workstation as commercially sold by IBM Corporation. Many related floating point processor concepts are described in U.S. Pat. No. 4,969,118, the subject matter of which is incorporated herein by reference.
Whereas prior floating point processors expended 30-50 clock cycles to complete a floating point mathematical operation, contemporary RISC designs perform the same mathematical operation, with mantissa bit counts of 32 or greater, in five clock cycles or fewer.
Normalization is the removal of leading zeros from positive outputs, or leading ones from negative outputs, of the full adder conventionally used to perform the floating point operations. The determination of how many leading zeros or leading ones need to be removed is preferably accomplished in parallel with the operations in the full adder. Such concurrency is important since even a few clock cycles now have a major impact on the overall speed of the floating point processor.
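The normalization step described above can be illustrated with a minimal software sketch. It assumes the adder output is held as a fixed-width bit pattern whose most significant bit indicates the sign; the function names `leading_run_length` and `normalize` and the 8-bit width are hypothetical, chosen only for illustration. The sketch counts the leading bits serially after the result is known, which is the behavior of a leading 0/1 "detector" rather than of a circuit operating in parallel with the adder.

```python
def leading_run_length(bits: int, width: int) -> int:
    """Count the bits, starting at the most significant, that equal the top bit.

    For a positive output (top bit 0) this is the number of leading zeros;
    for a negative output (top bit 1) it is the number of leading ones.
    """
    top = (bits >> (width - 1)) & 1
    count = 0
    for i in range(width - 1, -1, -1):
        if (bits >> i) & 1 != top:
            break
        count += 1
    return count


def normalize(bits: int, width: int) -> tuple[int, int]:
    """Shift the leading zeros or ones out to the left.

    Returns the shifted bit pattern (truncated to `width` bits) and the
    shift amount that was applied.
    """
    shift = leading_run_length(bits, width)
    mask = (1 << width) - 1
    return (bits << shift) & mask, shift


# Positive output: three leading zeros are removed.
pos, pos_shift = normalize(0b00010110, 8)
# Negative output: three leading ones are removed.
neg, neg_shift = normalize(0b11101001, 8)
```

In this sketch the shift amount is only available after the full result has been examined, which is the delay that computing it concurrently with the addition is meant to avoid.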
To meet this need for parallel operation, the leading 0/1 "detector" devices formerly used with floating point processors were replaced by leading 0/1 "anticipator" architectures and circuits. An example of such a leading 0/1 anticipator architecture appears in U.S. Pat. No. 4,926,369, the subject matter of which is incorporated herein by reference, and is further developed in the article by Hokenek et al., entitled "Second-Generation RISC Floating Point with Multiply-Add Fused", as published in the IEEE Journal of Solid-State Circuits, Volume 25, No. 5, October 1990, at pages 1207-1213. The objective of the leading 0/1 anticipator (LZA) is to minimize the normalization delay following an output from the full adder. Ideally, the leading 0/1 shift adjustment is available for immediate use at the moment the mantissa value becomes available from the full adder. This allows immediate normalization of the mantissa by shifts to the left to remove, as appropriate, either the leading zeros or the leading ones, depending upon the sign of the adder output.
Although the architecture described in U.S. Pat. No. 4,926,369 proved to be a tremendous improvement over the then existing designs, that architecture required the generation, and the eventual logical combination, of five separate strings of variables (state outputs) from the two input data bit strings presented to the full adder. The five bit strings representing the states are designated ZZ, PP, PZ, PG and GG. The generation of the logical combinations PZ and PG requires both logical AND and logical OR stages, extending the number of successive gates and precluding the use of single rail high speed logic. Furthermore, the presence of five strings of variables, together with the requirement that each bit in a string be a function of all higher order bits (all bits to the left of it), creates the need for logic circuits with significant fan-in and fan-out loading. As a consequence, the logic needed to implement the five state design was not readily amenable to reductions in size or increases in speed.
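For orientation only, state names of this kind are conventionally built from three per-bit signals derived from the two adder inputs: Z (both bits zero), P (exactly one bit set, i.e. carry propagate) and G (both bits set, i.e. carry generate). That convention is an assumption here, since the text above does not define the letters. The sketch below shows only this assumed per-bit classification; it does not reproduce how the cited patent combines the signals into the ZZ, PP, PZ, PG and GG strings. The function name `classify_bits` is hypothetical.

```python
def classify_bits(a: int, b: int, width: int) -> str:
    """Label each bit position of the two adder inputs, most significant first.

    Assumed convention (not taken from the text above):
      Z = both input bits are 0
      P = exactly one input bit is 1 (a carry would propagate)
      G = both input bits are 1 (a carry is generated)
    """
    labels = []
    for i in range(width - 1, -1, -1):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        if bit_a & bit_b:
            labels.append("G")
        elif bit_a ^ bit_b:
            labels.append("P")
        else:
            labels.append("Z")
    return "".join(labels)


# Inputs 1100 and 1010 give one generate, two propagates and one zero.
states = classify_bits(0b1100, 0b1010, 4)
```

Each label here depends only on its own bit position; the fan-in and fan-out burden described above arises when every output bit must additionally depend on all higher order positions.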
The leading 0/1 anticipation architectures and circuits as defined in such teachings were a significant improvement over their predecessors but have proven to be slow and complex in relation to the needs of contemporary floating point processors.