1. Field of the Invention
The present invention generally relates to computer systems, more specifically to a method of determining the number of leading zeros (or ones) in a binary value for computational processing, and particularly for providing an encoded leading-zero count augmented by a constant bias value.
2. Description of Related Art
The basic structure of a conventional computer system includes a central processing unit (CPU) or processor which is connected to several peripheral devices, including input/output (I/O) devices such as a display monitor and keyboard for the user interface, a permanent memory device (such as a hard disk or floppy diskette) for storing the computer's operating system and user programs, and a temporary memory device (such as random-access memory or RAM) that is used by the processor to carry out program instructions. A processor communicates with the peripheral devices by various means, including a bus or a direct channel. A computer system may have many additional components such as serial and parallel ports for connection to, e.g., modems or printers. Those skilled in the art will further appreciate that there are other components that might be used in conjunction with the foregoing; for example, a display adapter connected to the processor might be used to control a video display monitor, and a memory controller may be used as an interface between the temporary memory device and the processor.
A typical processor configuration is shown in FIG. 1. Processor 1 includes a bus interface unit 2 which controls the flow of data between processor 1 and the remainder of the data-processing system (not shown). Bus interface unit 2 is connected to both a data cache 3 and an instruction cache 4. Instruction cache 4 supplies instructions to branch unit 5, which determines what sequence of instructions is appropriate given the contents of general-purpose registers (GPRs) 6 and floating-point registers (FPRs) 7 in processor 1, the availability of load/store unit 8, fixed-point execution unit 9, and floating-point execution unit 10, and the nature of the instructions themselves. Branch unit 5 forwards the ordered instructions to dispatch unit 11, which issues the individual instructions to the appropriate execution unit (load/store unit 8, fixed-point execution unit 9, or floating-point execution unit 10).
Fixed-point execution unit 9 reads data from and writes data to general-purpose registers 6. Floating-point execution unit 10 reads data from and writes data to floating-point registers 7. Load/store unit 8 reads data from general-purpose registers 6, or floating-point registers 7, and writes the data to data cache 3 or to an external memory (not shown) depending on the memory hierarchy and caching protocol employed by the data-processing system, which are beyond the scope of the present invention. Load/store unit 8 also reads data from data cache 3 and writes the data to general-purpose registers 6 and floating-point registers 7.
A processor can perform arithmetic operations on different types of numbers, or operands. For example, the simplest operations involve integer operands, which are represented using a "fixed-point" notation. Non-integers are typically represented according to a "floating-point" notation. Standard number 754 of the Institute of Electrical and Electronics Engineers (IEEE) sets forth particular formats which are used in most modern computers for floating-point operations. For example, a "single-precision" floating-point number is represented using a 32-bit (one word) field, and a "double-precision" floating-point number is represented using a 64-bit (two-word) field. Most processors handle floating-point operations with a floating-point unit (FPU).
Floating-point notation (which is also referred to as exponential notation) can be used to represent both very large and very small numbers. A floating-point notation has three parts: a mantissa (or significand), an exponent, and a sign (positive or negative). The mantissa specifies the digits of the number, and the exponent specifies the magnitude of the number, i.e., the power of the base which is to be multiplied with the mantissa to generate the number. For example, using base 10, the number 28330000 would be represented as 2833E+4, and the number 0.054565 would be represented as 54565E-6. Since processors use binary values, floating-point numbers in computers use 2 as a base (radix). Thus, a floating-point number may generally be expressed in binary terms according to the form
$n = (-1)^{S} \times 1.F \times 2^{E}$,
where n is the floating-point number (in base 10), S is the sign of the number (0 for positive or 1 for negative), F is the fractional component of the mantissa (in base 2), and E is the exponent of the radix. In accordance with IEEE standard 754, a single-precision floating-point number uses the 32 bits as follows: the first bit indicates the sign (S), the next eight bits indicate the exponent offset by a bias amount of 127 (E+bias), and the last 23 bits indicate the fraction (F). So, for example, the decimal number ten would be represented by the 32-bit value
0 10000010 01000000000000000000000
as this corresponds to $(-1)^{0} \times 1.01_{2} \times 2^{130-127} = 1.25 \times 2^{3} = 10$.
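The field layout just described can be checked with a short sketch. The function name `decode_single` is hypothetical and chosen only for illustration; the sketch handles normalized values only and cross-checks the result against Python's standard `struct` module.

```python
import struct

def decode_single(bits: int):
    """Split a 32-bit single-precision pattern into sign, biased exponent, fraction."""
    sign = (bits >> 31) & 0x1             # first bit: sign S
    biased_exponent = (bits >> 23) & 0xFF  # next eight bits: E + 127
    fraction = bits & 0x7FFFFF             # last 23 bits: F
    # Normalized case only: implicit leading 1 in front of the fraction.
    value = (-1) ** sign * (1 + fraction / 2**23) * 2.0 ** (biased_exponent - 127)
    return sign, biased_exponent, fraction, value

# The 32-bit value given above for the decimal number ten.
pattern = int("0" "10000010" "01000000000000000000000", 2)
sign, biased_exponent, fraction, value = decode_single(pattern)

# Cross-check: let the standard library interpret the same four bytes.
reference = struct.unpack(">f", pattern.to_bytes(4, "big"))[0]
print(sign, biased_exponent, value, reference)
```

Here the biased exponent field holds 130, so the true exponent is 130 - 127 = 3, and the fraction field contributes 0.25 to the significand 1.25.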
When a value is expressed in accordance with the foregoing convention, it is said to be normalized, that is, the leading bit in the significand is nonzero, or a "1" in the case of a binary value (as in "1.F"). If the explicit or implicit most significant bit is zero (as in "0.F"), then the number is said to be unnormalized. Unnormalized numbers can easily occur as an output result of a floating-point operation, such as the effective subtraction of one number from another number that is only slightly different in value. The fraction is shifted left (leading zeros are removed from the fraction) and the exponent adjusted accordingly; if the exponent is greater than or equal to Emin (the minimum exponent value), then the result is said to be normalized. If the exponent is less than Emin, an underflow has occurred. If the underflow is disabled, the fraction is shifted right (zeros inserted) until the exponent is equal to Emin. The exponent is replaced with "000" (hexadecimal), and the result is said to be denormalized. For example, two numbers (having the same small exponent E) may have mantissas of 1.010101 and 1.010010, and when the latter number is subtracted from the former, the result is 0.000011, an unnormalized number. If E is less than 5, the final result will be a denormalized number.
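The subtraction example above can be traced in a few lines. This is a minimal sketch, assuming the seven-bit mantissas are held as plain integers and using a hypothetical helper `normalize`; the starting exponent of 9 is an arbitrary illustrative value, and underflow handling is omitted.

```python
def normalize(mantissa: int, exponent: int, width: int = 7):
    """Shift the leading zeros out of a `width`-bit mantissa and adjust the exponent."""
    leading_zeros = width - mantissa.bit_length()
    # One left shift per leading zero, one exponent decrement per shift.
    return mantissa << leading_zeros, exponent - leading_zeros, leading_zeros

# 1.010101 - 1.010010 = 0.000011 (binary point after the first bit).
difference = 0b1010101 - 0b1010010

shifted, new_exponent, count = normalize(difference, exponent=9)
print(format(difference, "07b"), count, format(shifted, "07b"), new_exponent)
```

Five leading zeros are removed, so the exponent drops by five. This is why a starting exponent below 5 in the example above cannot absorb the full shift.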
The hardware of many conventional computers is adapted to process only normalized numbers. Therefore, when a denormalized number is presented as an output result of a floating-point operation, it must be normalized before further processing of the number can take place. Various techniques are used to normalize the values, generally by removing leading zeros from the fraction and accordingly decrementing the exponent. See U.S. Pat. No. 5,513,362. One technique involves leading zero anticipator (LZA) logic which predicts the number of zeros to remove before the floating-point arithmetic is completed. See IBM Journal of Research and Development, vol. 34, no. 1 (January 1990), pp. 71-77.
Referring to FIG. 2, a high-level block diagram of a conventional construction for floating-point execution unit 10 is illustrated. Floating-point execution unit 10 includes three inputs 202, 204, and 206 for receiving input operands A, B, and C, respectively, expressed as floating-point numbers. Floating-point execution unit 10 uses these operands to perform a "multiply-add" instruction. The multiply-add instruction executes the arithmetic operation ±[(A×C)±B]. The exponent portions of operands A, B, and C received at inputs 202, 204, and 206 are provided to an exponent calculator 208. The mantissa portions of operands A and C are provided to a multiplier 212, while the mantissa portion of operand B is provided to an alignment shifter 214. As used herein, the term "adding" inherently includes subtraction since the B operand can be a negative number.
Multiplier 212 receives the mantissas of operands A and C and reduces the arithmetic function (A×C) to two intermediate results, known as "sum" and "carry." These intermediate results are provided to a main adder/incrementer 222. Exponent calculator 208 calculates an intermediate exponent from the sum of the exponents of operands A and C and stores the intermediate exponent in an intermediate exponent register 224. Exponent calculator 208 also calculates the difference between the intermediate exponent and the exponent of operand B, and decodes that value to provide control signals to both a leading zero anticipator (LZA) 226 and alignment shifter 214. Alignment shifter 214 shifts the mantissa of operand B so that the exponent of operand B, adjusted to correspond to the shifted mantissa, equals the intermediate exponent. The shifted mantissa of operand B is then provided to main adder/incrementer 222. Main adder/incrementer 222 adds the shifted mantissa of operand B to the sum and carry results of multiplier 212. The output of main adder/incrementer 222 is stored in an intermediate result register 228.
Simultaneously with the mantissa addition in main adder/incrementer 222, LZA 226 predicts the position of the leading one in the result. Since the nature of the arithmetic operation (logical addition or logical subtraction) is known well in advance, LZA 226 may predict the location of the leading one in the result mantissa as being in one of two adjacent bit positions. The left bit position, the most significant bit of the pair, is referred to as the "minimum position" as it represents the minimum shift required for normalization of the result mantissa. Similarly, the right bit position, representing the maximum shift required for normalization, is referred to as the "maximum position." For example, if twelve zeroes were predicted to precede the centerpoint of the minimum/maximum bit position pair, the shift amount pair would be either (11,12) for logical addition or (12,13) for logical subtraction. Because the minimum-predicted shift amount must always be selected to ensure that a leading one is not removed from the result, the shift amount used is always based on an encoding of the minimum position of the predicted bit position pair.
LZA 226 computes a normalize adjust based on the minimum bit position, which is stored in a normalize adjust register 230. The normalize adjust from normalize adjust register 230 is provided, together with the intermediate result mantissa from intermediate result register 228, to a normalizer 232. Normalizer 232 performs the shifting required to place the leading one in the most significant bit position of the result mantissa. The shifted mantissa is then provided to a rounder 234, which rounds off the result mantissa to the appropriate number of bits.
The normalize adjust from normalize adjust register 230 is also provided to an exponent adder 236. To obtain the proper exponent, the exponent is initially adjusted to correct for the maximum shift predicted by leading zero anticipator 226. If the final result of main adder/incrementer 222 requires only the minimum shift, a late "carry-in" to the exponent adder corrects for the minimum shift amount. To adjust the exponent for the maximum shift predicted, the two's complement of the maximum bit position is added to the intermediate exponent. The addition of the exponent adjust to the intermediate exponent may be initiated as soon as the exponent adjust is available from leading zero anticipator 226, which will typically be before the result from main adder/incrementer 222 becomes available.
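The exponent adjustment just described can be modeled arithmetically. The following is only a behavioral sketch, not the circuit of exponent adder 236: the 13-bit exponent width, the constant `EXP_BITS`, and the function names are assumptions introduced for illustration, and the maximum position is taken to be one more than the minimum position, as in the shift amount pairs given above.

```python
EXP_BITS = 13                  # assumed exponent datapath width (illustrative only)
MASK = (1 << EXP_BITS) - 1

def twos_complement(value: int) -> int:
    """Two's complement of `value` within EXP_BITS bits."""
    return (-value) & MASK

def adjust_exponent(intermediate_exponent: int, min_shift: int,
                    only_min_shift_needed: bool) -> int:
    """Subtract the predicted shift from the intermediate exponent."""
    max_shift = min_shift + 1
    # Start early: assume the maximum shift and add its two's complement.
    # The late carry-in adds 1 back when only the minimum shift was required.
    carry_in = 1 if only_min_shift_needed else 0
    return (intermediate_exponent + twos_complement(max_shift) + carry_in) & MASK

# Predicted pair (11, 12) with an intermediate exponent of 100.
print(adjust_exponent(100, 11, only_min_shift_needed=True))   # minimum shift applied
print(adjust_exponent(100, 11, only_min_shift_needed=False))  # maximum shift applied
```

The point of the arrangement is timing: the addition of the two's complement can begin before the mantissa sum is known, and only the single-bit carry-in has to wait for it.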
The final result mantissa from rounder 234 is combined with the final exponent from exponent adder 236 and forwarded, at output 238, to a result bus (not shown) of floating-point execution unit 10. From the floating-point execution unit's issue multiplexer, the normalized floating-point result may be directly written to a floating-point register or, alternatively, to a designated entry in a rename buffer. In this particular unit, a leading zero overlay (LZO) is generated by logic unit 231, that may prevent the LZA from requesting full normalization. The LZO is based on the intermediate exponent stored in intermediate exponent register 224. See U.S. Pat. No. 5,943,249 for further details.
Determination of leading zeros for binary vectors of relatively short length (e.g., 4 bits long) can usually be accomplished using a Karnaugh map, or other relatively simple Boolean logic. As the binary data field on which this function is performed becomes longer, however (e.g., 32, 64, or 128 bits long), the function can no longer be performed easily in this fashion. The use of two separate functional blocks operating in series (the binary leading-zero counters followed by binary adders to realize the biased count result) requires additional power and integrated circuit area. The difficulty can be compounded in floating-point arithmetic wherein it is necessary to re-normalize the mantissa (shift left to remove all leading zeros). It would, therefore, be desirable to devise an improved method of determining a leading-zero count which uses less integrated circuit area and consumes less power. It would be further advantageous if the method were amenable to high-speed processing, such as when the processor operates at speeds of one gigahertz or more.
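For a 4-bit vector, the simple Boolean logic mentioned above might look like the following sketch, followed by the serial arrangement (count first, then add the bias) whose cost is at issue. The Boolean expressions are one possible minimization worked out for this illustration, and the names `lzc4` and `biased_count_serial` are hypothetical.

```python
def lzc4(b3: int, b2: int, b1: int, b0: int):
    """Two-bit leading-zero count of a 4-bit vector (b3 is the most significant bit)."""
    nb3, nb2, nb1, nb0 = b3 ^ 1, b2 ^ 1, b1 ^ 1, b0 ^ 1
    z1 = nb3 & nb2                 # high count bit: the top two bits are both zero
    z0 = nb3 & (b2 | nb1)          # low count bit: count is 1 or 3
    all_zero = nb3 & nb2 & nb1 & nb0
    return (z1 << 1) | z0, bool(all_zero)

def biased_count_serial(value: int, bias: int) -> int:
    """Serial arrangement: leading-zero counter followed by a separate adder."""
    b3, b2, b1, b0 = ((value >> i) & 1 for i in (3, 2, 1, 0))
    count, all_zero = lzc4(b3, b2, b1, b0)
    if all_zero:
        count = 4                  # every bit is zero
    return count + bias            # second functional block: the bias adder

print(biased_count_serial(0b0010, bias=5))
```

At four bits the logic fits in a few gates. The same flat approach does not scale to 64 or 128 bits, and the trailing adder adds its own delay, area, and power.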
It is therefore one object of the present invention to provide an improved processor for a computer system.
It is another object of the present invention to provide such a processor which performs a leading zero determination in a more efficient manner.
It is yet another object of the present invention to provide an improved method for performing binary leading-zero counting with a constant-biased result.
The foregoing objects are achieved in a method of determining a leading-zero count of a binary value for a floating-point operation, generally comprising the steps of dividing a binary vector into a plurality of subvectors, generating a plurality of subvector leading-zero counts, one for each of the subvectors, and concatenating the subvector leading-zero counts to yield a final leading-zero count for the binary vector. The floating-point operation provides a result which may be shifted by an amount equal to the leading-zero count; for example, the result may be an intermediate mantissa of a floating-point multiply-add operation, and the shifting normalizes the intermediate mantissa. In the preferred implementation, the binary vector has a length of $2^n$, and each subvector has a length of $2^m$, where m is less than n, e.g., the binary vector has 64 bits, and each of the subvectors has 16 bits. The method may further divide each of the subvectors into a plurality of base fields, and generate a plurality of base field leading-zero counts as well. The method also preferably generates several signals, one for each given subvector, which designate whether all bits of a given subvector have a zero value. The concatenating step then uses the subvector leading-zero counts in combination with the signals to calculate a portion of the final leading-zero count. In particular, the concatenating step selects four low bits of the final leading-zero count from four low bits of a most significant subvector leading-zero count whose input data is non-zero. The method may be applied to generate subvector leading-zero counts, and a final leading-zero count, which are biased by a constant amount.
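The subvector approach summarized above can be illustrated in software for the 64-bit, four-subvector case. This is a behavioral sketch only, under the assumption that the two high bits of the final count come from the index of the first non-zero subvector; it does not model the base-field subdivision or the biased variant, and the names `lzc16` and `lzc64` are hypothetical.

```python
def lzc16(subvector: int):
    """Return (4-bit leading-zero count, all_zero signal) for a 16-bit subvector."""
    if subvector == 0:
        return 0, True             # the count is a don't-care when all bits are zero
    return 16 - subvector.bit_length(), False

def lzc64(vector: int) -> int:
    """Leading-zero count of a 64-bit vector built from four 16-bit subvector counts."""
    # Divide the vector into four subvectors, most significant first.
    subvectors = [(vector >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]
    results = [lzc16(s) for s in subvectors]
    for index, (count, all_zero) in enumerate(results):
        if not all_zero:
            # High two bits: how many subvectors were entirely zero.
            # Low four bits: the count from the first non-zero subvector.
            return (index << 4) | count
    return 64                      # every subvector reported all_zero

print(lzc64(0x0000_0000_0001_0000))
```

Because 16 is a power of two, the high and low parts of the count never overlap, so they can be joined by concatenation rather than addition. That is what removes the need for an adder when combining the subvector results.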
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.