1. Field of the Invention
The present invention relates to a data processing apparatus and method for normalizing a data value.
2. Description of the Prior Art
Within a data processing apparatus, a data value often needs to be normalized. Such normalization requires the data value to be shifted by a number of bit positions so that the leading 1 of the data value is placed in the integer bit position, i.e. immediately to the left of the position at which the binary point is considered to exist in the data value. A particular example where such normalization is often used is in the processing of floating point numbers.
A floating point number can be expressed as follows:

$$\pm 1.x \times 2^{y}$$

where:
x = fraction
1.x = significand (also known as the mantissa)
y = exponent
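By way of illustration only, the following minimal sketch splits a single precision value into the sign, significand 1.x and exponent y of the above expression. It assumes the IEEE 754 binary32 bit layout (1 sign bit, 8 exponent bits with a bias of 127, 23 fraction bits), which is not stated in the text above, and it handles normal numbers only. The function name decompose_single is hypothetical.

```python
import struct


def decompose_single(value):
    """Return (sign, significand, exponent) for a normal binary32 value.

    Assumption: IEEE 754 binary32 layout. The significand is returned as a
    24-bit integer whose top bit is the implicit leading 1, so the value
    equals (-1)**sign * (significand / 2**23) * 2**exponent.
    """
    # Reinterpret the 32-bit float as an unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    sign = bits >> 31
    biased_exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if biased_exponent in (0, 0xFF):
        raise ValueError("zero, subnormal, infinity and NaN are not handled")
    significand = (1 << 23) | fraction  # prepend the implicit leading 1
    exponent = biased_exponent - 127    # remove the exponent bias
    return sign, significand, exponent


if __name__ == "__main__":
    # 6.5 = 1.101 (binary) * 2**2
    print(decompose_single(6.5))
```

The 24-bit integer returned here corresponds to the 24-bit significand referred to below for single precision floating point numbers.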
FIG. 1 is a diagram illustrating a known prior art technique for normalizing a data value, in this case a sum value output by the adder logic 10. In particular, the adder logic 10 is arranged to receive the significands of two floating point numbers to be added, referred to in FIG. 1 as the augend and the addend, and the resultant sum value is stored in the register 30.
In parallel with the addition performed by the adder logic 10, leading one prediction logic 20 (which can in alternative embodiments be replaced by leading zero anticipator logic) is used to predict the bit position of the most significant 1 in the sum, this being done on the basis of the input augend and addend. The leading one prediction logic 20 then outputs a shift count value indicating a predicted number of bit positions by which the sum value should be shifted in order to produce a normalized sum (i.e. a sum of the form 1.x). Considering as an example single precision floating point numbers, the input significands will be 24 bits in length, and in such embodiments the sum output by adder 10 will be 25 bits in length, and the shift count value output by the leading one prediction logic will be 5 bits.
The functions performed by the adder logic 10 and leading one prediction logic 20 will typically be implemented within a particular pipeline stage of a data processing apparatus, and at the end of that pipeline stage, the sum value produced by the adder logic 10 will be stored in the register 30 and the shift count value produced by the leading one prediction logic 20 will be stored in the register 40. The end of this pipeline stage is indicated by the horizontal line 70 in FIG. 1.
In the next pipeline stage, the sum value is output from the register 30 to normalization shift logic 50, which is arranged to perform a left shift of the sum value by a number of bit positions indicated by the shift count value read from the register 40. Assuming the input significands to the add operation are n bits in length, the sum value output by the adder logic 10 will be n+1 bits in length, as will the shifted sum value output by the normalization shift logic 50. Bits n to 1 are routed to a first input of multiplexer 60 (the left-hand side input as shown in FIG. 1), and bits n−1 to 0 are input to a second input of the multiplexer 60 (the right-hand side input as shown in FIG. 1).
The final required normalized result will be n bits in length, and if the shift count predicted by the leading one prediction logic 20 was correct, then the most significant bit output from the normalization shift logic 50 will be a logic zero value, and the correct normalized result will be given by bits n−1 to 0, i.e. the right-hand side input to multiplexer 60. However, it is possible that the shift count produced by the leading one prediction logic 20 may be one greater than required. This is due to the fact that the leading one prediction logic 20 predicts the shift count by evaluating the two input operands from left to right until a leading one candidate is found. If a carry exists into that bit position, the shift count predicted by the leading one prediction logic 20 will be one greater than required. This is detected by looking at the most significant bit of the output from the normalization shift logic 50. If the most significant bit is set, then the prediction performed by the leading one prediction logic 20 was incorrect and a single right shift is required to produce the correctly normalized result. This single right shift is effected by driving the multiplexer 60 to select as its output the left-hand side input, i.e. bits n to 1, to form the output n-bit normalized sum. Otherwise, if the most significant bit output by the normalization shift logic 50 is a logic 0 value, then the prediction performed by the leading one prediction logic 20 was correct, and the multiplexer 60 is driven to select as its output the right-hand side input, i.e. bits n−1 to 0 output by the normalization shift logic 50.
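The behaviour of the normalization shift logic 50 and the multiplexer 60 described above can be modelled roughly as follows. This is a software sketch under stated assumptions, not a description of the hardware: the function name and its parameters are hypothetical, the shift count is supplied by the caller rather than produced by a model of the leading one prediction logic 20, and the sum is assumed to have its leading 1 at bit n−1 or below.

```python
def normalize_with_correction_mux(sum_value, shift_count, n):
    """Model the FIG. 1 data path for an (n+1)-bit sum.

    sum_value   : (n+1)-bit sum from the adder (assumed non-zero)
    shift_count : predicted left shift, either correct or one too large
    n           : width in bits of the final normalized result
    """
    width_mask = (1 << (n + 1)) - 1   # keeps bits n..0
    result_mask = (1 << n) - 1        # keeps n result bits

    # Normalization shift: left shift, retaining n+1 bits.
    shifted = (sum_value << shift_count) & width_mask

    # Mux control: the most significant bit (bit n) of the shifted value.
    msb_set = (shifted >> n) & 1

    if msb_set:
        # Prediction was one too large: select bits n..1 (single right shift).
        return (shifted >> 1) & result_mask
    # Prediction was correct: select bits n-1..0.
    return shifted & result_mask


if __name__ == "__main__":
    n = 8
    sum_value = 0b000010110  # leading 1 at bit 4, so the correct shift is 3
    # Both the correct count and the over-prediction give 0b10110000.
    print(bin(normalize_with_correction_mux(sum_value, 3, n)))
    print(bin(normalize_with_correction_mux(sum_value, 4, n)))
```

Note that the selection in the model depends on the shifted value, which mirrors the dependency of the multiplexer control on the output of the normalization shift logic 50.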
It should be noted from FIG. 1 that the output of the normalization shift logic 50 is used to drive the final multiplexer 60, creating a timing arc from the output of the normalization shift logic 50 through the mux control to the final normalized result. Hence, whilst the detection of an error in the leading 1 prediction is trivial, such detection must be buffered in order to drive the final multiplexer and such buffering and muxing time are added to the normalization time, which can result in an unacceptable delay in the generation of the normalized result.
An example of the technique illustrated in FIG. 1 can be seen in Suzuki et al., “Leading-Zero Anticipatory Logic for High-Speed Floating Point Addition” IEEE J. Solid State Circuits, Vol 31, No. 8, August 1996. A general survey of leading one and zero anticipation methods is the subject of Schmookler et al., “Leading Zero Anticipation and Detection—A Comparison of Methods” 15th Symposium on Computer Arithmetic, pp 7-12.
Recently, work has been done to seek to reduce the above timing problems associated with the prior art technique of FIG. 1. In accordance with the technique described in the article “Leading-One Prediction with Concurrent Position Correction” by Bruguera and Lang, IEEE Transactions on Computers, Vol 48, No. 10, October 1999, pp 1083-1097, a modified leading one predictor is described which, based on the input operands, is able to return either a corrected shift count or a signal indicating an error in the prediction. Such an approach is schematically illustrated in FIG. 2. As can be seen in FIG. 2, which illustrates another means of adding two significands, an adder 100 is used to produce a sum stored in register 120. In parallel, the modified leading one prediction logic 110 described in the above article, which is referred to in FIG. 2 as CLOP (correction leading one prediction) logic 110, is used to produce a shift count value for storing in the register 130. The additional logic provided within the CLOP logic 110 ensures that this shift count value is correct. Since the output from the CLOP logic 110 will be correct, it is only necessary in the subsequent pipeline stage to employ normalization shift logic 140 to shift the output from the adder 100 by a number of bit positions indicated by the shift count value stored in the register 130, with the output from the normalization shift logic 140 then being guaranteed to be the correct normalized result.
Whilst such an approach enables the removal of the final multiplexer 60 that follows the normalization shift logic 50 in FIG. 1, it introduces a significant amount of extra logic to the leading one prediction, which makes its implementation very costly. In implementations where cost is a significant factor, and needs to be balanced against performance, it is likely that the additional cost involved in such an approach will be prohibitive.
An alternative prior art approach aimed at seeking to reduce the timing problems associated with the prior art of FIG. 1 involves using standard leading one prediction logic to produce a predicted shift count, and then providing correction logic to perform any required correction in parallel with the shift operation performed by the normalization shift logic. In particular, in accordance with the technique described in Hokenek et al., “Leading-Zero anticipator (LZA) in the IBM RISC System/6000 Floating-point execution unit” IBM J. Res. Develop. 34:71-77 (1990), and in Quach et al., “Leading One Prediction—Implementation, Generalization and Application”, Technical Report CSL-TR-91-463, Stanford Univ., 1991, and in U.S. Pat. No. 6,085,211, the output of the leading one predictor is modified with carry information from the adder to generate a correct prediction. This technique is schematically illustrated in FIG. 3. As can be seen in FIG. 3, which illustrates logic used to add two significands, adder 500 is used to produce a sum based on two input significands provided as the augend and addend, the sum being stored in register 520. The adder 500 also outputs a second value representing, at each bit position in the sum, the carry into that bit position, which is stored in register 530. In parallel, the leading one predictor 510 generates a prediction value representing the location of the bit position containing the predicted leading one bit, this prediction value being stored in register 540. This prediction may be incorrect by one bit position if the carry into that bit position, given by the carry output from adder 500, is set. The shift count correction logic 550 combines the prediction value of the leading one predictor 510 and the carry vector from adder 500 and generates a correct shift count value input to normalization shift logic 560. However, the extraction of the carry signals complicates the adder design, both in routing and timing.
Hence, although this technique avoids the need to await the output from the normalization shift logic before deciding how to drive a final correction multiplexer, it significantly complicates the overall design.
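The correction step described above for FIG. 3 can be sketched as follows. This is a simplified model rather than the circuit of any of the cited references: the predicted shift count is supplied by hand instead of being produced by a model of the leading one predictor 510, the carry vector is derived arithmetically from the operands rather than tapped from inside an adder, and the function names and the convention that a normalized n-bit result has its leading 1 at bit n−1 are assumptions.

```python
def carry_in_vector(augend, addend):
    """Return a vector whose bit i is the carry into bit i of the sum.

    Since each sum bit is augend_i XOR addend_i XOR carry_in_i, the carries
    can be recovered by XORing the sum with both operands.
    """
    return (augend + addend) ^ augend ^ addend


def corrected_shift_count(predicted_shift, carries, n):
    """Reduce the predicted shift by one if a carry enters the predicted bit.

    predicted_shift : shift count from the predictor (hand-supplied here)
    carries         : carry-in vector, e.g. from carry_in_vector()
    n               : width in bits of the normalized result
    """
    # Bit position that the predictor believes holds the leading 1.
    predicted_position = (n - 1) - predicted_shift
    carry_in = (carries >> predicted_position) & 1
    # A carry into that position moves the true leading 1 one place left,
    # so the required left shift is one smaller than predicted.
    return predicted_shift - carry_in


if __name__ == "__main__":
    n = 4
    augend, addend = 0b0011, 0b0001      # sum is 0b0100
    carries = carry_in_vector(augend, addend)
    shift = corrected_shift_count(2, carries, n)  # predictor guessed bit 1
    print(shift, bin((augend + addend) << shift))
```

In this model the correction depends only on the prediction and the carry vector, so it does not need to wait for the shifted sum, but it does require the carry into every bit position to be made available, which is the complication noted above.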
US 2002/0165887A1 describes an alternative approach in which a leading one correction circuit is used to determine whether a final correction shift is required in order to correct for a 1-bit error in the shift count identified by the leading one prediction logic. In accordance with this technique, the leading one correction circuit receives both the sum produced by the adder logic, and a 1-hot vector produced by the leading one prediction logic, the 1-hot vector having a bit set to represent the predicted location of the leading one, the prediction being correct if the true leading one in the sum is in this location, and being incorrect if the true leading one in the sum is in the bit location immediately to the right of the predicted location. This 1-hot vector value is shifted left by one bit position and logically ANDed with the sum from the adder, and if the value resulting from the AND operation is one, the leading one prediction is determined to be incorrect and a correction step is required.
Whilst such an approach can avoid the need to wait for the output of the normalization shift logic to be produced before determining whether a correction shift is required, it requires modification of the leading one prediction logic to ensure that the leading one prediction logic produces not only the shift count value, but also the required 1-hot vector used by the leading one correction circuit. This 1-hot vector will be the width of the sum produced by the adder, and accordingly would be 24 bits in length for single precision floating point numbers and 53 bits in length for double precision floating point numbers. Hence, this increases the complexity of the leading one prediction logic. In addition, the 1-hot prediction vector (left shifted by one bit position) and the sum from the adder are then bit-wise ANDed, and the resulting bits are input to a reduction OR function. If the result is high, the prediction of the leading one prediction logic is incorrect and must be corrected. Hence, in summary, it can be seen that this technique also adds a significant amount of complexity to the design, which will increase the cost.
Accordingly, it would be desirable to provide an improved technique for normalizing a data value which allows a timing improvement to be achieved with respect to the earlier-described prior art of FIG. 1, but which is less complex than the known techniques.