Computer systems or processors may include an arithmetic and logic unit (ALU) for performing arithmetic and logical operations on data. In general, the ALU may be configured to execute operations such as addition, subtraction, and multiplication on integer data, as well as various logical operations, data movement operations, etc. Some processors may include a specialized floating point unit for handling operations on floating point numbers. Depending on the particular implementation, the floating point unit may reside within the ALU or be a separate unit.
Operations such as division and root computation (e.g., square root) are challenging to implement because they may involve several iterations and, consequently, long latencies. In the case of integers in particular, division and root computation involve expensive shifting operations in each iteration. This is because integers are conventionally represented with a varying number of leading sign bits, which makes it difficult to know where the leading bit of the quotient, or of the result of a root computation, will be. For example, an integer represented by 32 bits may not have its leading bit in the most significant bit (MSB), or leftmost, position of the 32 bits. Rather, the integer value itself may require only a few bits (fewer than 32), which occupy the rightmost, or least significant, positions, while the remaining bits of the 32 are padded with sign bits. The sign bits may be “0” or “1,” depending on whether the integer is positive or negative. In the case of a division, for example, since the inputs (an integer dividend and an integer divisor) may have different and varying numbers of leading sign bits, the position of the quotient's leading bit cannot easily be determined. Because this position is not known, conventional integer dividers cannot build the quotient from left to right. Instead, the quotient or result is built with its MSB starting in the rightmost position, and less significant bits are shifted in as they are formed in each iteration. This requires an expensive left-shift of the partial quotient in each iteration.
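To make the conventional integer scheme concrete, the following Python sketch (an illustration under stated assumptions, not any particular divider's design) counts the leading sign bits of a 32-bit two's-complement value and performs an unsigned restoring division in which each new quotient bit is shifted in from the right, so the whole partial quotient is shifted left on every iteration. The helper names `leading_sign_bits` and `shift_in_divide` are hypothetical, and signed operands are assumed to have been reduced to magnitudes beforehand.

```python
WIDTH = 32


def leading_sign_bits(value: int, width: int = WIDTH) -> int:
    """Count the leading bits equal to the sign bit (sign bit included)."""
    bits = value & ((1 << width) - 1)      # two's-complement encoding of value
    sign = (bits >> (width - 1)) & 1
    count = 0
    for i in range(width - 1, -1, -1):
        if ((bits >> i) & 1) != sign:
            break
        count += 1
    return count


def shift_in_divide(dividend: int, divisor: int, width: int = WIDTH) -> tuple[int, int]:
    """Unsigned restoring division that builds the quotient right to left.

    Hypothetical helper; assumes 0 <= dividend < 2**width and divisor > 0.
    """
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Bring the next dividend bit down into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        bit = 0
        if remainder >= divisor:
            remainder -= divisor
            bit = 1
        # The entire partial quotient shifts left to make room for the new
        # bit: the per-iteration shift performed by conventional integer dividers.
        quotient = (quotient << 1) | bit
    return quotient, remainder
```

For example, `leading_sign_bits(5)` and `leading_sign_bits(-6)` both return 29, while `shift_in_divide(100, 7)` returns `(14, 2)`. Because the quotient register is shifted on every pass, its leading bit only reaches its final position after the last iteration.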
On the other hand, such a left-shift on each iteration is not required for floating point division and root computation on normalized floating point numbers. Generally speaking, a normalized binary floating point number has the form (1.mmm...)×2^e, where “1.mmm...” is referred to as the significand and “e” is the exponent. The floating point number is said to be normalized when the leading bit, or most significant bit (MSB), of the significand is “1” and the binary point immediately follows this bit. In this representation, the leading “1” can be implied, and only the bits “mmm...” appearing after the binary point, referred to as the “mantissa,” need to be explicitly stored. In addition, the floating point number has a sign (positive or negative), which is represented by a sign bit. In the IEEE 754 binary floating point representation, for example, a normalized single precision floating point number is represented with 32 bits: a 1-bit sign, an 8-bit exponent, and a 23-bit mantissa, which provides a 24-bit significand when the implied leading “1” is added.
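The single precision field layout can be checked with a short Python sketch that uses the standard `struct` module to reinterpret a value as its IEEE 754 binary32 bit pattern. The function name `decompose_float32` is hypothetical; the exponent bias of 127 comes from the IEEE 754 binary32 format rather than from the text above; and the sketch handles only normalized (non-zero, finite, non-subnormal) values.

```python
import struct


def decompose_float32(value: float) -> tuple[int, int, int, int]:
    """Split a normalized float32 into (sign, unbiased exponent, mantissa, significand)."""
    # Reinterpret the 32-bit single precision encoding as an unsigned integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31                    # 1-bit sign
    biased_exp = (bits >> 23) & 0xFF     # 8-bit exponent field, biased by 127
    mantissa = bits & 0x7FFFFF           # 23 explicitly stored bits
    if biased_exp in (0, 0xFF):
        # Zero, subnormal, infinity and NaN have no implied leading 1.
        raise ValueError("not a normalized float32 value")
    significand = (1 << 23) | mantissa   # restore the implied leading 1 (24 bits)
    return sign, biased_exp - 127, mantissa, significand
```

For instance, -6.0 equals -1.5×2^2, so `decompose_float32(-6.0)` returns `(1, 2, 0x400000, 0xC00000)`: the mantissa holds only the “.1” after the binary point, while the 24-bit significand includes the implied “1.”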
A floating point divider with normalized inputs (e.g., a normalized floating point dividend and a normalized floating point divisor), for example, can perform the division iteratively (e.g., using the well-known Sweeney, Robertson, and Tocher (SRT) algorithm) and generate the quotient from left to right without requiring a left-shift in each iteration. This is because the location of the quotient's binary point, or of its leading “1,” is known from the exponent of the quotient, which is obtained simply by subtracting the exponent of the divisor from the exponent of the dividend, since the divisor and dividend are both in a normalized format. However, the quotient itself may not be in a normalized format, so a normalizing shift may still be required after the final iteration to bring it into normalized form. Because both significands lie in the range [1, 2), their ratio lies in the range (1/2, 2), so this normalizing shift is by at most one bit position.
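The left-to-right property of a divider with normalized inputs can be illustrated with the following Python sketch. For simplicity it uses a plain radix-2 restoring recurrence on integer significands rather than SRT (which instead selects each quotient digit from a redundant set such as {-1, 0, 1}), and it truncates rather than rounds. The function name `divide_normalized` and its parameters are hypothetical. The quotient exponent is computed up front by subtraction, each quotient bit is written directly into its known position so the partial quotient is never shifted, and one conditional shift by a single position normalizes the result at the end.

```python
def divide_normalized(a_sig: int, a_exp: int, b_sig: int, b_exp: int,
                      frac_bits: int = 23) -> tuple[int, int]:
    """Divide (a_sig / 2**frac_bits) * 2**a_exp by (b_sig / 2**frac_bits) * 2**b_exp.

    Hypothetical sketch: both significands must be normalized integers in
    [2**frac_bits, 2**(frac_bits + 1)), i.e. values in [1, 2). Returns a
    normalized (significand, exponent) pair truncated to frac_bits fraction bits.
    """
    one = 1 << frac_bits
    if not (one <= a_sig < 2 * one and one <= b_sig < 2 * one):
        raise ValueError("significands must be normalized")

    # The exponent of the quotient is known before any quotient bit is produced.
    exp = a_exp - b_exp

    # a/b lies in (1/2, 2), so frac_bits + 2 bits cover it: position
    # frac_bits + 1 has weight 2**0, position 0 has weight 2**-(frac_bits + 1).
    top = frac_bits + 1
    quotient = 0
    remainder = a_sig                 # invariant: 0 <= remainder < 2 * b_sig
    for pos in range(top, -1, -1):
        if remainder >= b_sig:
            quotient |= 1 << pos      # bit written in place; quotient never shifts
            remainder -= b_sig
        remainder <<= 1               # only the partial remainder is doubled

    # Normalizing shift of at most one position when the quotient is below 1.
    if not (quotient >> top) & 1:
        quotient <<= 1
        exp -= 1
    return quotient >> 1, exp         # drop the extra low-order bit (truncation)
```

For example, 1.5 / 1.0 gives `divide_normalized(0xC00000, 0, 0x800000, 0) == (0xC00000, 0)` with no normalizing shift, while 1.0 / 1.5 gives `(0xAAAAAA, -1)`, i.e. roughly 1.333×2^-1, after the single one-position shift.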
Accordingly, there is a need for avoiding the expensive shift operations and related drawbacks seen in conventional implementations of division and root computation in processors.