1. Field of the Invention
The present invention relates to the field of data processing. In particular, the invention relates to an apparatus and method for rounding a floating-point value to an integral floating-point value.
2. Description of the Prior Art
Processors for performing arithmetic operations on floating-point numbers are known. In floating-point representation, numbers are represented using a significand 1.F, an exponent E and a sign bit S. The sign bit S represents whether the floating-point number is positive or negative, the significand 1.F represents the significant digits of the floating-point number, and the exponent E represents the position of the radix point (also known as a binary point) relative to the significand. By varying the value of the exponent, the radix point can “float” left and right within the significand. This means that for a predetermined number of bits, a floating-point representation can represent a wider range of numbers than a fixed-point representation (in which the radix point has a fixed location within the significand). However, the extra range is achieved at the expense of reduced precision, since some of the bits are used to store the exponent. Sometimes, a floating-point arithmetic operation generates a result with more significant bits than the number of bits used for the significand. If this happens, the result is rounded to a value that can be represented using the available number of significant bits.
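The loss of precision described above can be illustrated with a minimal sketch. It assumes an IEEE 754 single-precision format with a 24-bit significand and the default round-to-nearest mode of the host platform; the helper name to_single is hypothetical and is used for illustration only.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    # Packing as "<f" converts to 32-bit single precision; unpacking widens it again.
    return struct.unpack("<f", struct.pack("<f", x))[0]

# 2^24 + 1 needs 25 significant bits, one more than a 24-bit significand provides.
exact = 2**24 + 1
stored = to_single(float(exact))
print(exact, "->", stored)  # 16777217 -> 16777216.0
```

Values that fit within 24 significant bits, such as 0.5 or 6.75, pass through this conversion unchanged.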
FIG. 1 of the accompanying drawings shows how floating-point numbers are stored within a register or memory. In a single precision representation, 32 bits are used to store the floating-point number. One bit is used as the sign bit S, eight bits are used to store the exponent E, and 23 bits are used to store the fractional portion F of the significand 1.F. The 23 bits of the fractional portion F, together with an implied bit having a value of one, make up a 24-bit significand 1.F. The radix point is initially assumed to be placed between the implied bit and the 23 stored bits of the significand. The stored exponent E is biased by a fixed value 127 such that in the represented floating-point number the radix point is shifted left from its initial position by |E−127| places if E−127 is negative (e.g. if E−127=−2 then a significand of 1.01 represents 0.0101), or right from its initial position by E−127 places if E−127 is positive (e.g. if E−127=2 then a significand of 1.01 represents 101). The bias is used to make it simpler to compare exponents of two floating-point values as then both negative and positive shifts of the radix point can be represented by a positive value of the stored exponent E. As shown in FIG. 1, the stored representation S[31], E[30:23], F[22:0] represents a number with the value (−1)^S * 1.F[22:0] * 2^(E−127). A single-precision floating-point number in this form is considered to be “normal”. If a calculated floating-point value is not normal (for example, it has been generated with the radix point at a position other than between the left-most two bits of the significand), then it is normalized by shifting the significand left or right and adjusting the exponent accordingly until the number is of the form (−1)^S * 1.F[22:0] * 2^(E−127).
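The single-precision layout described above can be sketched in software as follows. This is an illustrative model only, assuming the stored bits follow the IEEE 754 single-precision encoding; the function names decode_single and value_of_normal are hypothetical and are not taken from the drawings.

```python
import struct

def decode_single(x: float):
    """Split a single-precision value into sign S, stored exponent E and fraction F."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    s = bits >> 31            # S[31]
    e = (bits >> 23) & 0xFF   # E[30:23], biased by 127
    f = bits & 0x7FFFFF       # F[22:0], the 23 stored fraction bits
    return s, e, f

def value_of_normal(s: int, e: int, f: int) -> float:
    """Evaluate (-1)^S * 1.F * 2^(E-127) for a normal number (0 < E < 255)."""
    significand = 1 + f / 2**23   # restore the implied leading one
    return (-1) ** s * significand * 2.0 ** (e - 127)

# Binary 101 (decimal 5.0) is the significand 1.01 with E - 127 = 2.
print(decode_single(5.0))
# Binary 0.0101 (decimal 0.3125) is the significand 1.01 with E - 127 = -2.
print(decode_single(0.3125))
```

Both values share the same fraction bits and differ only in the stored exponent, which is the "floating" of the radix point described above.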
A double precision format is also provided in which the floating-point number is represented using 64 stored bits. The 64 stored bits include one sign bit, an 11-bit exponent and the 52-bit fractional portion F of a 53-bit significand 1.F. In double precision format the exponent E is biased by a value of 1023. Thus, in the double precision format a stored representation S[63], E[62:52], F[51:0] represents a floating-point value (−1)^S * 1.F[51:0] * 2^(E−1023).
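The double-precision field widths can be checked with a similar sketch, again assuming the IEEE 754 encoding; decode_double is a hypothetical helper name used for illustration only.

```python
import struct

def decode_double(x: float):
    """Split a double-precision value into sign S, stored exponent E and fraction F."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    s = bits >> 63                 # S[63]
    e = (bits >> 52) & 0x7FF       # E[62:52], biased by 1023
    f = bits & ((1 << 52) - 1)     # F[51:0], the 52 stored fraction bits
    return s, e, f

# 6.75 is binary 110.11, i.e. the significand 1.1011 with E - 1023 = 2.
s, e, f = decode_double(6.75)
value = (-1) ** s * (1 + f / 2**52) * 2.0 ** (e - 1023)
print(s, e - 1023, bin(f >> 48), value)
```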
In the present application, some examples will be explained with reference to the single precision floating-point format. However, it will be appreciated that the invention could also be applied to the double precision format (or any other floating-point format) and that the bit values shown in subsequent Figures could be replaced by values appropriate to the floating-point format being used.
One kind of floating-point operation is a round to integral floating-point operation (FRINT), an operation which rounds a floating-point value to an integral floating-point value. For example, a floating-point value of 6.75 can be rounded to one of the neighbouring integral values 6.0 or 7.0. The present technique seeks to reduce the latency associated with performing the FRINT operation.
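The effect of a round to integral floating-point operation can be modelled in software as in the sketch below. This is a behavioural illustration only and is not the apparatus or method of the present technique; the function name frint and the rounding mode names are assumptions introduced for this example.

```python
import math

def frint(x: float, mode: str = "nearest_even") -> float:
    """Round x to an integral floating-point value (software model, assumed mode names)."""
    if math.isnan(x) or math.isinf(x):
        return x  # NaNs and infinities have no fractional part to remove
    if mode == "nearest_even":
        r = float(round(x))        # Python's round() resolves ties to the even integer
    elif mode == "toward_zero":
        r = float(math.trunc(x))
    elif mode == "toward_minus_inf":
        r = float(math.floor(x))
    elif mode == "toward_plus_inf":
        r = float(math.ceil(x))
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    # The result stays in floating-point format and keeps the sign of the input,
    # so that, for example, -0.25 rounds to -0.0 rather than +0.0.
    return math.copysign(r, x)

print(frint(6.75))                 # 7.0
print(frint(6.75, "toward_zero"))  # 6.0
```

The result is itself a floating-point value (7.0 or 6.0 for an input of 6.75), not an integer-format value, which is what distinguishes this operation from a floating-point to integer conversion.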