Intel added a new set of related SSE instructions (introduced with SSE4.1) to its instruction set: ROUNDPD, ROUNDPS, ROUNDSD, and ROUNDSS, referred to collectively here as the ROUND instruction. The ROUND instruction rounds a floating point input value to an integer value and then returns that integer result as a floating point value. The rounding performed during the conversion from floating point to integer is governed by a rounding control, or rounding mode, such as round to nearest even, round toward negative infinity, round toward positive infinity, or round toward zero.
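As a behavioral sketch of what a scalar ROUND computes (not Intel's definition), the Python function below rounds a double to an integral value under four rounding modes and returns the result as a float. The mode constants follow the x86 rounding-control encoding (0 nearest even, 1 down, 2 up, 3 toward zero); the function and constant names are hypothetical, and precision-exception handling and the immediate's other control bits are not modeled.

```python
import math

# Hypothetical constants; values mirror the x86 rounding-control encoding.
ROUND_NEAREST_EVEN = 0   # round to nearest, ties to even
ROUND_DOWN = 1           # toward -infinity
ROUND_UP = 2             # toward +infinity
ROUND_TRUNCATE = 3       # toward zero


def round_to_integral(x: float, mode: int) -> float:
    """Round x to an integral value and return it as a float, like a scalar ROUND."""
    if math.isnan(x) or math.isinf(x):
        return x  # NaN and infinity pass through unchanged
    if mode == ROUND_NEAREST_EVEN:
        r = float(round(x))  # Python's built-in round() breaks ties to even
    elif mode == ROUND_DOWN:
        r = float(math.floor(x))
    elif mode == ROUND_UP:
        r = float(math.ceil(x))
    elif mode == ROUND_TRUNCATE:
        r = float(math.trunc(x))
    else:
        raise ValueError(f"unknown rounding mode {mode}")
    # The result keeps the sign of the input, so -0.4 rounded up gives -0.0, not +0.0.
    return math.copysign(r, x)
```

For example, `round_to_integral(2.5, ROUND_NEAREST_EVEN)` gives `2.0`, while `round_to_integral(-1.5, ROUND_DOWN)` gives `-2.0`; in both cases the result is still a floating point value.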
The two constituent operations, conversion from floating point to integer and conversion from integer to floating point, are each well understood in practice. The first requires locating the integer least significant bit (LSB) and the binary round point within the source value, which places a right shifter on the critical path, followed by a conditional increment of the non-fractional (integer) part. The second may require leading-zero enumeration followed by a normalizing left shift and a corresponding exponent calculation. A target floating point hardware design must therefore decide how to provide these two operations.
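The first operation can be modeled at the bit level. The sketch below, assuming IEEE 754 binary64 inputs and hypothetical mode names, unpacks a double, right-shifts the significand so the integer LSB lands at bit 0, and then decides from the shifted-out bits whether to increment the integer part. It illustrates the data flow only and does not describe any particular circuit.

```python
import struct


def double_to_int(x: float, mode: str = "nearest_even") -> int:
    """Convert a finite double to an integer with a right shift and a conditional increment.

    mode is one of the hypothetical names "nearest_even", "down", "up", "toward_zero".
    """
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    negative = bool(bits >> 63)
    biased_exp = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exp == 0x7FF:
        raise ValueError("NaN and infinity have no integer value")
    if biased_exp == 0:
        significand, exp = fraction, -1074  # zero or subnormal
    else:
        significand, exp = fraction | (1 << 52), biased_exp - 1075  # restore hidden bit
    # From here on, |x| == significand * 2**exp.
    if exp >= 0:
        magnitude = significand << exp  # already integral; nothing to round
        return -magnitude if negative else magnitude
    shift = -exp
    # Right alignment shift: the integer LSB moves to bit 0, and the bits
    # shifted out are the ones below the binary round point.
    magnitude = significand >> shift
    dropped = significand & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Conditional increment of the integer part, selected by the rounding mode.
    if mode == "nearest_even":
        increment = dropped > half or (dropped == half and (magnitude & 1) == 1)
    elif mode == "down":
        increment = negative and dropped != 0
    elif mode == "up":
        increment = not negative and dropped != 0
    elif mode == "toward_zero":
        increment = False
    else:
        raise ValueError(f"unknown rounding mode {mode!r}")
    magnitude += int(increment)
    return -magnitude if negative else magnitude
```

Because the sign is handled separately from the magnitude, "down" and "up" only increment the magnitude when the input's sign makes that move away from zero, which is why the increment condition depends on `negative`.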
Prior multi-cycle or high-latency designs provide the required capabilities by connecting circuits in sequence: first a right alignment shift, then a conditional round increment, then leading-zero enumeration, and finally a conditional normalization left shift. If this chain is built as shared, maximally utilized hardware rather than as special-purpose hardware, every calculation that does not need some portion of the chain still incurs that portion's intrinsic delay. If it is built as special-purpose hardware, it consumes valuable die space. Neither approach suits a high-performance microprocessor that emphasizes maximal utilization of circuit elements.
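The last two stages of the chain, leading-zero enumeration and the normalization left shift with its exponent calculation, can be sketched the same way. The function below assumes a hypothetical 64-bit integer datapath and restricts inputs to magnitudes of at most 2**53. A non-integral binary64 input always has magnitude below 2**52, so its rounded result fits this range and no second rounding step is needed.

```python
import struct

DATAPATH_WIDTH = 64  # hypothetical width of the integer datapath


def int_to_double(n: int) -> float:
    """Convert an integer with |n| <= 2**53 to a double by normalization."""
    negative = n < 0
    magnitude = -n if negative else n
    if magnitude == 0:
        # An integer zero carries no sign; a ROUND built this way must track
        # the input's sign separately to return -0.0 where required.
        return 0.0
    if magnitude > (1 << 53):
        raise ValueError("sketch only handles exactly representable magnitudes")
    # Leading-zero enumeration within the fixed-width datapath.
    lzc = DATAPATH_WIDTH - magnitude.bit_length()
    # Normalization left shift: the leading one lands in the top bit.
    normalized = magnitude << lzc
    # Exponent from the shift amount: value == 1.f * 2**(DATAPATH_WIDTH - 1 - lzc).
    biased_exp = 1023 + (DATAPATH_WIDTH - 1 - lzc)
    # Drop the leading (hidden) one and keep the next 52 bits as the fraction.
    fraction = (normalized >> (DATAPATH_WIDTH - 53)) & ((1 << 52) - 1)
    bits = (int(negative) << 63) | (biased_exp << 52) | fraction
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

The shift count produced by leading-zero enumeration feeds both the left shifter and the exponent calculation, which is why these two stages sit back to back in the sequential chain.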
Other low-latency floating point designs separate their constituent circuit elements into minimal groups, each serving a class of calculation, such as near versus far calculations. The characteristics of each class allow the total latency of each calculation to be reduced by omitting circuit components that the class does not need. For example, near subtract calculations, in which the operand exponents differ by at most one, need at most a one-bit right alignment shift. Designs of this type can provide the capabilities required by the new ROUND instruction sequentially in time, by scheduling the convert-to-integer step on one group and the subsequent convert-to-floating-point step on a separate group.
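The near/far split can be illustrated with a path-selection sketch in the style of dual-path floating point adders. The hypothetical helper below classifies an add or subtract as near when it is an effective subtraction whose exponents differ by at most one, and reports the right alignment shift the smaller operand would need. It handles only finite, nonzero operands and uses `math.frexp` exponents, which match the IEEE exponent difference for normal numbers.

```python
import math


def classify_add(a: float, b: float, subtract: bool) -> tuple[str, int]:
    """Return ("near" | "far", alignment shift) for a +/- b (hypothetical helper)."""
    if a == 0.0 or b == 0.0 or not (math.isfinite(a) and math.isfinite(b)):
        raise ValueError("sketch handles finite, nonzero operands only")
    sign_a = math.copysign(1.0, a) < 0
    sign_b = math.copysign(1.0, b) < 0
    # Effective subtraction: a subtract of like-signed operands, or an add of unlike-signed ones.
    effective_subtract = subtract != (sign_a != sign_b)
    _, exp_a = math.frexp(a)
    _, exp_b = math.frexp(b)
    # Right alignment shift applied to the operand with the smaller exponent.
    align_shift = abs(exp_a - exp_b)
    if effective_subtract and align_shift <= 1:
        return "near", align_shift  # alignment is trivial: 0 or 1 bit
    return "far", align_shift
```

For instance, `classify_add(2.0, 1.5, True)` returns `("near", 1)`, while `classify_add(1024.0, 1.0, True)` returns `("far", 10)` because the far path must provide a full-width alignment shifter.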