Processor architectures and their arithmetic units, such as floating-point units, have generally evolved under a natural tension between increasing performance by implementing specialized instructions and minimizing the complexity of the underlying structures, which otherwise would become too unwieldy to realize the benefits of those specialized instructions. As such, processors are designed to implement instruction set architectures (“ISAs”) whose instructions strike a balance between these competing goals. For example, some common processor architectures contain a global register that maintains rounding control information for rounding computational results, rather than incorporating such information into their instructions. In use, a floating-point unit typically relies on rounding control information stored in the global register to perform various arithmetic operations, such as adding, subtracting and multiplying, as well as complex transcendental functions (e.g., sine and cosine functions). But note that the global register holds the rounding control information as global state information. As such, the rounding control information remains resident during execution of a number of instructions, so that a single rounding mode applies to all of them.
A drawback to this approach is that when the flow control of executing instructions requires frequent changes between one rounding mode and another, the processor needs to update one or more rounding control bits in the global register (e.g., a control status word) during each change in flow control. Typically, this can delay processing by 40 clock cycles or more. Such performance penalties are commonplace during frequent calls to and returns from interrupts (e.g., from servicing subroutines or other executable functions, such as dynamic link library modules, or “DLLs”). Another drawback to this approach is that the processor architecture usually manages two rounding mode control (“RMC”) bits in the global register, thereby requiring the rounding control information to specify four rounding modes. For example, two RMC bits of “00” can specify a round-to-nearest rounding mode, two RMC bits of “01” can specify a round-to-negative infinity rounding mode, two RMC bits of “10” can specify a round-to-positive infinity rounding mode, and two RMC bits of “11” can specify a round-to-zero rounding mode. Yet another drawback is that the use of such global registers can lead to variability in the results during the execution of one or more code portions (e.g., subroutines) in computer programs. In particular, each code portion can yield a different result for varying states (or settings) of the global register. So if different code portions are combined to form a computer program, and each depends on a specific state of the global register, then subsequently executed code portions generally will not interoperate properly with the global register settings left by previously executed code portions. This leads to variability in results. Consequently, it becomes necessary either to place the global register in a specific state for each code portion or to know the previous global register states during code development, both of which add inefficiencies to code development and execution.
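The dependence of a code portion on global rounding state can be illustrated with a minimal sketch. It assumes Python's `decimal` module as a stand-in for a hardware floating-point unit: the module's shared context plays the role of the global register, and decimal arithmetic at four digits of precision replaces binary floating point. The names `RMC_MODES`, `set_rmc` and `one_third` are hypothetical and chosen only for illustration; the bit patterns follow the encoding listed above.

```python
from decimal import (Decimal, getcontext, setcontext,
                     ROUND_HALF_EVEN, ROUND_FLOOR, ROUND_CEILING, ROUND_DOWN)

# Two-bit RMC encoding as listed in the text (hypothetical table name).
RMC_MODES = {
    0b00: ROUND_HALF_EVEN,  # round to nearest
    0b01: ROUND_FLOOR,      # round toward -INF
    0b10: ROUND_CEILING,    # round toward +INF
    0b11: ROUND_DOWN,       # round toward zero
}

def set_rmc(bits):
    """Write a rounding mode into the shared context (the 'global register')."""
    getcontext().rounding = RMC_MODES[bits]

def one_third():
    """A 'code portion' whose result depends on whatever mode is currently set."""
    return Decimal(1) / Decimal(3)

saved = getcontext().copy()   # remember the caller's state
getcontext().prec = 4         # small precision so rounding is visible

set_rmc(0b00)
nearest = one_third()         # Decimal('0.3333')

set_rmc(0b10)                 # some other code portion changes the mode
up = one_third()              # Decimal('0.3334'): same code, different result

setcontext(saved)             # restore the caller's state
```

The same function returns two different values depending solely on state set elsewhere, which is the interoperability problem described above. Saving and restoring the context around each portion corresponds to the extra register updates the text identifies as overhead.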
A prime motivation to include the four rounding modes is that the Institute of Electrical and Electronics Engineers Standard for Binary Floating-Point Arithmetic (“IEEE Std 754-1985”) requires a processor to provide these four rounding modes for compliance to facilitate software portability onto differing hardware platforms. Round-to-negative infinity and round-to-positive infinity rounding modes are particularly important for traditional processors when performing interval arithmetic, which is used to estimate a possible range of values (i.e., an interval of real numbers) that a computation will produce, given the range of values of each of the input numbers that are to be arithmetically operated upon. Generally during interval computations, a conventional processor that is computing a lower bound rounds an interval endpoint toward negative infinity (“−INF”), while during an upper bound computation, it rounds the other interval endpoint toward positive infinity (“+INF”). Traditional processors use two rounding control bits to comply with IEEE 754-1985.
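As a minimal sketch of why the two directed modes matter for interval arithmetic, the following adds two intervals, rounding the lower bound toward −INF and the upper bound toward +INF so that the result always encloses the exact sum. It assumes Python's `decimal` module at three digits of precision in place of hardware binary floating point; the function name `interval_add` is hypothetical.

```python
from decimal import Decimal, localcontext, ROUND_FLOOR, ROUND_CEILING

def interval_add(x, y, prec=3):
    """Add intervals x=(xl, xu) and y=(yl, yu) with outward rounding."""
    (xl, xu), (yl, yu) = x, y
    with localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = ROUND_FLOOR      # lower bound: round toward -INF
        lo = xl + yl
        ctx.rounding = ROUND_CEILING    # upper bound: round toward +INF
        hi = xu + yu
    return lo, hi

x = (Decimal("1.004"), Decimal("1.006"))
y = (Decimal("2.003"), Decimal("2.007"))
lo, hi = interval_add(x, y)
# Exact sums are 3.007 and 3.013; the rounded interval is [3.00, 3.02],
# which contains both.
```

Rounding both bounds to nearest would instead give [3.01, 3.01], which excludes the exact lower sum 3.007, so the enclosure guarantee would be lost.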
Lesser-known processor architectures encode two rounding mode control bits into the instructions of their instruction sets to provide conventional per-instruction rounding mode control. These two rounding control bits are usually located in a reserved portion of an instruction (e.g., in a function field). One drawback to this scheme is that a control register, such as the above-described global register, is required to implement rounding to +INF for compliance with IEEE 754. This adds more overhead than would be needed if this approach operated independently of a value in the control register. Another drawback is that it encodes rounding toward only one direction of infinity in the instruction, while relying on the control register to provide rounding toward the other direction of infinity. As such, this approach adds at least two rounding mode control bits to the width of the instruction word (e.g., by increasing the size of the function field) while remaining reliant on the control register. Further, the encoded rounding mode control bits reduce the number of bits available for performing other necessary functions.
FIG. 1 is a code snippet representing a conventional implementation of directed rounding with floating-point operations. Code snippet 20 illustrates conventional techniques for implementing directed rounding as applied to interval arithmetic, and specifically, interval multiplication of two intervals. Namely, code snippet 20 multiplies intervals X=[Xl, Xu] and Y=[Yl, Yu] to form a product 10 as interval [Zl, Zu]. Traditionally, instructions similar to “mult.rtn” 30 are executed to multiply the left and right endpoints of intervals X and Y, with each product rounded down toward −INF. In particular, the left endpoint of X (“Xl”) is multiplied by the left endpoint of Y (“Yl”), and endpoint Xl is multiplied by the right endpoint of Y (“Yu”). Code snippet 20 also multiplies endpoint Xu by Yl and endpoint Xu by Yu. This forms four intermediate products, all of which are rounded down toward −INF. Instructions like “mult.rtp” 40 perform similar multiplicative operations, except that these instructions round the intermediate products up toward +INF. Finally, instructions “fmin” 50 and “fmax” 60 determine the minimum and maximum values, respectively, thereby producing the left endpoint (e.g., Zl) and the right endpoint (e.g., Zu) of the result. The suffixes “.rtn” and “.rtp” denote “round to negative infinity” and “round to positive infinity,” respectively. Note that a drawback of implementing code snippet 20 to accomplish directed rounding for interval multiplication is that instructions 30 and 40 together include eight multiplication operations, each of which is relatively computation-intensive, thereby consuming computational resources that otherwise could be freed up to perform other calculations.
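The conventional eight-multiplication scheme described for code snippet 20 can be approximated by the following sketch. It is not a reproduction of FIG. 1: it assumes Python's `decimal` module at three digits of precision in place of hardware floating-point instructions, and the helper names `mult_rounded` and `interval_mul` are hypothetical. The comments indicate which step stands in for “mult.rtn”, “mult.rtp”, “fmin” and “fmax”.

```python
from decimal import Decimal, localcontext, ROUND_FLOOR, ROUND_CEILING

def mult_rounded(a, b, rounding, prec=3):
    """Multiply a and b, rounding the product in the given direction."""
    with localcontext() as ctx:
        ctx.prec = prec
        ctx.rounding = rounding
        return a * b

def interval_mul(x, y):
    """Multiply intervals x=(xl, xu) and y=(yl, yu) using eight multiplies."""
    (xl, xu), (yl, yu) = x, y
    pairs = [(xl, yl), (xl, yu), (xu, yl), (xu, yu)]
    # Four products rounded toward -INF (stand-in for mult.rtn).
    down = [mult_rounded(a, b, ROUND_FLOOR) for a, b in pairs]
    # The same four products rounded toward +INF (stand-in for mult.rtp).
    up = [mult_rounded(a, b, ROUND_CEILING) for a, b in pairs]
    # Stand-ins for fmin and fmax: select the result endpoints.
    return min(down), max(up)

x = (Decimal("-1.11"), Decimal("2.22"))
y = (Decimal("3.33"), Decimal("4.44"))
zl, zu = interval_mul(x, y)
# Exact endpoint products: -3.6963, -4.9284, 7.3926, 9.8568.
# zl == Decimal('-4.93'), zu == Decimal('9.86')
```

Each endpoint pair is multiplied twice, once per rounding direction, which makes the cost noted above explicit: eight multiplications to produce two endpoints.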
In view of the foregoing, it would be desirable to provide a processor, an instruction set architecture, an instruction, a computer readable medium and a method that minimize the above-mentioned drawbacks and provide for optimal per-instruction encoding of rounding control to facilitate emulation of directed rounding toward negative or positive infinity.