1. Field of the Invention
This invention relates in general to the field of microelectronics, and more particularly to a technique for incorporating the specification of floating point format at the instruction level into an existing microprocessor instruction set architecture.
2. Description of the Related Art
Since microprocessors were fielded in the early 1970's, their use has grown exponentially. Originally applied in the scientific and technical fields, microprocessor use has moved over time from those specialty fields into commercial consumer fields that include products such as desktop and laptop computers, video game controllers, and many other common household and business devices.
Along with this explosive growth in use, the art has experienced a corresponding technology pull that is characterized by an escalating demand for increased speed, expanded addressing capabilities, faster memory accesses, larger operand size, more types of general purpose operations (e.g., floating point, single-instruction multiple data (SIMD), conditional moves, etc.), and added special purpose operations (e.g., digital signal processing functions and other multi-media operations). This technology pull has resulted in an incredible number of advances in the art which have been incorporated in microprocessor designs such as extensive pipelining, super-scalar architectures, cache structures, out-of-order processing, burst access mechanisms, branch prediction, and speculative execution. Quite frankly, a present day microprocessor is an amazingly complex and capable machine in comparison to its 30-year-old predecessors.
But unlike many other products, there is another very important factor that has constrained, and continues to constrain, the evolution of microprocessor architecture. This factor, legacy compatibility, accounts for much of the complexity that is present in a modern microprocessor. For market-driven reasons, many producers have opted to retain all of the capabilities that are required to ensure compatibility with older, so-called legacy application programs as new designs are provided which incorporate new architectural features.
Nowhere has this legacy compatibility burden been more noticeable than in the development history of x86-compatible microprocessors. It is well known that a present day virtual-mode, 32-/16-bit x86 microprocessor is still capable of executing 8-bit, real-mode, application programs which were produced during the 1980's. And those skilled in the art will also acknowledge that a significant amount of corresponding architectural “baggage” is carried along in the x86 architecture for the sole purpose of retaining compatibility with legacy applications and operating modes. Yet while in the past developers have been able to incorporate newly developed architectural features into existing instruction set architectures, the means whereby use of these features is enabled—programmable instructions—have become scarce. More specifically, there are no “spare” instructions in certain instruction sets of interest that provide designers with a way to incorporate newer features into an existing architecture.
In the x86 instruction set architecture, for example, there are no remaining undefined 1-byte opcode values. All 256 opcode values in the primary 1-byte x86 opcode map are taken up with existing instructions. As a result, x86 microprocessor designers today must choose either to provide new features or to retain legacy compatibility. If new programmable features are to be provided, then they must be assigned to opcode values in order for programmers to exercise those features. And if spare opcode values do not remain in an existing instruction set architecture, then some of the existing opcode values must be redefined to provide for specification of the new features. Thus, legacy compatibility is sacrificed in order to make way for new feature growth.
There are a number of features that programmers desire in a present day microprocessor, but which have heretofore been precluded from incorporation because of the aforementioned reasons. One particular feature that is desirable for incorporation is floating point format specification at the instruction level.
Accordingly, the present inventors have observed a need to provide programmers with the capability to specify, at the instruction level, the precision and/or rounding mode that is to be employed during execution of a floating point operation that is prescribed by a corresponding instruction. But, as one skilled in the art will appreciate, present day microprocessor architectures do not provide for such specification. Rather, the architectures typically include a floating point unit that performs floating point operations, and the precision and rounding mode that are employed by the floating point unit during execution of the floating point operations are prescribed within one or more associated hardware registers prior to execution of the instructions that prescribe the floating point operations. Within an x86-compatible microprocessor, these associated hardware registers are collectively called the floating point control word. Thus, the floating point format (i.e., precision and rounding mode) for all subsequent operations that are performed by a floating point unit within the x86-compatible microprocessor is specified by the values of various fields within the floating point control word. In the x86 architecture, a special instruction, FLDCW, must be executed in the program flow in order to change the precision and/or rounding mode of the floating point unit.
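The control word fields mentioned above can be illustrated with the x87 control word, in which bits 8-9 form the precision-control (PC) field and bits 10-11 form the rounding-control (RC) field. The following C sketch only manipulates the 16-bit value in software; the helper names are hypothetical, and actually installing the new value in the floating point unit would still require an FLDCW.

```c
#include <assert.h>
#include <stdint.h>

/* x87 FPU control word layout: bits 8-9 hold precision control (PC),
   bits 10-11 hold rounding control (RC). */
enum x87_pc { PC_SINGLE = 0, PC_RESERVED = 1, PC_DOUBLE = 2, PC_EXTENDED = 3 };
enum x87_rc { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TOWARD_ZERO = 3 };

#define X87_PC_SHIFT 8
#define X87_RC_SHIFT 10

/* Hypothetical helpers: extract the PC and RC fields. */
static unsigned x87_pc_field(uint16_t cw) { return (cw >> X87_PC_SHIFT) & 0x3u; }
static unsigned x87_rc_field(uint16_t cw) { return (cw >> X87_RC_SHIFT) & 0x3u; }

/* Hypothetical helper: build a new control word that keeps the exception
   masks and other bits of cw but replaces the PC and RC fields. */
static uint16_t x87_with_format(uint16_t cw, enum x87_pc pc, enum x87_rc rc) {
    assert(pc != PC_RESERVED);
    uint16_t fields = (uint16_t)((0x3u << X87_PC_SHIFT) | (0x3u << X87_RC_SHIFT));
    return (uint16_t)((cw & ~fields)
                      | ((unsigned)pc << X87_PC_SHIFT)
                      | ((unsigned)rc << X87_RC_SHIFT));
}
```

For example, starting from 0x037F, the value established by FINIT (extended precision, round to nearest, all exceptions masked), selecting single precision with rounding toward positive infinity yields 0x087F. The format is thus a property of the unit's state rather than of any individual arithmetic instruction.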
Specification of the precision and/or rounding mode for floating point operands and results is vital to the accurate implementation of floating point algorithms because floating point operations are inexact. Thus, it is necessary to provide consistency rules within these algorithms to ensure correct results. For instance, the x86 floating point control word can be programmed to specify, say, single-precision operations with a rounding mode prescribing that a rounded result is closest to but not less than the infinitely precise result (i.e., rounding toward positive infinity). This particular floating point format, as specified within a given floating point control word, may indeed suffice for some floating point algorithms, but it is entirely insufficient for other algorithms which may require a different precision or rounding mode. In fact, one skilled in the art will appreciate that the programming language JAVA often strictly requires the use of single-precision operands. Furthermore, one skilled in the art will also appreciate that present day compilers typically set floating point control words to specify double-precision as the default precision for performing floating point operations. Moreover, one skilled in the art will appreciate that although a typical instruction set architecture will provide an instruction (e.g., FLDCW) that directs a microprocessor to load a new floating point control word from memory in order to change the floating point format, this instruction executes excruciatingly slowly. This is because all operations within a microprocessor must be synchronized prior to changing the floating point control word. In practice, synchronization of the operations in the microprocessor essentially means that the microprocessor must be stopped, the floating point control word loaded from memory, and the microprocessor restarted. It follows that such an operation creates a serious performance bottleneck, even when only a single floating point format change occurs.
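To make the interplay of precision and rounding mode concrete, the sketch below uses a toy integer model rather than real floating point: a value is rounded so that its magnitude keeps at most `p` significant bits, as a `p`-bit significand would. The function name and the integer model are illustrative assumptions; the mode numbering follows the x87 RC field. The mode described above, closest to but not less than the infinitely precise result, corresponds to `ROUND_TO_POS_INF`.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding modes, numbered as in the x87 RC field. */
enum round_mode {
    ROUND_TO_NEAREST = 0,   /* nearest, ties to even */
    ROUND_TO_NEG_INF = 1,   /* never above the exact value */
    ROUND_TO_POS_INF = 2,   /* never below the exact value */
    ROUND_TO_ZERO    = 3    /* truncate the magnitude */
};

/* Toy model (hypothetical helper): round x so that its magnitude keeps at
   most p significant bits, mimicking a p-bit significand. */
static int64_t round_to_precision(int32_t x, int p, enum round_mode mode) {
    assert(p >= 1 && p <= 31);
    int64_t v = x;
    int neg = v < 0;
    uint64_t mag = (uint64_t)(neg ? -v : v);

    int width = 0;                          /* significant bits in mag */
    for (uint64_t t = mag; t != 0; t >>= 1) width++;
    if (width <= p) return v;               /* fits: exact */

    int shift = width - p;
    uint64_t unit = (uint64_t)1 << shift;   /* spacing of representable values */
    uint64_t lo = mag & ~(unit - 1);        /* candidate nearer zero */
    uint64_t rem = mag - lo;                /* discarded low bits */
    if (rem == 0) return v;                 /* exactly representable */

    int away;                               /* 1: take the larger magnitude */
    switch (mode) {
    case ROUND_TO_NEAREST:
        away = rem > unit / 2 || (rem == unit / 2 && ((lo >> shift) & 1u));
        break;
    case ROUND_TO_POS_INF: away = !neg; break;
    case ROUND_TO_NEG_INF: away = neg;  break;
    default:               away = 0;    break;
    }
    int64_t r = (int64_t)(away ? lo + unit : lo);
    return neg ? -r : r;
}
```

For example, 11 (binary 1011) kept to 2 significant bits has the candidates 8 and 12: rounding to nearest and toward positive infinity give 12, while rounding toward negative infinity and toward zero give 8. Because the result depends on both the precision and the mode, an algorithm tuned for one floating point format can produce different answers under another.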
In fact, the present inventors have noted that many JAVA compilers entirely circumvent this performance bottleneck by employing an indirect—albeit substantially faster—technique to specify a new floating point format. That is, if single-precision operations are required to be performed within a floating point unit whose format is set for double-precision operations, then the compilers emulate the single-precision operations by allowing the floating point unit to perform these operations in double-precision mode, and then the results of the operations are rounded to single-precision by storing the results to memory at the required precision and rounding mode (most instruction set architectures allow floating point precision and/or rounding mode to be expressly specified when executing memory load and store operations). Finally, the results are loaded back from memory (at the desired precision) into the floating point unit for subsequent operations.
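The store-and-reload workaround can be sketched in C as follows. This is a minimal illustration, assuming a compiler that evaluates `float` arithmetic at float precision (FLT_EVAL_METHOD equal to 0, as on typical x86-64 targets using SSE); the helper names are hypothetical. Each single-precision operation is performed on `double` operands, and the double result is then rounded to single precision by storing it into a `float` object in memory and reading it back.

```c
#include <assert.h>

/* Hypothetical helper: round a double-precision intermediate to single
   precision the way the workaround does, by storing it to memory as a
   float and loading it back. volatile keeps the store/reload explicit. */
static float round_through_memory(double wide) {
    volatile float slot = (float)wide;    /* store at single precision */
    return slot;                          /* reload */
}

/* Emulated single-precision multiply: compute in double, round once. */
static float emulated_fmul(float a, float b) {
    return round_through_memory((double)a * (double)b);
}

/* Emulated single-precision divide: compute in double, round once. */
static float emulated_fdiv(float a, float b) {
    assert(b != 0.0f);
    return round_through_memory((double)a / (double)b);
}
```

For a single multiplication or division, the double format carries enough extra significand bits (53 versus 24) that rounding first to double and then to single gives the same result as rounding once to single, which is why the emulation is acceptable. The cost lies in the extra memory round trip for every result.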
The above-noted floating point format specification “workaround” is described in the paper entitled “Optimizing Precision Overhead for x86 Processors,” by Takeshi Ogasawara et al., which is taken from “Proceedings of the 2nd Java™ Virtual Machine Research and Technology Symposium,” Aug. 1-2, 2002, USENIX: San Francisco. And, as one skilled in the art will appreciate, although writing a floating point result out to memory and then reading it back into a floating point unit is not as slow as executing an instruction to load a new floating point control word, such an approach still results in a performance bottleneck.
Thus, the present inventors have noted a need to provide an improved technique for the specification of the floating point format to be used in a floating point operation that does not require synchronization of operations and that does not result in degraded performance as described above.
Therefore, what is desired is to have a plurality of loaded floating point control words that can be set to specify a plurality of desired floating point formats, and to enable a programmer to select one of these floating point control words for use in the operation specified by an associated floating point instruction, where the associated floating point instruction itself prescribes selection of the one of the floating point control words. It is also desirable to enable a programmer to specify the floating point format for an associated floating point operation directly, that is, to specify within a single instruction both a floating point operation and the floating point format to be employed during execution of that operation. Yet, although these needs have been noted, many instruction set architectures (including the current x86 instruction set architecture) have no means available to provide the desired features without sacrificing operability of some currently used opcodes.
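The first desired arrangement, several preloaded control words with each instruction selecting one, can be modeled in software. The following C sketch is hypothetical throughout: the four-entry control-word file, the selector carried by each instruction, and all type and function names are illustrative assumptions, not the encoding of any existing architecture. To keep it self-contained, the host computes each operation in double precision with round-to-nearest; the selected control word then governs how that intermediate is narrowed, and directed rounding is modeled only on the single-precision path.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical format fields held in each preloaded control word. */
enum precision { PREC_SINGLE, PREC_DOUBLE };
enum rounding  { ROUND_NEAREST, ROUND_DOWN, ROUND_UP };
struct fp_control_word { enum precision prec; enum rounding rnd; };

/* Hypothetical FPU state: four control words loaded ahead of time. */
struct fp_unit { struct fp_control_word cw[4]; };

/* Hypothetical instruction: an operation plus a selector naming the
   control word that governs this one operation. */
enum fp_op { OP_ADD, OP_MUL, OP_DIV };
struct fp_insn { enum fp_op op; unsigned cw_sel; };

/* Adjacent float toward +inf (up != 0) or -inf; f must not be NaN. */
static float float_step(float f, int up) {
    uint32_t bits;
    if (f == 0.0f) {                      /* step off zero to the smallest subnormal */
        bits = 1;
        memcpy(&f, &bits, sizeof f);
        return up ? f : -f;
    }
    memcpy(&bits, &f, sizeof bits);
    int positive = (bits >> 31) == 0;
    int grow = positive ? (up != 0) : (up == 0);  /* move away from zero? */
    bits = grow ? bits + 1u : bits - 1u;          /* magnitude order = bit order */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow a double intermediate to single precision under a rounding mode.
   Note: this rounds the double intermediate, not the exact result. */
static float to_single(double d, enum rounding rnd) {
    float f = (float)d;                   /* round to nearest (default environment) */
    if (rnd == ROUND_UP && (double)f < d)   f = float_step(f, 1);
    if (rnd == ROUND_DOWN && (double)f > d) f = float_step(f, 0);
    return f;
}

/* Execute one instruction using the control word it selects. */
static double fp_execute(const struct fp_unit *fpu, struct fp_insn in,
                         double a, double b) {
    assert(in.cw_sel < 4);
    const struct fp_control_word *cw = &fpu->cw[in.cw_sel];
    double r;
    switch (in.op) {
    case OP_ADD: r = a + b; break;
    case OP_MUL: r = a * b; break;
    default:     r = a / b; break;
    }
    /* The single path honors the selected rounding mode; the double path
       keeps the host's round-to-nearest result in this sketch. */
    return cw->prec == PREC_SINGLE ? (double)to_single(r, cw->rnd) : r;
}
```

Because the selector travels with each instruction, two consecutive instructions in this model can use different formats with no intervening control-word load, which is the property the paragraph above calls for.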
Accordingly, what is needed is an apparatus and method that incorporate floating point format specification features into an existing microprocessor architecture having a completely full opcode set, where incorporation of the floating point format specification features allows a conforming microprocessor to retain the capability to execute legacy application programs while concurrently providing application programmers and/or compilers with the capability to control specification of both floating point format and associated floating point operations at the instruction level.