1. Field of the Invention
This invention relates in general to the field of data processing in computers, and more particularly to an apparatus and method for loading single-precision operands into floating point registers during execution of a single load instruction.
2. Description of the Related Art
Software programs that execute on a microprocessor consist of macro instructions that together direct the microprocessor to perform a function. Each macro instruction directs the microprocessor to perform a specific operation that is part of the function, such as loading data from memory, storing data in a register, or adding the contents of two registers.
A macro instruction may prescribe a simple operation, such as moving the contents of one register location to another register location. Alternatively, it may prescribe a complex operation, such as deriving the cosine of a floating point number. Compared to the manipulation of integer data, the manipulation of floating point data by the microprocessor is complex and time consuming. Movement of integer data requires only a few cycles of a microprocessor clock; derivation of a cosine requires hundreds of machine cycles. Because floating point operations are inherently more complex than integer operations, conventional microprocessors employ a dedicated floating point unit to improve the speed and efficiency of floating point calculations. The dedicated floating point unit may be part of the same mechanical package as the remainder of the microprocessor or it may reside in a separate mechanical package.
Within an x86-compatible microprocessor, a floating point macro instruction is decoded into a sequence of floating point micro instructions that direct the microprocessor to execute a floating point operation. The sequence of floating point micro instructions is passed to the floating point unit. The floating point unit executes the sequence of floating point micro instructions and provides a result of the floating point operation in a result register. Likewise, an integer macro instruction is decoded into a sequence of integer micro instructions that direct the microprocessor to execute an integer operation. The sequence of integer micro instructions is passed to the integer unit. The integer unit executes the sequence of integer micro instructions and provides a result of the integer operation in a result register.
Historically, the architecture of x86-compatible microprocessors has been such that integer unit logic is used to perform memory accesses, including the loading of floating point operands from memory into registers in the floating point unit. The address in memory of an individual floating point operand is specified according to specific x86 addressing conventions. The floating point operands are retrieved from memory and are provided to the floating point unit over a write back bus. However, the x86 instruction set architecture only provides the capability to load one floating point operand at a time. Loading 10,000 floating point operands requires execution of 10,000 load instructions, which essentially equates to 10,000 instruction cycles in a conventional microprocessor. The format of a floating point operand to be loaded is prescribed by the load instruction. It can be single-precision (32 bits in length), double-precision (64 bits), or extended-precision (80 bits). Thus, present day microprocessors provide the capability to load an 80-bit floating point data block from memory into a floating point register during execution of a single instruction, but they restrict to one the number of floating point operands that can be loaded in a single instruction cycle.
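The arithmetic behind the preceding paragraph can be sketched in C. This is an illustrative sketch only: the helper names `operands_per_block` and `loads_required` are hypothetical and do not correspond to any x86 instruction or library call. The first helper counts how many whole operands of a given width fit in a data block; the second counts the load instructions needed for a given number of operands.

```c
#include <assert.h>
#include <stddef.h>

/* Operand widths in bits, as listed in the text above. */
enum { SINGLE_BITS = 32, DOUBLE_BITS = 64, EXTENDED_BITS = 80 };

/* Hypothetical helper: number of whole operands of width operand_bits
   that fit in a data block of width block_bits. */
static size_t operands_per_block(size_t block_bits, size_t operand_bits)
{
    assert(operand_bits > 0);
    return block_bits / operand_bits;
}

/* Hypothetical helper: number of load instructions needed to load
   operand_count operands when each instruction loads operands_per_load
   operands. Rounds up so that a final partial group still costs one load. */
static size_t loads_required(size_t operand_count, size_t operands_per_load)
{
    assert(operands_per_load > 0);
    return (operand_count + operands_per_load - 1) / operands_per_load;
}
```

With one operand per load, `loads_required(10000, 1)` yields 10,000, matching the figure given above. An 80-bit data path is wide enough for two single-precision operands, since `operands_per_block(EXTENDED_BITS, SINGLE_BITS)` yields 2; under that assumption, `loads_required(10000, 2)` yields 5,000.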
Such restriction has not heretofore been problematic, because floating point applications have primarily comprised scientific and financial routines that have not had execution time constraints. In other words, the time required to load operands from memory did not have a negative impact on most of the floating point applications of years past. However, with the proliferation of graphics applications in more recent years, the time required to load operands from memory has become an area of concern.
Graphics applications are unique in the sense that they typically perform simple floating point operations on a large number of operands in a limited period of time. These operands represent various attributes of an image on a video monitor. As such, the requirements for precision normally accorded to more conventional floating point applications do not apply. In fact, no more than single-precision operands are required for most graphics applications. Moreover, because images viewed by the human eye are subject to human factors considerations, the speed with which an image is processed for display on a video monitor is of critical importance to a designer. The time required to load the thousands of operands representing that image has become a bottleneck in many applications.
Therefore, what is needed is a microprocessor that loads floating point operands from memory much faster than has previously been provided.
In addition, what is needed is a microprocessor that can prescribe the address of a data block comprising two single-precision operands according to x86 addressing conventions, and load the single-precision operands into two prescribed floating point registers during a single instruction cycle.
Furthermore, what is needed is a method for concurrently loading two adjacent single-precision operands in a microprocessor that eliminates unnecessary instruction cycles associated with the calculation of an address associated with one of the two operands.
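The kind of paired load called for above can be modeled in software as follows. This is a minimal sketch under stated assumptions, not a description of any particular hardware: the type `fp_pair` and the function `load_pair` are hypothetical names, the two destination floating point registers are modeled as struct fields, and `float` is assumed to be a 32-bit single-precision type. The sketch shows a single address being used to fetch one 64-bit data block, which is then split into two adjacent single-precision operands without a second address calculation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: float is the 32-bit single-precision format. */
_Static_assert(sizeof(float) == 4, "float must be 32 bits wide");

/* Hypothetical model of two destination floating point registers. */
typedef struct {
    float first;   /* operand at the prescribed address */
    float second;  /* adjacent operand, 4 bytes higher  */
} fp_pair;

/* Hypothetical paired load: one address, one 64-bit data block,
   two single-precision operands. */
static fp_pair load_pair(const float *block_addr)
{
    unsigned char block[8];
    fp_pair regs;

    assert(block_addr != NULL);

    /* Single access: fetch the whole 64-bit block from one address. */
    memcpy(block, block_addr, sizeof block);

    /* Split the block by byte offset; no second address is computed
       from the instruction's addressing fields. */
    memcpy(&regs.first, block, sizeof regs.first);
    memcpy(&regs.second, block + sizeof regs.first, sizeof regs.second);
    return regs;
}
```

For an array `float data[4] = {1.5f, -2.25f, 3.0f, 4.0f}`, a call to `load_pair(&data[0])` fills `first` with 1.5 and `second` with -2.25. Splitting by byte offset rather than by shifting a 64-bit integer keeps the model independent of byte order.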