1. Field of the Invention
This invention relates in general to the field of instruction execution in computers, and more particularly to an apparatus in a pipeline microprocessor for accomplishing a register exchange operation in a floating point register stack.
2. Description of the Related Art
A microprocessor has an instruction pipeline that sequentially executes instructions from an application program in synchronization with a microprocessor clock. The instruction pipeline is divided into stages, each of which performs a specific task that is part of an overall operation directed by a programmed instruction. The programmed instructions in a software application program are to be executed in sequence by the microprocessor. As an instruction enters the first stage of the pipeline, certain subtasks are accomplished. The instruction is then passed to subsequent stages for accomplishment of subsequent subtasks. Following completion of a final task, the instruction completes execution and exits the final stage of the pipeline. Execution of programmed instructions by a pipeline microprocessor is often likened to the manufacture of items on an assembly line.
Early microprocessors were not sophisticated enough to execute multiple instructions in different pipeline stages at the same time. In these microprocessors, a given instruction would be fetched from memory and would execute until the operation prescribed by the given instruction was completed. Following this, a next instruction would be executed through completion.
As microprocessor uses and enabling technologies began to proliferate during the late 1970s, numerous approaches were proposed for dealing with the representation and computation of real number data. Whereas representation of whole number data was theretofore straightforward and unambiguous, representation of real numbers, i.e., numbers consisting of a mantissa, a decimal point, and an exponent, required standardization. Standardizing the representation and interpretation of real numbers within computer systems enabled microprocessor manufacturers to certify their microprocessors as capable of executing the more prevalently used software application programs. In the computer industry, whole numbers are referred to as integer data and real numbers are referred to as floating point data.
Because floating point data is markedly different from integer data, the early microprocessors did not even perform floating point computations on-chip. Rather, separate chips, called floating point co-processors, were developed to be used in conjunction with these early microprocessors. Special programming codes were used in application programs to easily distinguish floating point instructions from integer instructions. When a floating point instruction was encountered in an instruction stream, it was immediately routed to the floating point co-processor for execution.
Today, even though advances in the art have more than enabled the incorporation of floating point co-processors into the same integrated circuits as their host microprocessors, integer instructions and floating point instructions are still processed architecturally in the same manner as before. Although floating point units now reside on the same integrated circuit as the host microprocessor, floating point data is still processed as though the floating point units were separate. This is because the fundamental nature and formats of floating point data have not changed: floating point data is still significantly different from integer data, and a tremendous volume of legacy software is still in use.
One of the remnants of early designs which is still in use today is the logic used to store and access floating point data within a floating point unit. More specifically, rather than directly addressing a specific location in a floating point register file, floating point instructions specify register locations relative to a variable address called the top-of-stack register. In a stack register configuration, data is typically accessed in a last-in-first-out fashion. All new data is written to the top-of-stack register. As each new operand is placed on the top-of-stack, logic within the floating point unit itself changes a pointer to the top-of-stack so that it points to the next register. In like manner, most floating point instructions implicitly retrieve one of their operands from the top-of-stack. Within the floating point unit, as an operand is retrieved from the top-of-stack, the logic changes the address of the top-of-stack so that it points to the next previously stored data. Furthermore, the results of all floating point computations are placed on the top-of-stack. This implicit prescription of the top-of-stack register by virtually all floating point instructions was very useful in early microprocessor architectures. Today, it has proved to be very cumbersome.
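The top-of-stack addressing described above can be sketched as a small Python model. This is a minimal illustration, not the design of any particular floating point unit: it assumes eight physical registers and x87-style behavior in which a push moves the top-of-stack pointer down by one register (modulo 8) and a pop moves it back up. The class and method names (`FPStack`, `phys`, `add_st0_sti`) are hypothetical.

```python
# Minimal model of a floating point register stack addressed relative to a
# top-of-stack (TOS) pointer. Assumptions (illustrative only): eight physical
# registers, and x87-style wraparound where a push decrements TOS modulo 8.

NUM_REGS = 8


class FPStack:
    def __init__(self):
        self.regs = [0.0] * NUM_REGS  # physical register file
        self.top = 0                  # physical index of the current top-of-stack

    def phys(self, i):
        """Map the stack-relative register ST(i) to a physical register index."""
        return (self.top + i) % NUM_REGS

    def push(self, value):
        # Move TOS to the next register, then write the new operand there.
        self.top = (self.top - 1) % NUM_REGS
        self.regs[self.top] = value

    def pop(self):
        # Read the TOS, then move TOS back to the previously stored data.
        value = self.regs[self.top]
        self.top = (self.top + 1) % NUM_REGS
        return value

    def st(self, i):
        return self.regs[self.phys(i)]

    def add_st0_sti(self, i):
        """ST(0) <- ST(0) + ST(i): one operand and the result are implicitly the TOS."""
        self.regs[self.top] = self.st(0) + self.st(i)


s = FPStack()
s.push(1.5)
s.push(2.0)
s.add_st0_sti(1)
print(s.st(0), s.st(1))
```

Running the example prints `3.5 1.5`: the sum replaces the value at the top-of-stack, and the earlier operand remains one slot below it.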
Accessing data in a stack is cumbersome because sequential instructions in an application program do not necessarily use a previously computed result as an operand in a succeeding computation. Most floating point instruction sets account for this variation by providing an instruction that allows a programmer to direct the microprocessor to swap the contents of the top-of-stack with the contents of another stack register. This instruction, a floating point exchange instruction, allows a programmer to move a floating point data object from somewhere else in the floating point stack to the top-of-stack register, thus setting up the data for a following computation.
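The effect of an exchange can be illustrated with a short sketch. It assumes a simplified stack-relative view in which list index 0 is the top-of-stack, ST(0), and models an exchange in the spirit of the x87 `FXCH ST(i)` instruction; the helper names `fp_exchange` and `fp_mul_st0` are hypothetical.

```python
# Sketch of a floating point exchange: swap ST(0) with ST(i) so that a value
# deeper in the stack becomes the implicit operand of the next computation.
# Assumption: stack[0] is the top-of-stack in this simplified model.

def fp_exchange(stack, i):
    """Swap the top-of-stack ST(0) with ST(i)."""
    stack[0], stack[i] = stack[i], stack[0]


def fp_mul_st0(stack, i):
    """ST(0) <- ST(0) * ST(i): the operation implicitly reads and writes the TOS."""
    stack[0] = stack[0] * stack[i]


# ST(0)=2.0, ST(1)=10.0, ST(2)=3.0; the next computation needs 3.0 * 10.0.
st = [2.0, 10.0, 3.0]
fp_exchange(st, 2)   # bring 3.0 to the top-of-stack: [3.0, 10.0, 2.0]
fp_mul_st0(st, 1)    # 3.0 * 10.0 replaces the top-of-stack: [30.0, 10.0, 2.0]
print(st)
```

The exchange computes nothing by itself; it only repositions data so that the following instruction's implicit top-of-stack operand is the right one.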
Use of the floating point exchange instruction is prolific. In fact, the present inventors have observed that a significant amount of code exists today that exhibits the following pattern of floating point operations and exchanges: OPERATION 1 → EXCHANGE → OPERATION 2 → EXCHANGE → OPERATION 3 → EXCHANGE → etc. One skilled in the art will appreciate that every other instruction in the pattern is an instruction that does not materially contribute to the computation of a final result; the exchanges are only present to move data around in an otherwise unwieldy register file.
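A back-of-the-envelope count shows why this pattern matters. The sketch below assumes, purely for illustration, that every instruction occupies the floating point unit for one clock cycle and that an exchange can be paired with the operation immediately before it; the function name `cycles` and the mnemonic strings are hypothetical.

```python
# Cycle count for the OPERATION -> EXCHANGE pattern.
# Assumptions (illustrative only): one clock cycle per instruction, and an
# exchange may pair with the operation immediately preceding it.

def cycles(program, pair_exchanges):
    total = 0
    prev_is_op = False  # True if the previous instruction is an unpaired operation
    for instr in program:
        if instr == "FXCH" and pair_exchanges and prev_is_op:
            # The exchange rides along with the preceding operation: no extra cycle.
            prev_is_op = False
            continue
        total += 1
        prev_is_op = instr != "FXCH"
    return total


program = ["FADD", "FXCH", "FMUL", "FXCH", "FSUB", "FXCH"]
print(cycles(program, pair_exchanges=False))
print(cycles(program, pair_exchanges=True))
```

Under these assumptions the unpaired sequence takes six cycles and the paired sequence three: half of the cycles in this pattern are spent on exchanges alone.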
Several attempts have been made in more recent years to improve the execution efficiency of floating point algorithms. Two approaches which directly relate to this application attempt to absorb the time it takes to perform a floating point exchange operation into the time it takes to perform another ongoing operation. The first approach, super-scalar architecture, provides two quasi-independent execution units within a single microprocessor. Complex dispatch logic analyzes streams of instructions entering the pipeline and routes instructions, often in parallel, to the two execution units. Results from the execution units are subsequently provided to reorder/retirement logic that reassembles the results in program order so they can be written to architectural registers. Within a super-scalar microprocessor, floating point exchange instructions are dispatched to one execution unit in parallel with another floating point instruction that is dispatched to the other execution unit, thus effectively performing the exchange operation in zero clock cycles.
The second approach has complex logic at the beginning of the pipeline to initially analyze sequences of incoming macro instructions. If a floating point exchange instruction is found between two floating point computational instructions (the optimum case), then the logic manipulates the source and destination register location specifiers in each of the two surrounding instructions so that the floating point exchange operation is accomplished as a result of executing the two computational instructions with the manipulated specifiers.
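The idea of absorbing an exchange by rewriting register specifiers can be modeled in software. The sketch below is an illustrative stand-in, not the specific front-end logic described above: it assumes instructions are `(op, dst, src)` triples with stack-relative indices that never push or pop, and it tracks a permutation of stack slots so that each exchange is dropped and the specifiers of the remaining instructions are renamed instead. All names are hypothetical.

```python
# Folding exchanges into surrounding instructions by renaming specifiers.
# Assumptions (illustrative only): (op, dst, src) triples with stack-relative
# indices; no instruction pushes or pops, so the top-of-stack never moves.

OPS = {"FADD": lambda a, b: a + b, "FMUL": lambda a, b: a * b}


def run(program, regs):
    """Execute literally, including the data movement of each FXCH."""
    regs = list(regs)
    for op, dst, src in program:
        if op == "FXCH":
            regs[dst], regs[src] = regs[src], regs[dst]
        else:
            regs[dst] = OPS[op](regs[dst], regs[src])
    return regs


def fold_exchanges(program, n):
    """Drop each FXCH and rewrite the specifiers of the instructions that follow."""
    slot = list(range(n))  # slot[logical position] -> position actually holding it
    out = []
    for op, dst, src in program:
        if op == "FXCH":
            slot[dst], slot[src] = slot[src], slot[dst]
        else:
            out.append((op, slot[dst], slot[src]))
    return out, slot


program = [("FADD", 0, 1), ("FXCH", 0, 2), ("FMUL", 0, 1)]
start = [1.0, 2.0, 5.0]
literal = run(program, start)
folded, slot = fold_exchanges(program, len(start))
physical = run(folded, start)
view = [physical[slot[i]] for i in range(len(start))]
print(folded)
print(literal == view)
```

The folded program contains no exchange, yet reading its registers through `slot` reproduces the literal result. The catch is that the rewriting logic must analyze and track the whole instruction stream, which is the front-end complexity the approach incurs.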
In either approach described above, the amount of hardware, and the corresponding complexity, cost, and power consumption, required to accomplish an exchange operation is significant, thus overly complicating the design of a pipeline microprocessor.
Therefore, what is needed is an apparatus in a pipeline microprocessor that allows an exchange operation to be performed in conjunction with another operation within a single floating point unit.
In addition, what is needed is a microprocessor that combines a floating point exchange operation with an adjacent floating point operation for parallel execution within the same floating point unit.
Furthermore, what is needed is a method for pairing a floating point exchange operation with another floating point operation so that the exchange operation is accomplished in zero effective clock cycles.
To address the above-detailed deficiencies, it is an object of the present invention to provide a pipeline microprocessor that performs a floating point exchange operation in parallel with another floating point operation within a single floating point unit.
Accordingly, in the attainment of the aforementioned object, it is a feature of the present invention to provide a microprocessor apparatus for performing a floating point exchange operation. The apparatus includes translation logic and floating point register logic. The translation logic receives an exchange macro instruction from a source therefrom, and provides a micro instruction extension that directs the microprocessor to perform the floating point exchange operation, where the micro instruction extension is paired with a micro instruction for parallel execution within a floating point unit. The floating point register logic is coupled to the translation logic. The floating point register logic receives the micro instruction and the micro instruction extension, and performs the floating point exchange operation in parallel with the operation directed by the micro instruction.
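The behavior of floating point register logic that receives a micro instruction together with an exchange extension can be modeled as a single state update. The sketch below is a hypothetical software model, not the claimed hardware: it assumes a stack-relative list with index 0 as the top-of-stack, operations of the form ST(0) ← ST(0) op ST(src), and program order "operation, then exchange". Every new register value is computed from the values present at the start of the step, so the operation and the exchange complete together. `MicroOp`, `xch`, and `execute` are illustrative names.

```python
# Register logic executing a micro instruction and its exchange extension in
# one step. Assumptions (illustrative only): stack[0] is the top-of-stack;
# ops have the form ST(0) <- ST(0) op ST(src); the exchange follows the op.
from dataclasses import dataclass
from typing import Optional

OPS = {"FADD": lambda a, b: a + b, "FMUL": lambda a, b: a * b}


@dataclass
class MicroOp:
    op: str                    # hypothetical mnemonic, e.g. "FADD"
    src: int                   # stack-relative source operand ST(src)
    xch: Optional[int] = None  # exchange extension: swap ST(0) with ST(xch)


def execute(stack, uop):
    """Compute the op result and apply the exchange in the same step."""
    result = OPS[uop.op](stack[0], stack[uop.src])
    nxt = list(stack)
    if uop.xch is None:
        nxt[0] = result
    else:
        # "Operation, then exchange": the result lands in ST(xch) and the old
        # ST(xch) lands in ST(0), both written from start-of-step values.
        nxt[0] = stack[uop.xch]
        nxt[uop.xch] = result  # also correct when xch == 0 (exchange is a no-op)
    return nxt


st = [2.0, 3.0, 7.0]
paired = execute(st, MicroOp("FADD", src=1, xch=2))
print(paired)
```

The paired step yields `[7.0, 3.0, 5.0]`, the same state as executing the addition and then a separate exchange of ST(0) with ST(2), but without a second pass through the floating point unit.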
An advantage of the present invention is that floating point exchanges can be paired with other floating point operations within a single floating point execution unit; complex front-end logic or super-scalar logic is not required.
Another object of the present invention is to provide a microprocessor that pairs a floating point exchange operation with an adjacent floating point operation by adding an extension to a corresponding micro instruction that directs a floating point unit to perform the adjacent floating point operation during the same clock cycles within which it performs the floating point exchange operation.
In another aspect, it is a feature of the present invention to provide an apparatus in a pipeline microprocessor for executing a floating point exchange macro instruction, the floating point exchange macro instruction directing the microprocessor to exchange the contents of two floating point stack registers. The apparatus has an instruction decoder, an exchange micro instruction extension, and floating point logic. The instruction decoder receives the floating point exchange macro instruction and another macro instruction from a source therefrom, and pairs execution of the floating point exchange macro instruction with execution of the other macro instruction. The exchange micro instruction extension is provided by the instruction decoder and directs the microprocessor to perform the operation prescribed by the floating point exchange macro instruction, where the exchange micro instruction extension is paired with a micro instruction corresponding to the other macro instruction. The floating point logic is coupled to the instruction decoder. The floating point logic receives the micro instruction and the micro instruction extension, and executes the operation prescribed by the floating point exchange macro instruction in parallel with the operation prescribed by the other macro instruction.
In a further aspect, it is a feature of the present invention to provide a microprocessor for executing a floating point exchange operation in parallel with another operation within a single floating point execution unit. The microprocessor includes a translate queue, a translator, and floating point register logic. The translate queue buffers a floating point exchange macro instruction and another macro instruction for decoding. The translator is coupled to the translate queue. The translator decodes the floating point exchange macro instruction and the other macro instruction during a single clock cycle, and generates an exchange micro instruction extension corresponding to the floating point exchange macro instruction, and generates a micro instruction corresponding to the other macro instruction, and couples the exchange micro instruction extension to the micro instruction. The floating point register logic is coupled to the translator, and executes, in parallel, the floating point exchange operation and the other operation.
Another advantage of the present invention is that a less complex and less costly technique is provided for improving the execution speed of floating point software application programs.
Yet a further object of the present invention is to provide a method for pairing a floating point exchange operation with another floating point operation so that the exchange operation is accomplished in zero effective clock cycles.
In yet a further aspect, it is a feature of the present invention to provide a method for executing a floating point exchange operation in parallel with another operation, the operations being performed within a single floating point execution unit in a pipeline microprocessor. The method includes decoding, in parallel, a floating point exchange macro instruction and another macro instruction, the floating point exchange macro instruction prescribing the floating point exchange operation and the other macro instruction prescribing the other operation; adding an extension to a micro instruction that prescribes the other operation, the extension prescribing the floating point exchange operation; and providing the micro instruction with the extension to the single floating point execution unit during the same clock cycle.
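The pairing step of this method can be sketched as a translator loop over a queue of macro instructions. The sketch assumes, for illustration only, that an exchange pairs with the floating point operation immediately preceding it, that both macros are consumed in the same decode step, and that an exchange with no operation to pair with is emitted as its own micro instruction. The `(mnemonic, operand)` macro format and the names `MicroOp` and `translate` are hypothetical.

```python
# Translator sketch: decode an operation and a following FXCH together and
# emit one micro instruction carrying an exchange extension.
# Assumptions (illustrative only): macros are (mnemonic, operand) tuples, and
# an exchange pairs with the operation immediately before it.
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class MicroOp:
    op: str
    src: int
    xch: Optional[int] = None  # exchange extension, None when unpaired


def translate(macros: List[Tuple[str, int]]) -> List[MicroOp]:
    uops: List[MicroOp] = []
    i = 0
    while i < len(macros):
        op, operand = macros[i]
        nxt = macros[i + 1] if i + 1 < len(macros) else None
        if op != "FXCH" and nxt is not None and nxt[0] == "FXCH":
            # Both macros are decoded in the same step; the exchange becomes
            # an extension of the operation's micro instruction.
            uops.append(MicroOp(op, operand, xch=nxt[1]))
            i += 2
        else:
            # Nothing to pair with: emit the macro as its own micro instruction.
            uops.append(MicroOp(op, operand))
            i += 1
    return uops


stream = [("FADD", 1), ("FXCH", 2), ("FMUL", 1), ("FXCH", 3), ("FXCH", 1)]
for u in translate(stream):
    print(u)
```

The five macro instructions become three micro instructions. Only the trailing exchange, which has no preceding operation left to pair with, still occupies a slot of its own in the floating point unit.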
Yet a further advantage of the present invention is that a method is provided for significantly improving the performance of a pipeline microprocessor without having to add a great deal of hardware to its basic design.