In a prior microprocessor that includes a floating point unit, data for floating point operations is typically stored in a physical data register file. The data register file includes a plurality of registers numbered consecutively. These registers are not addressed directly by instruction opcodes, but rather as a stack. All data accesses by the floating point unit are addressed by the position of a data register in the stack relative to a given register called the stack top register. A top-of-stack (TOS) pointer is employed to point at the data register which is currently chosen as stack top. The TOS pointer is contained in a status word register of the floating point unit. All data operations are performed in connection with the TOS address. Each instruction that accesses the data register file has its first source operand implicitly in the stack top. The address of the second operand, if any, is specified as an offset or index off the stack top. In this way, the actual register address or number of the second operand has to be obtained by adding this offset or index to the register number of the register which is indicated as current stack top.
FIG. 1 illustrates a prior data register file 12 that can be addressed as a stack. In FIG. 1, data register file 12 includes eight registers R0 through R7. Registers R0-R7 are numbered consecutively (i.e., from 000 to 111) and are accessed by stack addressing. A TOS (i.e., top of stack) pointer 16 of a status word register provides a three-bit address for data registers R0-R7, indicating which is currently stack top ST0. TOS pointer 16 supplies its top of stack address to address generating logic 24.
In addition to the top of stack address from TOS pointer 16, address generating logic 24 also receives stack addresses from a stack address field 18. The stack address field 18 provides the logical stack addresses of the operands and destination of an instruction. These stack addresses are offsets or indexes off the top of stack address. Address generating logic 24 combines the stack addresses of an instruction with the top of stack address to generate the actual register numbers, which it supplies to data register file 12 via line 20.
Each instruction that accesses data register file 12 typically has its first source operand implicitly stored in the stack top register. The stack address of the second operand, if any, is specified as a three-bit value or index off the stack top. In this way, the actual register number of the register that stores the second operand needs to be obtained by adding the index to the register number of the register which is indicated by the top of stack address as current stack top.
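The address generation described above can be sketched in a few lines of Python. This is only an illustration, not the circuit of FIG. 1: the function name `physical_register` is hypothetical, and the wraparound of the sum to three bits (modulo 8) is an assumption that follows from the three-bit register numbers rather than something stated explicitly above.

```python
NUM_REGS = 8  # registers R0..R7, addressed with three bits


def physical_register(tos: int, stack_index: int) -> int:
    """Map a stack-relative index ST(i) to an actual register number.

    tos         -- register number currently held in the TOS pointer
    stack_index -- offset off the stack top taken from the instruction
    """
    # Assumption: the three-bit sum wraps around, i.e. addition modulo 8.
    return (tos + stack_index) % NUM_REGS


# With the stack top at R5, ST0 is R5 and ST3 wraps around to R0.
st0 = physical_register(5, 0)
st3 = physical_register(5, 3)
print(f"ST0 -> R{st0}, ST3 -> R{st3}")
```

The implicit first operand is simply the case where the index is zero, which is why every instruction depends on the current value of the TOS pointer.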
TOS pointer 16 is contained in a TOS update logic 30. TOS update logic 30 updates TOS pointer 16 with a new top of stack address whenever necessary. TOS update logic 30 is controlled by a microcontrol stack field 26. Stack field 26, which is contained in some microcontrol vectors, provides control directives to TOS update logic 30 for updating TOS pointer 16.
In the data register file as described above, the stack top ST0 is the most heavily used register. A single-operand instruction must operate upon the stack top ST0 and replace it with a result. A two-operand instruction always uses the stack top ST0 for one of the two operands while the other operand is accessed with an offset added to the top of stack address. The result from the two-operand instruction is then written back to either the stack top ST0 or the other register. A load from memory instruction loads an operand from memory into the stack top ST0. A store to memory instruction reads an operand from the stack top ST0. An FXCH instruction exchanges the content of the stack top register with the content stored in a second register. Therefore, the stack top ST0 is used the most frequently.
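A minimal functional model makes it easier to see why nearly every operation touches ST0. The sketch below is an assumption-laden illustration: the class `StackRegisterFile` and its method names are hypothetical, the modulo-8 wraparound is assumed, and changes to the TOS pointer (TOS update logic 30) are deliberately left out.

```python
import math
import operator


class StackRegisterFile:
    """Eight data registers addressed relative to a top-of-stack pointer."""

    def __init__(self, tos: int = 0):
        self.regs = [0.0] * 8
        self.tos = tos  # register number of the current stack top

    def _phys(self, index: int) -> int:
        # Assumed three-bit wraparound when adding the index to TOS.
        return (self.tos + index) % 8

    def read(self, index: int) -> float:
        return self.regs[self._phys(index)]

    def write(self, index: int, value: float) -> None:
        self.regs[self._phys(index)] = value

    def unary(self, op) -> None:
        # A single-operand instruction reads ST0 and replaces it.
        self.write(0, op(self.read(0)))

    def binary(self, op, index: int, dest_is_top: bool = True) -> None:
        # One operand is always ST0; the result goes to ST0 or ST(index).
        result = op(self.read(0), self.read(index))
        self.write(0 if dest_is_top else index, result)

    def fxch(self, index: int) -> None:
        # FXCH as described above: the actual data is exchanged.
        top, other = self.read(0), self.read(index)
        self.write(0, other)
        self.write(index, top)


rf = StackRegisterFile(tos=6)   # ST0 is R6, ST1 is R7, ST2 is R0
rf.write(0, 9.0)
rf.write(1, 2.0)
rf.unary(math.sqrt)             # ST0 = sqrt(9.0)
rf.binary(operator.add, 1)      # ST0 = ST0 + ST1
rf.fxch(1)                      # swap the data in ST0 and ST1
print(rf.read(0), rf.read(1))
```

Every method except `read` and `write` goes through index 0, which mirrors the observation that the stack top is the busiest register in the file.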
Thus, for the prior microprocessor having the prior data register file, a source operand must typically be brought to the stack top ST0 before an instruction can operate upon it. In many cases, this requires an FXCH instruction to bring the desired data to the stack top. The addition of the FXCH instruction directly impacts the efficiency and throughput of the prior microprocessor.
In addition, the execution of the FXCH instruction in the prior art as shown in FIG. 1 is not efficient. The FXCH instruction exchanges the actual data. This typically requires the data to be available before the FXCH can be executed. If the data is unavailable, the FXCH instruction is required to wait for the data to become available. Also, in the prior art, the FXCH instruction (and, in fact, every floating point instruction) is required to wait for all previous instructions to complete.
It is to be noted that the execution of the FXCH instruction in the prior microprocessor is accomplished by exchanging actual data. In this case, an inefficiency will be incurred due to the fact that the FXCH instruction is required to stall until both operands are available. This inefficiency arises regardless of whether the execution hardware is pipelined.
For example, consider an FADD instruction that adds the top two operands in the stack, followed by a second FADD instruction that operates on two other operands. The instruction stream generated to accomplish this task may look like the following:
______________________________________
FADD ST0, ST1
FXCH ST0, ST2
FADD ST0, ST3
______________________________________
in which the first instruction, FADD, adds the operands in the ST0 and ST1 registers together and stores the result back to the stack top ST0. The second instruction, FXCH, then exchanges the operand in the stack top ST0 with the operand in the ST2 register. The third instruction, another FADD, then adds the operands in the stack top ST0 and the ST3 register together and stores the result back to the stack top ST0.
As can be seen from FIG. 2, the FADD and the FXCH instructions cannot be overlapped in execution. The FADD instruction begins its execution during clock 1 and finishes execution and returns its result to the stack top ST0 during clock 5. It is not until after clock 5 that the result of the FADD instruction is available in the stack top ST0. Since the stack top ST0 does not have the result until after clock 5, the FXCH instruction cannot be executed until the stack top ST0 receives a value. In this case, the FXCH instruction cannot begin execution at clock 2, as pipelined execution would require. Instead, the FXCH instruction must wait until clock 6, at which time the stack top ST0 has the result of the preceding FADD instruction. Stalling the FXCH instruction dramatically decreases the instruction throughput of the prior pipelined microprocessor. As a matter of fact, the advantages of pipelining the instruction executions are lost in the prior pipelined microprocessor in situations such as described above, where a subsequent instruction cannot begin execution until after a prior instruction has completed execution and has released the top of stack register. Note that the throughput in this case is adversely affected not by true data dependencies, but rather by artificial dependencies created by re-use of the same stack top register.
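The stall can be reproduced with a small scheduling sketch. It is an illustration under stated assumptions, not a model of FIG. 2: the function `schedule` and the tuple layout are hypothetical, instructions are assumed to issue in order at most one per clock, the FADD latency of five clocks is taken from the description above, and the one-clock FXCH latency is assumed only for the sake of the example.

```python
FADD_LATENCY = 5  # from the text: starts in clock 1, result usable after clock 5
FXCH_LATENCY = 1  # assumption; the text gives no figure for FXCH


def schedule(program):
    """Return (name, start_clock) pairs for an in-order instruction stream.

    Each entry of `program` is (name, registers_read, registers_written,
    latency). An instruction starts only when every register it reads
    already holds its value, which models an FXCH that swaps actual data.
    """
    ready = {}       # register -> first clock in which its value is readable
    next_issue = 1   # earliest clock at which the next instruction may start
    starts = []
    for name, reads, writes, latency in program:
        start = max([next_issue] + [ready.get(reg, 1) for reg in reads])
        for reg in writes:
            ready[reg] = start + latency
        starts.append((name, start))
        next_issue = start + 1
    return starts


program = [
    ("FADD ST0, ST1", ["ST0", "ST1"], ["ST0"], FADD_LATENCY),
    ("FXCH ST0, ST2", ["ST0", "ST2"], ["ST0", "ST2"], FXCH_LATENCY),
    ("FADD ST0, ST3", ["ST0", "ST3"], ["ST0"], FADD_LATENCY),
]

timeline = schedule(program)
for name, start in timeline:
    print(f"{name}: starts in clock {start}")
```

Under these assumptions the FXCH starts in clock 6 rather than clock 2, and the second FADD is pushed back behind it even though it does not consume the result of the first FADD. That delay is the artificial dependency on the stack top register noted above.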