1. Field of the Invention
This invention relates in general to the field of instruction execution in computers, and more particularly to an apparatus in a pipeline microprocessor for providing speculative address operands to address-dependent micro instructions.
2. Description of the Related Art
A pipeline microprocessor has an instruction path, or pipeline, that sequentially executes instructions from an application program in synchronization with a pipeline clock signal. The instruction pipeline is divided into stages, and each of the stages performs specific sub-operations that together accomplish a higher level operation prescribed by a program instruction. The program instructions in the application program are executed in sequence by the microprocessor. As an instruction enters the first stage of the pipeline, certain sub-operations are accomplished. The instruction is then passed to subsequent stages in the pipeline where subsequent sub-operations are accomplished. After performing the last set of sub-operations in the last stage of the pipeline, the instruction completes execution and exits the pipeline. Execution of instructions by a pipeline microprocessor is very similar to the manufacture of items on an assembly line.
Early pipeline microprocessors were not sophisticated enough to allow the execution of multiple instructions in different pipeline stages at the same time; that is, they executed one instruction at a time. More specifically, a given instruction would be fetched from memory and would proceed through all of the pipeline stages until it completed execution. Following this, a next instruction would be fetched and proceed through the pipeline stages through completion. And although this approach is not very efficient in terms of instruction throughput, since early pipeline microprocessors had only a few pipeline stages, the inefficient utilization of stage resources was not deemed to be a significant performance limitation.
However, as microprocessors began to proliferate, more stringent requirements were imposed on microprocessor designers, particularly with respect to instruction throughput. And the obvious approach for increasing throughput was to provide for the execution of multiple instructions within the pipeline. Clearly this improvement increased performance because resources within each pipeline stage were more efficiently used. But with this architectural change came a problem: What if one instruction executing in an early pipeline stage required an operand that was yet to be generated by a preceding instruction executing in a subsequent pipeline stage? This issue is in fact frequently confronted in the art because one of the foremost characteristics of application programs is that instructions in close proximity to one another tend to perform tasks using the same operand. For instance, a typical control algorithm within an application program computes a true signal value by adding a small number to a currently generated signal value and then the sum is compared to a reference signal. The structure of the control algorithm is to add a first operand to a second operand to produce a result. Then the result is tested to see if the computed value is within tolerance. If not, then the first operand is added again to the computed result to obtain a second result. Then the second result is tested. And so on. Even in this simple algorithm it is evident that every other instruction utilizes the last computed result.
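The control algorithm described above can be sketched as follows. This is only an illustration: the names `settle`, `step`, `reference`, `tolerance`, and `max_iters` are hypothetical and do not appear in the source. The point of the sketch is that each addition consumes the result of the previous addition, which is the dependency pattern the paragraph describes.

```python
def settle(value, step, reference, tolerance, max_iters=1000):
    """Add `step` to `value` until it is within `tolerance` of `reference`.

    Hypothetical illustration of the add-then-test loop described in the text.
    Returns the final value and the number of additions performed.
    """
    iterations = 0
    while abs(reference - value) > tolerance and iterations < max_iters:
        # This addition uses the result produced by the previous iteration,
        # so it cannot begin until that result is available.
        value = value + step
        iterations += 1
    return value, iterations
```

For example, `settle(0, 1, 5, 0)` performs five dependent additions before the test succeeds.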
When a given instruction executing in one stage of the pipeline requires an operand that is yet to be generated by a preceding instruction that is proceeding through a subsequent stage of the pipeline, the given instruction is referred to as a dependent instruction. This is because the operand required by the dependent instruction depends upon generation of a result by the preceding instruction.
To deal with dependent instructions, microprocessor designers added interlock logic to existing pipeline designs. The interlock logic spans the stages of a microprocessor where dependencies occur. During execution of a sequence of instructions by the microprocessor, non-dependent instructions are successively advanced through the pipeline stages in synchronization with the clock. However, when a dependent instruction is detected, the interlock logic stalls execution of the dependent instruction by inserting slips into the pipeline until the operand required by the dependent instruction is generated by a preceding instruction. The number of slips that are inserted into the pipeline directly influences the amount of delay that is experienced by an application program executing on the microprocessor. Two factors drive the number of slips that are inserted: 1) the separation in the instruction pipeline between the preceding instruction and the dependent instruction; and 2) the number of clock cycles that are needed by the preceding instruction to actually generate the operand. This application focuses on problems associated with the separation between the two instructions in the pipeline.
In general, program instructions use operands for two distinct types of computations in a present day microprocessor: address computations and result computations. Address computations are performed early in the pipeline by address stage logic to compute addresses of memory operands that are to be loaded from memory or stored to memory. Result computations are performed in a later execution stage of the microprocessor to carry out arithmetic, logical, or other operations prescribed by program instructions.
A particular class of dependencies, called address dependency, occurs when a preceding instruction has not yet generated a result of a result computation that is presently required as an operand by a dependent instruction for use in an address computation. The instruction prescribing the address computation is called an address-dependent instruction. And because the address-dependent instruction requires the result that has not yet been generated by the preceding instruction, the interlock logic prevents the address-dependent instruction from proceeding in the pipeline until the preceding instruction generates and provides the result.
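As a rough illustration of how an address dependency might be detected, the sketch below checks whether any register used in an address computation is the destination of an instruction whose result has not yet been generated. The function name `needs_interlock` and the register names are assumptions made for this example; the source does not describe the detection mechanism at this level of detail.

```python
def needs_interlock(pending_destinations, address_registers):
    """Return True if an address computation depends on a pending result.

    pending_destinations: set of register names that preceding, not yet
        completed instructions will write (hypothetical representation).
    address_registers: register names the address computation reads.
    """
    return any(reg in pending_destinations for reg in address_registers)
```

With `pending_destinations = {"EBX"}`, an instruction that forms its address from `EBX` is address-dependent, while one that uses only `ESI` is not.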
If there are only two pipeline stages separating the address computation logic from the result computation logic, then at least two slips must be inserted into the pipeline to delay the address-dependent instruction until the preceding instruction provides the result. But if there are 10 stages separating the address computation logic from the result computation logic, then at least 10 slips are required. Furthermore, microprocessor designers are progressively increasing the number of stages in microprocessor pipelines to provide overall throughput improvements. Consequently, these improvements negatively impact address-dependency delays because address-dependent instructions must be stalled for a greater number of clock cycles.
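The relationship between stage separation and slips can be sketched with a very small model. It assumes, as the paragraph does, that the preceding instruction advances one stage per clock and that the dependent instruction is held for every clock until the result is available. The function name `count_slips` is hypothetical, and the model gives only the lower bound stated in the text; it ignores any extra cycles the preceding instruction may need to generate the operand.

```python
def count_slips(separation):
    """Count the slips inserted while the preceding instruction advances
    `separation` stages toward the result computation logic.

    Simplified model: one stage per clock, one slip per clock of waiting.
    """
    if separation < 0:
        raise ValueError("separation must be non-negative")
    slips = 0
    stages_remaining = separation
    while stages_remaining > 0:
        slips += 1             # interlock holds the dependent instruction this clock
        stages_remaining -= 1  # preceding instruction advances one stage
    return slips
```

Under this model a separation of two stages costs two slips and a separation of 10 stages costs 10, so the penalty grows linearly as pipelines are deepened.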
Therefore, what is needed is an apparatus in a pipeline microprocessor that allows address-dependent instructions to proceed without experiencing stalls.
In addition, what is needed is an interim result computation apparatus that can provide speculative address operands to address-dependent instructions prior to when a preceding instruction generates a final result.
Furthermore, what is needed is an apparatus for generating and temporarily storing intermediate results, and for providing these results to address-dependent instructions, thus allowing the address-dependent instructions to proceed without being delayed.
To address the above-detailed deficiencies, it is an object of the present invention to provide a pipeline microprocessor apparatus for speculatively permitting address-dependent instructions to proceed without experiencing delays.
Accordingly, in the attainment of the aforementioned object, it is a feature of the present invention to provide an apparatus in a pipeline microprocessor for providing a speculative address operand associated with a result of an arithmetic operation, the arithmetic operation being prescribed by a preceding micro instruction. The apparatus includes speculative operand calculation logic and an update forwarding cache. The speculative operand calculation logic is within an address stage of the pipeline microprocessor and performs the arithmetic operation to generate the speculative address operand prior to when execute logic executes the preceding micro instruction to generate the result. The speculative address operand is obtained from the result that is to be generated when the execute logic executes the preceding micro instruction, where the result has not yet been generated by the execute logic and written to a register file for access by following micro instructions. The result is required by an address-dependent micro instruction within the address stage for computation of a memory address. The speculative operand calculation logic has addition logic, an arithmetic opcode decoder, and subtraction logic. The addition logic sums a first source operand with a second source operand, where the source operands are prescribed by the preceding micro instruction. The arithmetic opcode decoder directs the addition logic to sum the source operands if the arithmetic operation prescribed by the preceding micro instruction is an addition operation. The subtraction logic is coupled to the arithmetic opcode decoder and subtracts the second source operand from the first source operand. If the arithmetic operation is a subtraction operation, then the arithmetic opcode decoder directs the subtraction logic to subtract the second source operand from the first source operand. The update forwarding cache is coupled to the speculative operand calculation logic.
The update forwarding cache temporarily stores the speculative address operand where the address-dependent micro instruction can retrieve the speculative address operand, thereby permitting the address-dependent micro instruction to proceed without incurring delay. The speculative address operand is provided by the update forwarding cache to the address-dependent micro instruction prior to when the address-dependent micro instruction enters the address stage, thereby allowing the address-dependent micro instruction to generate the memory address without incurring the delay. The update forwarding cache comprises a plurality of cache buffers, each of the plurality of cache buffers corresponding to each of a plurality of speculative operands.
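The behavior of the speculative operand calculation logic and the update forwarding cache can be sketched in software. This is a minimal model under stated assumptions, not the claimed hardware: operands are assumed to be 32 bits wide, cache buffers are assumed to be tagged by destination register name, the oldest buffer is assumed to be replaced when the cache is full, and the names `speculative_operand`, `UpdateForwardingCache`, `store`, and `lookup` are all hypothetical.

```python
from collections import OrderedDict

MASK32 = 0xFFFFFFFF  # assumption: 32-bit operands


def speculative_operand(opcode, src1, src2):
    """Model of the opcode decoder steering the addition or subtraction logic.

    Returns the speculative result, or None if the opcode is not an
    addition or subtraction.
    """
    if opcode == "ADD":
        return (src1 + src2) & MASK32
    if opcode == "SUB":
        return (src1 - src2) & MASK32
    return None


class UpdateForwardingCache:
    """Small set of buffers holding speculative operands (hypothetical model)."""

    def __init__(self, num_buffers=4):
        self.num_buffers = num_buffers
        self.buffers = OrderedDict()  # destination register -> speculative value

    def store(self, dest_reg, value):
        # A newer speculative value for the same register replaces the old one.
        self.buffers.pop(dest_reg, None)
        self.buffers[dest_reg] = value
        if len(self.buffers) > self.num_buffers:
            self.buffers.popitem(last=False)  # assumption: replace oldest buffer

    def lookup(self, reg):
        """Return the speculative value for `reg`, or None if none is buffered."""
        return self.buffers.get(reg)
```

In use, a preceding addition would deposit its speculative result with `cache.store("EBX", speculative_operand("ADD", 0x1000, 0x20))`, and a following address-dependent instruction would read `cache.lookup("EBX")` to form its address instead of waiting for the execute logic.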
An advantage of the present invention is that application programs are not delayed when address dependencies are associated with arithmetic results.
Another object of the present invention is to provide an apparatus in a pipeline microprocessor for computing interim results that can provide speculative address operands to address-dependent instructions prior to when a preceding instruction generates a final result.
In another aspect, it is a feature of the present invention to provide a speculative operand apparatus in a pipeline microprocessor. The speculative operand apparatus has address stage logic, a speculative operand cache, and speculative operand configuration logic. The address stage logic generates a memory address prescribed by an address-dependent micro instruction. The address stage logic includes a speculative address operand calculator, a speculative operand cache, and speculative operand configuration logic. The speculative address operand calculator generates a first interim result by performing an arithmetic operation prescribed by a preceding micro instruction. The preceding micro instruction corresponds to one of the following x86 macro instructions: ADD, MOV, INC, SUB, or DEC. The arithmetic operation is performed prior to generation of a final result by the preceding micro instruction, where the final result is generated when the preceding micro instruction is executed by execute logic within the pipeline microprocessor. The final result is stored in a register for access by following micro instructions. The speculative address operand calculator includes an adder, arithmetic opcode decoding logic, and a subtractor. The adder sums a first source operand with a second source operand, where the source operands are prescribed by the preceding micro instruction. The arithmetic opcode decoding logic directs the adder to sum the source operands if the arithmetic operation is an addition operation. The subtractor is coupled to the arithmetic opcode decoding logic. The subtractor subtracts the second source operand from the first source operand. If the arithmetic operation is a subtraction operation, then the arithmetic opcode decoding logic directs the subtractor to subtract the second source operand from the first source operand. The speculative operand cache is coupled to the address stage logic.
The speculative operand cache temporarily stores the first interim result, wherein the speculative operand cache comprises a plurality of cache buffers, each of the plurality of cache buffers corresponding to each of a plurality of interim results. The speculative operand configuration logic is coupled to the speculative operand cache. The speculative operand configuration logic accesses the first interim result to configure a speculative address operand corresponding to contents of the register prescribed by the address-dependent micro instruction, thereby permitting the memory address to be generated in lieu of a stall.
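The role of the speculative operand configuration logic can be illustrated with a short sketch. It assumes that interim results and architectural register contents are each represented as a dictionary keyed by register name, that an interim result takes priority over the register contents when both exist, and that addresses are 32 bits wide. The names `configure_operand` and `generate_address` are hypothetical.

```python
def configure_operand(reg, interim_results, register_file):
    """Choose the operand for an address computation.

    Returns (value, is_speculative). An interim result is preferred because
    the register file does not yet hold the final result.
    """
    if reg in interim_results:
        return interim_results[reg], True
    return register_file[reg], False


def generate_address(base, displacement, width_bits=32):
    """Form a memory address from a base operand and a displacement,
    wrapping at the assumed address width."""
    return (base + displacement) % (1 << width_bits)
```

If the register file still holds a stale `ESI` value of `0x2000` while an interim result of `0x2004` is buffered, `configure_operand("ESI", ...)` returns `0x2004` flagged as speculative, and the address can be generated immediately rather than after a stall.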
Another advantage of the present invention is that it improves the performance of application programs in a manner that is insensitive to the number of stages separating address-dependent instructions from preceding instructions.
A further object of the invention is to provide a microprocessor apparatus that generates and temporarily stores interim results, whereby these interim results can be accessed by address-dependent instructions and used as speculative address operands.
In a further aspect, it is a feature of the present invention to provide a microprocessor apparatus for providing a speculative operand to an address-dependent micro instruction, the speculative operand corresponding to a result of a preceding arithmetic micro instruction, where the result is yet to be generated by execute stage logic in the microprocessor. The apparatus includes an opcode decoder, intermediate result calculation logic, operand cache entries, and speculative operand configuration logic. The opcode decoder evaluates an opcode of the preceding arithmetic micro instruction.
The intermediate result calculation logic is coupled to the opcode decoder. The intermediate result calculation logic generates intermediate results corresponding to arithmetic micro instructions. The intermediate result calculation logic has addition logic and subtraction logic. The addition logic generates additive intermediate results. The subtraction logic generates subtractive intermediate results. The operand cache entries are coupled to the intermediate result calculation logic. The operand cache entries temporarily store the additive intermediate results and the subtractive intermediate results. The speculative operand configuration logic is coupled to the operand cache entries. The speculative operand configuration logic selects words from selected operand cache entries to configure the speculative operand.
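The selection of words from operand cache entries might look like the sketch below. Everything about the entry layout is an assumption made for illustration: each entry is taken to be a tuple of (register tag, 32-bit value, word-valid flags for the low and high 16-bit words), entries are assumed to be ordered newest first, and any word with no valid cache entry is assumed to come from the current register contents. The name `configure_speculative_operand` is hypothetical.

```python
def configure_speculative_operand(reg, entries, register_value):
    """Assemble a 32-bit speculative operand from 16-bit words.

    entries: list of (tag, value, (low_valid, high_valid)), newest first.
    register_value: current register contents, used for any word that no
        cache entry supplies.
    """
    # Start from the register contents, split into low and high words.
    words = [register_value & 0xFFFF, (register_value >> 16) & 0xFFFF]
    filled = [False, False]
    for tag, value, word_valid in entries:
        if tag != reg:
            continue
        for i in range(2):
            # Take each word from the newest entry in which it is valid.
            if word_valid[i] and not filled[i]:
                words[i] = (value >> (16 * i)) & 0xFFFF
                filled[i] = True
    return words[0] | (words[1] << 16)
```

In this model, a newest entry that updated only the low word and an older entry that updated the whole register are merged, so the operand reflects both pending updates.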
A further advantage of the present invention is that address generation interlock stalls are not required when an address-dependent instruction requires the result of an arithmetic computation prescribed by a preceding instruction.