1. Field of the Invention
This invention relates in general to the field of instruction execution in computers, and more particularly to an apparatus in a pipeline microprocessor for providing address operands corresponding to recently executed micro instructions to address-dependent micro instructions.
2. Description of the Related Art
A microprocessor has an instruction pipeline that sequentially executes instructions from an application program in synchronization with a microprocessor clock. The instruction pipeline is divided into stages, each of which performs a specific task that is part of an overall operation that is directed by a programmed instruction. The programmed instructions in a software application program are executed in sequence by the microprocessor. As an instruction enters the first stage of the pipeline, certain tasks are accomplished. The instruction is then passed to subsequent stages for accomplishment of subsequent tasks. Following completion of a final task, the instruction completes execution and exits the pipeline. Execution of programmed instructions by a pipeline microprocessor is very much analogous to the manufacture of items on an assembly line.
Early pipeline microprocessors were not sophisticated enough to execute multiple instructions in different pipeline stages at the same time. Accordingly, a given instruction would be fetched from memory and would proceed through the pipeline stages until it completed execution. Following this, a next instruction would proceed through the pipeline stages to completion. And since early pipeline microprocessors had only a few pipeline stages, the inefficient utilization of stage resources was not deemed to be a significant encumbrance.
As uses for microprocessors began to proliferate, however, more stringent requirements were imposed on microprocessor designers, particularly with respect to speed. And the obvious approach for increasing processing speed was to allow multiple instructions to proceed down the pipeline at the same time. Clearly this improvement increased instruction throughput because resources within each pipeline stage were more efficiently used. But with this change came a problem: What if one instruction executing in an early pipeline stage required an operand that was yet to be provided by another instruction executing in a later pipeline stage? This predicament is in fact common to software programs; instructions that are close in proximity tend to perform tasks using the same operand. For example, a control algorithm may compute a true error signal value by adding a small number to a current error signal value and then comparing this value to some other signal that is input to the microprocessor. The structure of the algorithm is to add a first operand to a second operand to produce a result. The result is then tested to see if the computed value is tolerable. If not, then the first operand is added to the computed result to obtain a second result. The second result is tested. And so on. Even in this simple algorithm it is evident that every other instruction utilizes the last computed result. When a given instruction executing in one stage of the pipeline requires an operand that is to be generated or modified by another instruction executing in a subsequent stage of the pipeline, the given instruction is referred to as a dependent instruction. This is because the operand required by the dependent instruction depends upon generation of the operand by the other instruction.
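The add-and-test algorithm described above can be sketched as follows. This is a minimal illustration only: the register names `r1` through `r3`, the dictionary layout, and the function `dependent_indices` are hypothetical, and the check considers only the immediately preceding instruction.

```python
def dependent_indices(program):
    """Return indices of instructions that read the register written
    by the instruction immediately ahead of them in the pipeline."""
    dependents = []
    for index in range(1, len(program)):
        producer = program[index - 1]
        consumer = program[index]
        # A compare writes no register, so its "dest" is None and never matches.
        if producer["dest"] in consumer["sources"]:
            dependents.append(index)
    return dependents


# add, test, add, test: the pattern described for the control algorithm.
program = [
    {"op": "add", "dest": "r1", "sources": ["r1", "r2"]},
    {"op": "cmp", "dest": None, "sources": ["r1", "r3"]},
    {"op": "add", "dest": "r1", "sources": ["r1", "r2"]},
    {"op": "cmp", "dest": None, "sources": ["r1", "r3"]},
]
dependents = dependent_indices(program)
```

Under these assumptions, each test instruction is flagged as dependent on the add that precedes it, which is the "every other instruction" pattern noted above.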
Of interest for this application is a particular class of operand dependencies known as address dependency. More specifically, most present-day microprocessors provide instructions for expediting storage of information in a memory stack. These stack instructions implicitly prescribe a top of stack memory location where new data is to be stored or from where most recently written data can be retrieved. Each time new data is written to the stack, an address operand pointing to the top of the stack must be modified to indicate a new top of stack. Otherwise, when subsequent data is written, the data just written would be overwritten. Similarly, the address operand must be modified to account for retrieval of data from the stack. Stack instructions are very powerful for use by application programmers because they need merely specify what data is to be written, or pushed, onto the stack, or what is to be retrieved, or popped, off of the stack. Manipulation of the stack pointer address operand is performed automatically by logic within the microprocessor. Even in the simple case of two successive pop instructions, one skilled in the art will perceive that an address operand dependency case exists. The stack pointer is not available for use by the second pop instruction until it is written back into its address operand register by the first pop instruction.
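The implicit stack pointer update can be illustrated with a small model. The sketch below assumes a stack that grows toward lower addresses in 4-byte entries and starts at address 0x1000; none of these conventions is stated above, and the names `Stack`, `push`, `pop`, and `sp` are hypothetical.

```python
class Stack:
    """Toy memory stack whose pointer is updated implicitly by push and pop."""

    WORD = 4  # assumed entry size in bytes

    def __init__(self, top=0x1000):
        self.sp = top      # stack pointer: the address operand
        self.memory = {}   # sparse memory, address -> value

    def push(self, value):
        # Move the pointer first so the new entry cannot overwrite the previous one.
        self.sp -= self.WORD
        self.memory[self.sp] = value

    def pop(self):
        # Read at the current top of stack, then move the pointer back.
        value = self.memory[self.sp]
        self.sp += self.WORD
        return value


stack = Stack()
stack.push(11)
stack.push(22)
first = stack.pop()   # reads at sp, then modifies sp
second = stack.pop()  # needs the sp value produced by the first pop
```

The second `pop` cannot compute its address until the first has modified `sp`, which is the address operand dependency in the case of two successive pop instructions.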
To deal with address-dependent instructions, microprocessor designers added interlock logic to existing pipeline designs. The interlock logic spanned the stages of a microprocessor where the address dependencies could occur. During normal operation, non-dependent instructions were successively advanced through the pipeline stages in synchronization with the clock. When the interlock logic encountered an address-dependent instruction, it stalled execution of the address-dependent instruction by inserting slips into the pipeline until the address operand required by the address-dependent instruction was made available.
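The cost of an interlock can be estimated with a short calculation. The sketch below is a simplification under stated assumptions: stages are numbered from 0, one instruction enters the pipeline per clock, and an operand becomes readable in the clock after its producer occupies the write-back stage. The function name `slips_required` and the stage numbers in the example are hypothetical.

```python
def slips_required(producer_issue, consumer_issue, read_stage, writeback_stage):
    """Clocks an address-dependent instruction must be stalled by the interlock."""
    # Clock in which the producer's operand becomes readable (assumed model).
    available = producer_issue + writeback_stage + 1
    # Clock in which the consumer would read the operand if never stalled.
    wanted = consumer_issue + read_stage
    return max(0, available - wanted)


# Six-stage pipeline, operands read in stage 2 and written back in stage 5.
back_to_back = slips_required(0, 1, read_stage=2, writeback_stage=5)
far_apart = slips_required(0, 4, read_stage=2, writeback_stage=5)
```

In this model, a dependent instruction issued directly behind its producer incurs three slips, while one issued four clocks later incurs none.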
In spite of the advances provided through interlock logic, demands for faster throughput continued to press microprocessor designers. Consequently, an alternative to interlock logic was developed that allowed address-dependent instructions to proceed down the pipeline without incurring slips. This alternative is known as an address operand distribution bus. In essence, the address operand distribution bus originates at the stage of the pipeline in which address operands are modified. When an address operand is modified, it is copied to the bus and then routed to all of the earlier stages that are affected by address-dependent instructions. If an address-dependent instruction is present within any of the earlier stages, then logic within that stage performs all of the operations necessary to properly configure the required address operand from the provided intermediate address operand. The address distribution approach can be thought of as a one-to-many distribution scheme because one address operand can be distributed to several address-dependent instructions at the same time.
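The one-to-many character of the distribution bus can be sketched in a few lines. This is an illustration only: the dictionary layout, the stage names, the register name `SP`, and the function `distribute` are hypothetical, and the per-stage logic is reduced to a single comparison.

```python
def distribute(register, value, earlier_stages):
    """Copy a modified address operand to every earlier stage.

    Returns the number of comparisons made, one per stage, as a rough
    stand-in for the sets of distribution logic that must exist.
    """
    comparisons = 0
    for stage in earlier_stages:
        comparisons += 1                  # each stage needs its own logic
        if stage["needs"] == register:
            stage["operand"] = value      # stage-local logic captures the operand
    return comparisons


stages = [
    {"name": "register", "needs": "SP", "operand": None},
    {"name": "address", "needs": "SP", "operand": None},
    {"name": "translate", "needs": None, "operand": None},
]
logic_sets = distribute("SP", 0x0FFC, stages)
```

Both stages waiting on `SP` receive the operand in the same step, and the comparison count equals the number of stages served by the bus.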
The address operand distribution scheme has prevailed as the principal technique for expediting the execution of address-dependent instructions, that is, until more recent times when demands for further throughput increases have compelled microprocessor designers to substantially alter the design of stages within the pipeline. These alterations to the pipeline can be comprehended through use of an assembly line analogy. Suppose an assembly line is set up with three stages, where each of the three stages is required to insert two screws in a product that flows down the line, for a total of six screws. Further suppose that the time required to insert a screw is one minute. To send a product through the assembly line, then, requires six minutes. If multiple products are sent down the line, then it follows that one product rolls off the line every two minutes.
A simple enhancement to the line will double the production throughput: Reconfigure the line into six stages, where each stage is required to insert only one screw. While with this architectural change it still takes six minutes to pass a product through the line, the improvement now is that one product rolls off of the line every minute.
Throughput is doubled by doubling the number of stages and halving the operations performed in each stage, even though the time required for a single product to traverse the line is unchanged.
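The arithmetic of the assembly line analogy can be checked with a short calculation. The function name `line_metrics` is hypothetical, and the model assumes that every stage takes the same time and that the line is kept full.

```python
def line_metrics(stages, screws_per_stage, minutes_per_screw=1):
    """Return (minutes for one product to traverse the line,
    minutes between finished products once the line is full)."""
    stage_time = screws_per_stage * minutes_per_screw
    latency = stages * stage_time
    interval = stage_time  # a product leaves each time the last stage finishes
    return latency, interval


three_stage = line_metrics(stages=3, screws_per_stage=2)
six_stage = line_metrics(stages=6, screws_per_stage=1)
```

Both configurations take six minutes per product, but the six-stage line completes a product every minute rather than every two minutes.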
To improve the throughput of current microprocessors, designers are taking the very same approach: pipeline stages are being added and the functional requirements for each stage are being decreased. Thus, faster clock speeds can be applied and instruction throughput is increased.
But increasing the number of pipeline stages has highlighted a deficiency with the address operand distribution technique for dealing with address-dependent instructions. Because early microprocessor pipelines consisted of only a few stages, the attendant logic required to implement an address operand distribution bus was not much of an issue. But for every added stage in the execution pipeline, an additional set of address operand distribution logic must be provided.
In other words, the logic required to implement an address operand distribution bus is directly proportional to the number of stages to which the address operand is to be distributed. Substantially more stages also require more powerful logic elements for driving the address operand signals. Moreover, the timing to distribute address operands to multiple pipeline stages is not only a function of the number of stages, but is also based upon the location of the stage that is physically farthest from the origination stage. Hence, circuit complexity, power, and timing problems arise when an address operand distribution scheme is applied to pipeline architectures that have many pipeline stages.
Therefore, what is needed is an apparatus in a pipeline microprocessor that allows address-dependent instructions to execute without delay, but which is not adversely affected by the number of stages in the microprocessor pipeline.
In addition, what is needed is a mechanism to provide address-dependent micro instructions with generated address operands that does not require additional sets of logic, or exhibit timing problems when employed in a pipeline microprocessor having multiple pipeline stages.
Furthermore, what is needed is an apparatus in a pipeline microprocessor for temporarily storing several intermediate address operands that can be accessed in a single pipeline stage by an address-dependent micro instruction.
To address the above-detailed deficiencies, it is an object of the present invention to provide a mechanism for expeditiously executing address-dependent instructions that can adapt, without adverse hardware, power consumption, or timing consequences, to advanced pipeline architectures having more pipeline stages.
Accordingly, in the attainment of the aforementioned object, it is a feature of the present invention to provide a microprocessor apparatus for providing an address operand to an address-dependent micro instruction. The microprocessor apparatus includes an update forwarding cache, address update logic, and address configuration logic. The update forwarding cache stores intermediate address operands. The address update logic is coupled to the update forwarding cache and enters the intermediate address operands into the update forwarding cache. The address configuration logic is coupled to the update forwarding cache and accesses the intermediate address operands to provide the address operand required by the address-dependent micro instruction.
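The cooperation of the three elements can be sketched in software. The sketch below is a loose illustration under stated assumptions, not a description of the actual circuit: the cache depth of four, the newest-first ordering, the fallback to a register file, the register names, and the class and method names are all hypothetical.

```python
from collections import deque


class UpdateForwardingCache:
    """Toy model: holds recently generated intermediate address operands."""

    def __init__(self, depth=4):
        # Newest entry first; the oldest entry is discarded when the cache is full.
        self.entries = deque(maxlen=depth)

    def update(self, register, operand):
        """Address update logic: enter an intermediate address operand."""
        self.entries.appendleft((register, operand))

    def configure(self, register, register_file):
        """Address configuration logic: prefer the newest cached operand."""
        for tag, operand in self.entries:
            if tag == register:
                return operand
        # Nothing recent in the cache, so the register file copy is current.
        return register_file[register]


register_file = {"SP": 0x1000, "BP": 0x2000}
cache = UpdateForwardingCache()
cache.update("SP", 0x0FFC)  # first stack instruction modifies SP
cache.update("SP", 0x0FF8)  # second stack instruction modifies SP again
newest_sp = cache.configure("SP", register_file)
plain_bp = cache.configure("BP", register_file)
```

In this model, an address-dependent micro instruction obtains the newest value of `SP` from one place, regardless of how many pipeline stages lie between it and the instruction that produced the value.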
An advantage of the present invention is that only minimal changes are required to provide address operands to address-dependent instructions when pipeline stages are added.
Another object of the present invention is to provide an apparatus for executing address-dependent instructions without delay that is less complex than has heretofore been provided.
In another aspect, it is a feature of the present invention to provide an intermediate address operand cache for storing intermediate address operands calculated by preceding micro instructions, where the intermediate address operands are entered into the intermediate address operand cache prior to being entered into a register file. The intermediate address operand cache has address operand buffers, tag buffers, valid word indicators, a word selector, and address operand configuration logic. The address operand buffers store the intermediate address operands. The tag buffers are coupled to the address operand buffers. Each of the tag buffers designates a corresponding register in the register file within which a corresponding intermediate address operand is to be entered. The valid word indicators are coupled to the address operand buffers. Each of the valid word indicators indicates which words in a corresponding intermediate address operand buffer are valid upon entry of the corresponding intermediate address operand. The word selector is coupled to the tag buffers and the valid word indicators. The word selector determines selected word locations within selected address operand buffers that are used to configure an address operand for an address-dependent micro instruction. The address operand configuration logic is coupled to the word selector and the address operand buffers. The address operand configuration logic retrieves words from the selected word locations within the selected address operand buffers to configure the address operand.
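The roles of the tag buffers, valid word indicators, word selector, and address operand configuration logic can be sketched as follows. The sketch assumes a 32-bit address operand made of two 16-bit words, entries ordered newest first, and a fallback to the register file when no valid word is cached; these details, the register names, and the function names are hypothetical.

```python
WORD_BITS = 16
WORDS = 2  # assumed: a 32-bit operand held as two 16-bit words


def select_words(entries, register):
    """Word selector: for each word position, the index of the newest
    entry whose tag matches and whose valid indicator is set, or None."""
    selected = []
    for word in range(WORDS):
        choice = None
        for index, (tag, _value, valid) in enumerate(entries):
            if tag == register and valid[word]:
                choice = index
                break
        selected.append(choice)
    return selected


def configure_operand(entries, register, register_file):
    """Configuration logic: assemble the operand from the selected words."""
    mask = (1 << WORD_BITS) - 1
    operand = 0
    for word, index in enumerate(select_words(entries, register)):
        source = register_file[register] if index is None else entries[index][1]
        shift = word * WORD_BITS
        operand |= ((source >> shift) & mask) << shift
    return operand


# Each entry: (tag, operand buffer, valid word indicators for low and high word).
entries = [
    ("SP", 0x00000FF8, (True, False)),  # newest: only the low word is valid
    ("SP", 0x00020FFC, (True, True)),   # older: both words are valid
]
register_file = {"SP": 0x00011000, "BP": 0x00001234}
selection = select_words(entries, "SP")
operand = configure_operand(entries, "SP", register_file)
uncached = configure_operand(entries, "BP", register_file)
```

Here the low word comes from the newest entry and the high word from the older entry, so the configured operand combines words from two buffers in a single access.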
Another advantage of the present invention is that a mechanism for expediting the execution of address-dependent instructions is provided that is well-suited for multi-stage pipeline designs.
A further object of the invention is to provide a mechanism for accessing address operands in a single pipeline stage whereby an address-dependent instruction can execute without incurring slips.
In a further aspect, it is a feature of the present invention to provide an apparatus in a pipeline microprocessor for providing an address operand to an address-dependent micro instruction. The apparatus includes address calculation logic, address operand cache entries, address update logic, and address operand configuration logic. The address calculation logic generates intermediate address operands corresponding to micro instructions that precede the address-dependent micro instruction. The address operand cache entries store the intermediate address operands. The address update logic is coupled to the address calculation logic and the address operand cache entries. The address update logic enters a specific intermediate address operand into a specific address operand cache entry following calculation of the specific intermediate address operand by the address calculation logic and prior to the specific intermediate address operand being written to a specific address operand register in a register file. The address operand configuration logic is coupled to the address operand cache entries and selects words from selected address operand cache entries to configure the address operand.
A further advantage of the present invention is that provision of address operands to address-dependent instructions can be accomplished in more advanced pipeline microprocessors without incurring problems related to circuit complexity, routing, power, or timing.