1. Field of the Invention
The present invention relates to pipelining systems in microprocessors. More particularly, the present invention relates to execution of microprocessor instructions using two or more pipelines.
2. Description of the Prior Art
Most conventional microprocessors have a "pipelined" architecture to increase the speed at which microprocessor instructions are processed. FIG. 1 illustrates a typical conventional pipeline for processing typical instructions. The processing of such typical instructions consists of four stages: instruction fetch (referred to as F), instruction decode (referred to as D), operand address computation (referred to as A), and execution/operand fetch/operand store (referred to as X). Typically, it takes one microprocessor clock cycle for an instruction to be processed through one stage. Older microprocessors, which did not use a pipelined architecture, processed one instruction through all four stages, including the final stage of execution, before any processing of a new instruction could begin. However, the more recent microprocessors, using pipelined architectures, allow parallel processing of two or more instructions. In the more recent microprocessors, this parallel processing has been extended even to instructions which have a "dependency" on a previous instruction. The term "dependency" describes a condition where one instruction must wait for the result of the execution of a previous instruction. An example of such a dependency is when a first instruction I1 is an instruction to increment a value stored in a register R1, and a second instruction I2 is also an instruction to increment the value stored in the same register R1.
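The contrast drawn above between non-pipelined and pipelined processing can be put in numbers with a minimal sketch. It assumes exactly the four stages F, D, A and X, one clock cycle per stage, and no stalls; the function names are hypothetical and chosen only for illustration.

```python
NUM_STAGES = 4  # F, D, A, X, each assumed to take one clock cycle


def cycles_unpipelined(num_instructions):
    """Each instruction passes through all four stages before the next starts."""
    return num_instructions * NUM_STAGES


def cycles_pipelined(num_instructions):
    """A new instruction enters the pipeline every cycle (no stalls assumed)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs NUM_STAGES cycles; each later one adds one cycle.
    return NUM_STAGES + (num_instructions - 1)


for n in (1, 2, 10):
    print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

Under these assumptions, two instructions need eight cycles without pipelining and five cycles with it.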
First, referring to cases where there is no dependency between I1 and I2, and using the typical four stages discussed above, two typical instructions I1 and I2 are processed through a pipeline as shown in FIG. 2a. During the clock cycle in which I1 is being processed by the instruction decode stage, I2 is being processed by the instruction fetch stage. Likewise, when I1 is going through the address computation stage, I2 is going through the instruction decode stage. Finally, when I1 is going through the final stage of execution, I2 is going through the address computation stage. Thus, using the conventional pipeline architecture, for two typical, consecutive and non-dependent instructions, the completion of I2 lags the completion of I1 by one clock cycle.
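The stage-by-stage overlap described above can be tabulated with a short sketch. It is an illustrative model only, assuming a single in-order pipeline, one clock cycle per stage, cycles numbered from 1, and no stalls; `pipeline_schedule` and `completion_cycle` are hypothetical names.

```python
STAGES = ["F", "D", "A", "X"]  # fetch, decode, address computation, execute


def pipeline_schedule(num_instructions):
    """Return {instruction_number: {cycle: stage}} for a single pipeline."""
    schedule = {}
    for i in range(num_instructions):
        # Instruction I(i+1) enters the fetch stage in cycle i + 1.
        schedule[i + 1] = {i + 1 + s: stage for s, stage in enumerate(STAGES)}
    return schedule


def completion_cycle(schedule, instruction):
    """Cycle in which the instruction occupies its last stage (X)."""
    return max(schedule[instruction])


sched = pipeline_schedule(2)
for instr, row in sched.items():
    cells = " ".join(f"c{cycle}={stage}" for cycle, stage in sorted(row.items()))
    print(f"I{instr}: {cells}")
```

In this model I1 occupies X in cycle 4 and I2 occupies X in cycle 5, which is the one-cycle lag described above.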
Even for cases where there is a dependency between two or more consecutive instructions, for example when both I1 and I2 are instructions to increment R1, the more recent microprocessor pipelines accommodate processing such instructions with only one clock cycle time difference between the completion of the consecutive instructions. Two examples of dependent instructions are shown in FIGS. 2b and 2c. FIG. 2b shows that, in the more recent microprocessor pipelines, two consecutive INCREMENT (referred to as INC) instructions can be completed with only one clock cycle time difference. FIG. 2c shows the one clock cycle time difference for another set of consecutive and dependent instructions, namely two "PUSH" instructions. A PUSH instruction essentially stores a value into a memory location addressed by a register usually called a Stack Pointer Register. Thus, when I1 and I2 are both PUSH instructions, I2 must wait for an updated value of the Stack Pointer Register in order to write into the new memory location addressed thereby. As such, two PUSH instructions are considered to be dependent instructions. Further, in the more recent microprocessors, special hardware allows an "advanced stack pointer" to be updated with a new stack pointer value at the end of the address computation of instruction I1. This updated value of the "advanced stack pointer" can be used in the next clock cycle for the address computation of instruction I2. Thus, as shown in FIGS. 2b and 2c, the more recent pipelined microprocessors can complete even two data dependent instructions (like two INCREMENT instructions) or two stack instructions (like two PUSH instructions) with a time difference of only one clock cycle.
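The role of the "advanced stack pointer" described above can be illustrated with a minimal sketch. Several details are assumptions not stated in the text: the stack is taken to grow toward lower addresses, the word size is taken to be 4 bytes, and I1 is taken to reach its A stage in cycle 3. The name `push_addresses` is hypothetical.

```python
WORD = 4  # assumed word size in bytes (not specified in the text)


def push_addresses(initial_sp, num_pushes):
    """Return (A-stage cycle, store address, X-stage cycle) for each PUSH.

    The advanced stack pointer is updated at the end of each A stage, so the
    next PUSH can compute its address in the following cycle without waiting
    for the previous PUSH to finish its X stage.
    """
    advanced_sp = initial_sp
    result = []
    for i in range(num_pushes):
        a_cycle = 3 + i        # I1 in A during cycle 3, I2 during cycle 4, ...
        advanced_sp -= WORD    # assumed: stack grows toward lower addresses
        result.append((a_cycle, advanced_sp, a_cycle + 1))
    return result


for a_cycle, address, x_cycle in push_addresses(0x1000, 2):
    print(f"A in cycle {a_cycle}, store to {address:#x}, X in cycle {x_cycle}")
```

In this model the two PUSH instructions store to different addresses and complete in consecutive cycles, matching the one-cycle difference attributed to FIG. 2c.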
A still further improvement in the speed of pipelined microprocessors of the prior art relates to the concept of "superscalar" architectures. FIG. 3a depicts a general, simplified superscalar pipeline of the prior art. This figure shows two pipeline branches. Each pipeline branch has a fetch stage, a decode stage, an address computation stage, and finally an execution stage. It will be recalled that those prior art microprocessors which lacked a superscalar pipeline architecture were unable to complete any two instructions simultaneously, regardless of whether one instruction was dependent on a previous instruction. Without the superscalar architecture, even when I1 and I2 were completely independent, the prior art required a time lag of at least one clock cycle between the completion of I1 and the completion of I2. However, using the superscalar pipeline architecture, a microprocessor can execute two independent and consecutive instructions simultaneously (i.e., in the same clock cycle), as illustrated in FIG. 3b. Nevertheless, even in the superscalar pipeline architecture, two consecutive but dependent instructions cannot be completed simultaneously. In that regard, FIG. 3c uses two PUSH instructions as examples of two dependent instructions. As FIG. 3c illustrates, the prior art superscalar architectures fail to extend any advantage when two consecutive instructions are dependent, since the completion of the second instruction lags the completion of the first instruction by one clock cycle.
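The three prior art cases described above (a single pipeline, two pipelines with independent instructions, and two pipelines with dependent instructions) can be summarized in a toy model. It assumes four one-cycle stages and that I1 is fetched in cycle 1; `completion_cycles` is a hypothetical name, and the model captures only the completion timing stated in the text, not any real hardware.

```python
NUM_STAGES = 4  # F, D, A, X, each assumed to take one clock cycle


def completion_cycles(num_pipelines, dependent):
    """Return (I1 completion cycle, I2 completion cycle) in the prior art."""
    i1_done = NUM_STAGES
    if num_pipelines >= 2 and not dependent:
        # Two pipeline branches, independent instructions: both finish together.
        i2_done = i1_done
    else:
        # Single pipeline, or dependent instructions: I2 lags by one cycle.
        i2_done = i1_done + 1
    return i1_done, i2_done


print(completion_cycles(1, dependent=False))  # single pipeline
print(completion_cycles(2, dependent=False))  # two branches, independent
print(completion_cycles(2, dependent=True))   # two branches, two PUSH instructions
```

The dependent case returns the same timing as the single-pipeline case, which is the shortcoming of the prior art noted above.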
Thus, the purpose of the present invention is to provide a microprocessor pipelining system in which any two consecutive instructions, even if the two instructions are dependent, may be completed simultaneously. Furthermore, the present invention also allows the simultaneous completion of more than two dependent instructions.