1. Field of the Invention
The present invention relates generally to the field of computer hardware, and more specifically, to a method and an apparatus that lower the number of stall latches in pipelines.
2. Description of the Related Art
Modern processors frequently execute several operations and/or instructions in parallel, because parallel execution can increase operating speed. A structure executing several instructions in parallel employs devices to keep data associated with one instruction from mixing with data associated with another instruction. One such structure is a pipeline. A pipeline uses a series of stages to execute each instruction. A pipeline processes different instructions in parallel by keeping the data from each instruction in a different stage at any one time.
FIG. 1(A) illustrates a generic four-stage pipeline 8. Data or instructions enter at an input 10. The input 10 is the first stage of the pipeline 8. After passing through four execution stages 12, 14, 16, 18, the results are transmitted to an output 20. The output 20 is the last stage of the pipeline 8. In the pipeline 8, isolation devices 22, 24, 26, 28, 30 connect the stages 10, 12, 14, 16, 18, 20. In response to receiving a first timing pulse delivered by a timer 32, the isolation devices 22, 24, 26, 28, 30 store all data from the preceding stage 10, 12, 14, 16, 18. In response to receiving a second timing pulse delivered by the timer 32, the isolation devices 22, 24, 26, 28, 30 transmit all stored data to the following stage 12, 14, 16, 18, 20. In some pipelines, the first and second timing pulses are the same pulse. The steps of storing data are triggered simultaneously in each isolation device 22, 24, 26, 28, 30 by each first timing pulse. The steps of transmitting stored data are triggered simultaneously in each isolation device 22, 24, 26, 28, 30 by each second timing pulse. The simultaneous triggering of the storage and transmission of data ensures that the results for different instructions do not mix in the pipeline 8.
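The simultaneous triggering described above may be illustrated with a short software model. The following is only a sketch under stated assumptions: the names `make_stage` and `timing_pulse` are hypothetical, a single timing pulse is assumed to trigger both storage and transmission, and each stage is assumed merely to tag the data with its reference numeral.

```python
from typing import Callable, List, Optional, Tuple

Data = Optional[Tuple]  # None models an empty slot in the pipeline


def make_stage(label: int) -> Callable[[Data], Data]:
    """Hypothetical stage logic: tag the data with the stage's reference numeral."""
    def stage(data: Data) -> Data:
        return None if data is None else data + (label,)
    return stage


def timing_pulse(latches: List[Data],
                 stages: List[Callable[[Data], Data]],
                 new_input: Data) -> List[Data]:
    """Model one timing pulse for isolation devices 22, 24, 26, 28, 30.

    Every next value is computed from the contents held before the pulse,
    and the whole list is replaced at once, so data belonging to different
    instructions never mix.
    """
    return [new_input] + [stage(held) for stage, held in zip(stages, latches)]


# Four execution stages 12, 14, 16, 18 sit between five isolation devices.
stages = [make_stage(n) for n in (12, 14, 16, 18)]
latches: List[Data] = [None] * 5
for item in [("A",), ("B",), None, None, None]:
    latches = timing_pulse(latches, stages, item)

print(latches[4])  # ('A', 12, 14, 16, 18)
print(latches[3])  # ('B', 12, 14, 16)
```

In this model, the data for instruction A reaches the last isolation device after five pulses, one pulse ahead of the data for instruction B, and neither carries tags from the other.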
In the prior art, the isolation devices 22, 24, 26, 28, 30 are typically banks of latches, D flip-flops (DFF) or more complicated sequential logic units. Since latches and flip-flops store and transmit one bit of information at a time, the number of latches or flip-flops employed by each isolation device 22, 24, 26, 28, 30 is a function of the quantity of information that is stored and transmitted between the particular stages 10, 12, 14, 16, 18, 20 of the pipeline 8. In some pipelines, the amount of data is substantially larger at early stages and smaller at later stages.
FIG. 1(B) is a graph illustrating the amount of data at each stage of a floating-point multiplication pipeline (FMP) with four execution stages. The solid curve 34 shows a situation where much more data is transmitted between early pipeline stages, e.g., between stages 12, 14, 16 of FIG. 1(A), than between later stages, e.g., between stages 16, 18, 20 of FIG. 1(A). In an FMP, there is much more data in early stages, because many partial products are calculated during early stages. The partial products are summed in later stages, resulting in much less data. Other types of pipelines also frequently transmit much more data between particular pipeline stages.
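The difference in data volume may be illustrated with a rough count. The following sketch assumes a plain shift-and-add multiplier without any encoding of the multiplier bits, which is not necessarily how a given FMP is built. The helper names `partial_products` and `bits_between_stages` are hypothetical, and the figures printed are not taken from FIG. 1(B).

```python
from typing import List, Tuple


def partial_products(a: int, b: int, width: int) -> List[int]:
    """Return one shifted copy of `a` for each bit of `b` (zero where the bit is 0)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]


def bits_between_stages(width: int) -> Tuple[int, int]:
    """Rough bit counts before and after the partial products are summed."""
    early = width * width  # `width` rows of `width` bits each, before summation
    late = 2 * width       # a single product of at most 2 * width bits
    return early, late


rows = partial_products(13, 11, 4)
print(rows, sum(rows))          # [13, 26, 0, 104] 143
print(bits_between_stages(24))  # (576, 48)
```

Under these assumptions, a multiplier with 24-bit operands holds on the order of 576 bits of partial products before summation and 48 bits afterward, which is consistent with the qualitative shape described for the solid curve 34.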
Referring to FIGS. 1(A),(B), data storage and transmission involves many more latches or flip-flops (not shown) in the isolation devices 24, 26 located between the earlier stages 12, 14, 16. The large number of latches or flip-flops used by the isolation devices 24, 26 may cause undesirable side effects. First, the large number of latches or flip-flops may substantially increase the area of a chip (not shown) that is occupied by the pipeline 8. Second, since a large number of latches or flip-flops are connected to the timer 32, the timing pulses to these latches or flip-flops may be substantially weakened and require the introduction of boosters (not shown). The boosters will take up even more precious area on the chip. These undesirable side effects are significantly more serious in pipelines employing latches or flip-flops that can be stalled, generically referred to as stall latches.
FIGS. 2(A)-(C) form a chart illustrating the operation of a stall in the four-stage pipeline 8 of FIG. 1(A). At an initial time, the pipeline 8 contains data from different instructions A, B, C, D and E stored in the isolation devices 22, 24, 26, 28, 30 between the various stages 10, 12, 14, 16, 18, 20 of the pipeline 8. At the initial time, the stall control 38 sends a stall signal to the isolation devices 22, 24 located between the first three stages 10, 12, 14. The length of the stall and the latest isolation device 24 stalled are generally dependent on the reason for the stall and may vary in different pipelines and, in the same pipeline, may vary at different times. For the illustrative example of FIGS. 2(A)-(C), the stall is assumed to last for three timing pulses and to affect all isolation devices 22, 24 earlier in the pipeline than the second execution stage 14. The stall control 38 sends the stall signal during three timing pulses. The first two isolation devices 22, 24 are stalled from storing data from the preceding pipeline stages 10, 12, and from transmitting stored data to the following pipeline stages 12, 14 during the three timing pulses that the stall signal is received. In some pipelines, the signal from the stall control 38 stalls isolation devices 22, 24 from storing data from the preceding pipeline stages 10, 12 without stalling the transmission of already stored data. At the initial time, the stall signal freezes further progress of the execution of instructions A and B by the pipeline 8.
The reasons for a stall may depend on circumstances external to the pipeline 8. For example, an external device (not shown) may have determined that a register (not shown) will not be available to store the result from instruction B for a certain number of timing cycles. The external device requests the stall to ensure that the result from instruction B is not transmitted to the output 20 before the register is available. In a second example, an external control (not shown) of the pipeline 8 requests the stall, because a later pipeline stage, for example the fourth execution stage 18, will not be able to execute instruction B without supplemental data from another source. The supplemental data is delayed, and the stall gives time for that data to be provided.
FIG. 2(B) illustrates the same pipeline two timing pulses later. Instructions C, D, and E, which were initially in later isolation devices 26, 28, 30, have continued their progress through the pipeline 8. The isolation devices 22, 24 continue to be stalled by the signal that the stall control 38 continues to send. Instructions A and B are still stored in the isolation devices 22, 24 between the first three stages 10, 12, 14 of the pipeline 8. The stall results in a bubble of nonsense execution data in the third and fourth isolation devices 26, 28.
FIG. 2(C) illustrates the condition of the pipeline six timing pulses after the initiation of the stall. The stall control 38 stopped sending the stall signal three timing pulses earlier, and the first two isolation devices 22, 24 thereafter started transmitting stored data and storing data again. The instructions A, B have progressed through the third and fourth stages 16, 18 of the pipeline 8 respectively during those three unstalled timing pulses. The bubble of nonsense execution data is discarded at the output 20. Starting with the arrival of instruction B at the next timing pulse, the results from the pipeline 8 will be treated normally at the output 20.
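The sequence of FIGS. 2(A)-(C) may be traced with a small model. The following is a sketch only: the function name `timing_pulse`, the `BUBBLE` marker, and the instructions F, G and H that enter after the stall are hypothetical and do not appear in the figures. The model tracks only which instruction occupies each isolation device, and simply marks as a bubble whatever the first unstalled device captures during the stall.

```python
from typing import List

BUBBLE = "bubble"  # marker for nonsense execution data


def timing_pulse(devices: List[str], stalled_count: int, next_input: str) -> List[str]:
    """Advance isolation devices 22, 24, 26, 28, 30 by one timing pulse.

    devices[0] models device 22 (nearest the input 10). The first
    `stalled_count` devices receive the stall signal and keep their data.
    """
    new = []
    for i, held in enumerate(devices):
        if i < stalled_count:
            new.append(held)            # stalled: data is held in place
        elif i == 0:
            new.append(next_input)      # no stall: capture from the input 10
        elif i == stalled_count:
            new.append(BUBBLE)          # nothing new arrives from a stalled device
        else:
            new.append(devices[i - 1])  # normal advance from the preceding device
    return new


devices = ["A", "B", "C", "D", "E"]  # FIG. 2(A): initial time
history = [devices]
for _ in range(3):                   # stall signal sent during three timing pulses
    devices = timing_pulse(devices, stalled_count=2, next_input="")
    history.append(devices)
for name in ["F", "G", "H"]:         # hypothetical instructions entering afterward
    devices = timing_pulse(devices, stalled_count=0, next_input=name)
    history.append(devices)

print(history[2])  # ['A', 'B', 'bubble', 'bubble', 'C']  (FIG. 2(B))
print(history[6])  # ['H', 'G', 'F', 'A', 'B']            (FIG. 2(C))
```

After two pulses the bubble occupies the third and fourth isolation devices, and after six pulses instruction B sits in the last isolation device, ready to arrive at the output 20 on the next timing pulse.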
FIG. 3 illustrates a typical stall latch 40 that may be used in the banks of latches or flip-flops of the first two isolation devices 22, 24 of FIGS. 1(A) and 2(A)-(C). The stall latch 40 has a D flip-flop (DFF) 46. A feedback loop 48 connects the output 50 of the DFF 46 to one input 52 of a 2-to-1 multiplexer (MUX) 54. The second input 56 of the MUX 54 is an input for logical data from an earlier stage 42 of the pipeline. The output 50 of the DFF 46 forms an input for logical data to a following stage 44 of the pipeline. An input 58 of the DFF 46 is connected to an output 60 of the MUX 54. The stall control 38 of FIGS. 1 and 2 selects one of the two inputs 52, 56 of the MUX 54. At each timing pulse delivered by the timer 32, the DFF 46 stores data from the output 60 of the MUX 54. During the same timing pulse, the DFF 46 transmits its stored data to the following stage 44 and to the input 52 of the MUX 54 via the feedback loop 48.
In the transmit configuration, the stall control 38 output is at a logical low, selecting the input 56 of the MUX 54. The DFF 46 transmits stored data to the following stage 44, and provided that the MUX 54 has finished setting up before the timing pulse is received by the DFF 46, the DFF 46 stores data from the earlier stage 42 appearing at the output 60 of the MUX 54. In the stall configuration, the stall control 38 sends a logical high signal, selecting the input 52 of the MUX 54. At each timing pulse, data stored in the DFF 46 is transmitted to the following stage 44 and to the input 52 of the MUX 54 by the feedback loop 48. The DFF 46 also stores data from the output 60 of the MUX 54 at each timing pulse, but the data stored is the same data that was stored in the DFF 46 before the timing pulse. In the stall configuration, the data stored by the latch 40 continues to be transmitted to the following stage with each timing pulse, but the same data is stored again in the DFF 46. Thus, the stall latch 40 does not store data from the earlier stage 42 while in the stall configuration.
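The behavior of the stall latch 40 in the two configurations may be modeled in a few lines. The following is a behavioral sketch only; the class name `StallLatch` and its method names are hypothetical, and setup times of the MUX 54 are not modeled.

```python
class StallLatch:
    """Behavioral model of one stall latch: a 2-to-1 MUX feeding a D flip-flop."""

    def __init__(self, initial: int = 0) -> None:
        self.q = initial  # value at the output 50 of the DFF 46

    def mux(self, data_in: int, stall: bool) -> int:
        # Stall high selects the feedback input 52; stall low selects
        # the input 56 carrying data from the earlier stage 42.
        return self.q if stall else data_in

    def clock(self, data_in: int, stall: bool) -> int:
        """Apply one timing pulse and return the value sent to the following stage 44."""
        self.q = self.mux(data_in, stall)
        return self.q


latch = StallLatch()
trace = [
    latch.clock(1, stall=False),  # transmit configuration: stores the new bit
    latch.clock(0, stall=True),   # stall configuration: the stored bit is stored again
    latch.clock(0, stall=True),   # still stalled
    latch.clock(0, stall=False),  # stall released: stores the new bit
]
print(trace)  # [1, 1, 1, 0]
```

While the stall input is high, the value from the earlier stage is ignored and the previously stored bit continues to appear at the output on every pulse.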
As illustrated in FIG. 3, stall latches 40 typically have more elements than regular latches or flip-flops. Stall latches 40 slow pipeline operation, due to setup times associated with the additional elements. Stall latches 40 use more area on a chip surface (not shown) than regular latches or flip-flops, due to the additional elements. In pipelines that use a large number of latches in particular isolation devices, for example the first two isolation devices 22, 24 of the pipeline illustrated by FIGS. 1(A)-(B), a large area of the chip surface is used when stall latches 40 are employed in those particular isolation devices 22, 24. Besides the increased chip space and time delays, connections from the stall control 38 become problematic when the stall latches 40 are used in those particular isolation devices 22, 24. In those particular isolation devices 22, 24, capacitances between the many adjacent control lines from the stall control 38 ordinarily weaken signals. In such cases, voltage booster devices (not shown in FIG. 3) are generally employed to boost signals to the target stall latches 40. The voltage boosters take up even more costly space on the chip and use additional setup time, diminishing the time window for stall initiation.
The present invention is directed to overcoming, or at least reducing the effects of, one or more of the problems set forth above.