This invention relates to computer systems and methods of operating computer systems, and more particularly to a method of reducing branch delay in a computer system of the pipelined type.
Computer systems, like many systems designed for peak operating efficiency and speed, are often pipelined. The method of pipelining a computer system is analogous to that of a manufacturing assembly line. A specific task is broken into multiple smaller tasks that are performed sequentially by a series of job stations, or pipeline stages. Peak efficiency is achieved when all stages are kept busy doing useful work toward a final product. This implies that all tasks should require approximately the same amount of time to complete, since the slowest stage sets the pace of the entire line. In the case of a computer system, the work consists of the tasks required to process an instruction in the correct instruction flow for the program being processed.
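The effect of stage balance on pipeline efficiency may be illustrated by the following minimal sketch. It assumes an idealized pipeline with no stalls, in which the clock period is set by the slowest stage; the function names and the latency values are hypothetical and chosen only for illustration.

```python
def unpipelined_time(num_instructions, stage_latencies):
    """Total time when each instruction passes through every stage before the next starts."""
    return num_instructions * sum(stage_latencies)


def pipelined_time(num_instructions, stage_latencies):
    """Total time on an idealized pipeline with no stalls (an assumption of this sketch).

    The clock period is taken to be the latency of the slowest stage. The first
    instruction needs one clock per stage; each later instruction completes
    one clock after its predecessor.
    """
    if num_instructions == 0:
        return 0
    clock = max(stage_latencies)
    return (len(stage_latencies) + num_instructions - 1) * clock


# Two hypothetical four-stage designs with the same total work per instruction (8 units).
balanced = [2, 2, 2, 2]
unbalanced = [1, 1, 5, 1]

balanced_total = pipelined_time(100, balanced)      # (4 + 99) * 2
unbalanced_total = pipelined_time(100, unbalanced)  # (4 + 99) * 5
```

Under these assumptions the balanced design completes the same work in well under half the time of the unbalanced design, although both perform the same amount of work per instruction.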
Computer programs generally contain branch instructions, which can alter the flow of instructions necessary to complete a program. It is the combination of fast arithmetic operations and the flexibility provided by branch instructions that produces the power of a computer system. There are two forms of branch instruction: unconditional and conditional. Unconditional branches are used when a program is required to change instruction flow regardless of any data condition. Conditional branches require examination of some data before the correct program flow can be resolved. In either case, when a branch changes the flow of instructions from the sequential path, the branch is said to be taken. Non-taken branches are conditional branches that have failed the branch condition test.
Branch instructions that are taken, or predicted to be taken, create unique problems when applying the principles of pipelining to a computer system. When a computer system encounters a branch instruction, it is typically presented with two possible paths of program execution. Executing both paths is currently prohibitively expensive, but waiting until the branch condition is resolved can idle pipeline stages ahead, which require new input to be kept busy. For this reason, many computer systems implement some form of branch prediction. If the branch is predicted incorrectly, then all instructions fetched down the incorrect instruction path must be canceled and the pipeline restarted down the correct path. If the branch is predicted correctly, then idle time has been minimized and the pipeline can proceed. It is important to note that if the branch instruction itself is used to determine the correct path of program execution, this idle time cannot be eliminated but only minimized.
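The trade-off described above may be sketched as a simple count of idle time-slots. The sketch below is illustrative only: the function name, the parameter names, and the penalty values are assumptions and are not taken from any particular computer system. The `correct_penalty` parameter models the idle time that remains even when a branch is predicted correctly.

```python
def idle_slots(num_branches, num_mispredicted, mispredict_penalty, correct_penalty=0):
    """Count idle pipeline time-slots attributable to branches.

    num_mispredicted   -- branches whose predicted path was wrong
    mispredict_penalty -- slots lost canceling the wrong path and restarting (assumed value)
    correct_penalty    -- slots lost even on a correct prediction (assumed value)
    """
    if not 0 <= num_mispredicted <= num_branches:
        raise ValueError("num_mispredicted must be between 0 and num_branches")
    num_correct = num_branches - num_mispredicted
    return num_correct * correct_penalty + num_mispredicted * mispredict_penalty


# Hypothetical workload: 1000 branches, 3 slots lost per misprediction,
# 1 slot lost per correctly predicted branch.
with_good_predictor = idle_slots(1000, 100, mispredict_penalty=3, correct_penalty=1)
with_poor_predictor = idle_slots(1000, 400, mispredict_penalty=3, correct_penalty=1)
perfect_prediction = idle_slots(1000, 0, mispredict_penalty=3, correct_penalty=1)
```

In this sketch, even perfect prediction leaves a nonzero count of idle slots whenever `correct_penalty` is nonzero, which corresponds to idle time that can be minimized but not eliminated.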
One way of avoiding the waste of this idle time is a technique referred to as delayed branching, which is implemented in a RISC processor made by MIPS Computer Systems, Inc., and sold under the part number R2000. This processor is described by Kane in MIPS R2000 RISC ARCHITECTURE, published by Prentice-Hall, 1987. The delayed branching technique is explained as follows.
Assume that the first pipeline unit, or stage, accomplishes the task of fetching instructions. The second pipeline unit serves to decode these instructions and to direct the fetch unit with the address of the next instruction to fetch. When the pipeline is processing non-branch instructions, it simply fetches at sequential addresses to supply the next needed instruction. Consider the case of a taken branch instruction. Since the branch instruction is needed to determine the address of the next instruction to fetch, there is one time-slot, while the decode unit is processing the branch, in which the decode unit cannot direct the fetch unit where to locate the next instruction. Since the instruction fetch unit cannot be redirected until the branch passes through the decode unit, the branch instruction appears to take two time-slots to process. The address of the branch target is not yet available during the second time-slot, and therefore the instruction flow cannot be redirected until the third time-slot. Some computer systems (such as the MIPS R2000, as well as SPARC and HP) attempt to provide useful work in this time-slot by defining the instruction immediately following a branch as always valid. This is the method referred to as delayed branching. In this way, the time required to redirect the instruction flow is used to fetch this extra instruction. The challenge of this method lies in finding an instruction that will provide useful work at this program junction. There exist other drawbacks to this approach as well.
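The time-slot behavior described above may be illustrated by the following minimal sketch of the two-stage arrangement. It is a simplified model under stated assumptions: every branch is unconditional and taken, the instruction following a branch is never itself a branch, and the program encoding, the function name, and the instruction names are hypothetical.

```python
def decode_trace(program, delayed_branch, max_slots=32):
    """Return what the decode unit processes in each time-slot.

    program is a list of (name, target) pairs; target is None for a non-branch
    instruction, or the index of the branch target for a taken branch.
    A None entry in the returned trace marks a time-slot with no useful work.
    max_slots is a guard against programs that loop forever.
    """
    trace = []
    pc = 0
    while pc < len(program) and len(trace) < max_slots:
        name, target = program[pc]
        trace.append(name)
        if target is None:
            pc += 1
            continue
        # The instruction after the branch was fetched while the branch was decoded.
        slot = pc + 1
        if delayed_branch and slot < len(program):
            trace.append(program[slot][0])  # defined as always valid: useful work
        else:
            trace.append(None)              # canceled: the slot is wasted
        pc = target
    return trace


# Hypothetical program: "br" at index 1 branches to index 4; "c" is skipped.
program = [("a", None), ("br", 4), ("b", None), ("c", None), ("d", None)]

without_delay_slot = decode_trace(program, delayed_branch=False)
with_delay_slot = decode_trace(program, delayed_branch=True)
```

In this model both traces occupy four time-slots, but with delayed branching the slot following the branch carries instruction "b" rather than being wasted. The sketch presumes that "b" is an instruction that may correctly execute whether or not it precedes the branch, which is the difficulty noted above.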
In U.S. Pat. No. 5,019,967, issued to William R. Wheeler and George M. Uhler, assigned to Digital Equipment Corporation, a method of pipeline bubble compression in a computer system is disclosed. This method provides a way of compressing bubbles by overwriting a bubble when a stall condition is detected downstream of the bubble. This may be referred to as a bubble squash technique.
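The general idea of overwriting a bubble upstream of a stall may be sketched as follows. This is a generic illustration under stated assumptions and is not represented to be the mechanism of the cited patent: the pipeline is modeled as a list ordered from the first stage to the stalled stage, a bubble is represented by None, and the function name and instruction names are hypothetical.

```python
def advance_with_squash(stages, incoming):
    """Advance one time-slot while the last stage in `stages` is stalled.

    stages[0] is the first pipeline stage and stages[-1] is the stalled stage.
    If a bubble (None) exists upstream of the stall, the instructions upstream
    of the bubble nearest the stall move forward one stage, overwriting that
    bubble, and `incoming` enters the first stage. Instructions between that
    bubble and the stall hold their positions.

    Returns (new_stages, accepted), where accepted tells whether `incoming`
    entered the pipeline.
    """
    bubbles = [i for i, s in enumerate(stages[:-1]) if s is None]
    if not bubbles:
        return list(stages), False  # nothing to compress: the whole pipeline holds
    b = bubbles[-1]                 # the bubble closest to the stalled stage
    return [incoming] + stages[:b] + stages[b + 1:], True


# Hypothetical state: I1 is stalled in the last stage, with one bubble behind I2.
state = ["I4", None, "I2", "I1"]
state, accepted_first = advance_with_squash(state, "I5")
state_after, accepted_second = advance_with_squash(state, "I6")
```

In this sketch the first call fills the bubble and accepts a new instruction despite the stall, whereas the second call finds no bubble to overwrite and the pipeline holds.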