Programs consist of blocks or strings of sequential instructions, which have a single entry point (the first instruction) and a single exit point (the last instruction). There are one or two choices of instruction blocks to be executed after any particular block. When there are two possible blocks, a condition must be used to determine which block to choose. The pattern of links between blocks is called the program's control or flow graph.
These blocks of instructions are packed together in memory. When there is no choice of subsequent block (block B), it can normally be placed immediately after the first block (block A). This means that there need not be any explicit change in control to get from block A to block B. Sometimes this is not possible, for instance, if more than one block has block B as a successor. All but one of these predecessors must indicate that the subsequent block will not be the next sequential block, but block B. These are unconditional branches. Some blocks have a choice of successor blocks. Clearly only one of the successors, for example block B, can be placed sequentially afterwards. The other block, block C, is indicated explicitly within block A. A conditional mechanism is used to determine which block is to be chosen. If the condition is met, then the chosen successor block is block C. If the condition is not met, then the chosen successor is block B. These are conditional branches.
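The placement rules above can be sketched in a few lines of Python. This is a minimal illustration, not part of the described system: the function name `classify_block_ends`, the block names and the example flow graph are all hypothetical, and the sketch assumes that for a block with two successors the first listed successor is the one placed sequentially afterwards.

```python
def classify_block_ends(layout, successors):
    """Say how control leaves each block for a given memory layout.

    layout:     block names in the order they are packed in memory.
    successors: block name -> tuple of 0, 1 or 2 successor names. For two
                successors, the first is assumed to be the sequential one
                and the second the explicitly indicated one.
    """
    next_in_memory = {
        block: layout[i + 1] if i + 1 < len(layout) else None
        for i, block in enumerate(layout)
    }
    result = {}
    for block in layout:
        succ = successors.get(block, ())
        if len(succ) == 0:
            result[block] = "end"
        elif len(succ) == 1:
            if succ[0] == next_in_memory[block]:
                # Successor sits directly afterwards: no explicit change in control.
                result[block] = "fall-through"
            else:
                result[block] = "unconditional branch to " + succ[0]
        else:
            sequential, target = succ
            if sequential != next_in_memory[block]:
                raise ValueError(block + ": sequential successor is not next in memory")
            result[block] = (
                "conditional branch to " + target
                + ", else fall through to " + sequential
            )
    return result


# Hypothetical flow graph: A chooses between B and C; B and C both lead to D.
layout = ["A", "B", "C", "D"]
successors = {"A": ("B", "C"), "B": ("D",), "C": ("D",), "D": ()}
for block, exit_kind in classify_block_ends(layout, successors).items():
    print(block, "->", exit_kind)
```

In this example blocks B and C share the successor D, so only C can fall through into it and B needs an unconditional branch.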
Branches are well known in the art and are essential for a computer system to execute any program. Known computer systems contain a special register, the instruction pointer register, which provides an indication of the address of the next instruction to execute. This register is usually automatically incremented after an instruction executes, so that it now indicates the address of the next sequential instruction. Branch instructions are used to change this behaviour. These branch instructions specify an alternative address (the target location) for the next executable instruction. Conditional branch instructions also specify a condition which must be met for the alternative address to be used—otherwise the instruction pointer will be incremented as usual. These branch instructions thus define the end of a block of instructions.
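The behaviour of the instruction pointer described above can be illustrated with a small interpreter. This is a sketch under stated assumptions: the instruction encoding (`"op"`, `"br"`, `"brc"`), the flag name `"z"` and the function `run` are invented for illustration, and addresses are simply list indices.

```python
def run(program, flags, max_steps=100):
    """Return the sequence of addresses executed.

    program: list of tuples, one per address:
               ("op",)                 ordinary instruction
               ("br", target)          unconditional branch
               ("brc", target, cond)   conditional branch on flags[cond]
    flags:   dict of condition name -> bool (hypothetical condition flags).
    """
    ip = 0  # instruction pointer
    trace = []
    steps = 0
    while ip < len(program) and steps < max_steps:
        op, *args = program[ip]
        trace.append(ip)
        if op == "br":
            ip = args[0]                 # alternative address always used
        elif op == "brc":
            target, cond = args
            # Alternative address only if the condition is met,
            # otherwise the pointer is incremented as usual.
            ip = target if flags[cond] else ip + 1
        else:
            ip += 1                      # automatic increment
        steps += 1
    return trace


PROGRAM = [
    ("op",),            # 0
    ("brc", 4, "z"),    # 1: ends the first block
    ("op",),            # 2
    ("br", 5),          # 3: ends the second block
    ("op",),            # 4
    ("op",),            # 5
]

print(run(PROGRAM, {"z": True}))
print(run(PROGRAM, {"z": False}))
```

Each branch instruction sits at the end of a block, which matches the statement that branches define the end of a block of instructions.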
In a non-pipelined computer system, the computer fetches, decodes and executes one instruction to completion before moving on to the next instruction. However, in a pipelined system, where the fetch, decode and execution stages can all operate simultaneously on a stream of instructions, it is possible to fetch instructions which are not required. For instance, consider a system with a four stage instruction pipeline with fetch, decode, execute and write stages. The earliest that a branch instruction can be detected is in the decode stage, by which time the next sequential instruction in memory will already have been fetched. For an unconditional branch this instruction must be thrown away, and new instructions fetched from the target location. For conditional branches it is more complicated. The condition must be evaluated to determine whether or not to change to the target location. This occurs in the execute stage, so the sequentially fetched instruction must be stalled in the fetch stage, and only after the branch has been executed can the pipeline proceed. If the condition was true, then the sequentially fetched instruction must be ignored, and new instructions fetched from the target location. Issuing instructions in advance is usually the first form of pipelining applied to any processor architecture, as it is one of the easiest speed-ups.
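The cost of these cases can be tallied with a simple model. The cycle counts below are an assumption derived from one reading of the four stage example, not figures given in the text: one lost cycle for an unconditional branch (the sequentially fetched instruction is discarded), one for a conditional branch that is not taken (the fetched instruction waits for the execute stage), and two for a conditional branch that is taken (the wait, followed by the discard). The names `PENALTY` and `bubble_cycles` are hypothetical.

```python
# Assumed lost fetch cycles per instruction kind in the four stage example.
PENALTY = {
    "plain": 0,
    "uncond": 1,          # sequential fetch thrown away
    "cond_not_taken": 1,  # sequential fetch stalled until the branch executes
    "cond_taken": 2,      # stalled, then thrown away and refetched from target
}


def bubble_cycles(stream):
    """Total cycles lost to branches for a stream of instruction kinds."""
    return sum(PENALTY[kind] for kind in stream)


stream = ["plain", "cond_taken", "plain", "uncond", "cond_not_taken"]
print(bubble_cycles(stream))
```

Under this model every branch costs at least one cycle, which is the pipeline bubble that delayed branches attempt to fill.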
From the previous description, it is clear that the instruction after a branch instruction is always fetched but only sometimes required, and that a pipeline bubble is therefore created while determining what to do. An attempt has been made to improve this by changing the semantics of branch instructions, so that the subsequent instruction is always executed and the branch determines whether the instruction executed after that one is the next sequential instruction or the instruction at the target location. These are called delayed branches, and the instruction immediately following the branch instruction is called a branch delay slot. FIG. 1 schematically illustrates this operation. The branch instruction is detected in the decode stage. The branch delay slot is Inst 1, which is always executed. If the branch is taken, then the next executed instruction will be Inst D0, the first instruction of a different block, whereas if the branch is not taken, it will be Inst 2, which is the first instruction of the next sequential block. Inst 1 must be an instruction which can always be executed, regardless of the outcome of the (conditional) branch, and it must not be an instruction which determines whether the conditional branch is to be taken. If no instruction can be found within the program which satisfies these conditions, then an instruction which has no effect (NO OP) must be inserted instead.
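The delayed branch semantics can be sketched as a small interpreter with a single delay slot. The instruction names Inst 1, Inst 2 and Inst D0 follow the description of FIG. 1; the tuple encoding, the `"halt"` marker and the function `run_delayed` are assumptions made for illustration, and a branch placed inside a delay slot is not handled.

```python
def run_delayed(program, taken):
    """Execute a program whose branches have one delay slot.

    program: list of (name, kind, target) with kind in
             {"op", "branch", "halt"}; target is an index or None.
    taken:   whether the (conditional) branch is taken.
    Returns the names of the instructions executed, in order.
    """
    ip = 0
    executed = []
    pending_target = None  # set by a taken branch, consumed after the delay slot
    while ip < len(program):
        name, kind, target = program[ip]
        if kind == "halt":
            break
        executed.append(name)
        next_ip = ip + 1
        if pending_target is not None:
            # This instruction was the delay slot: the redirect happens now.
            next_ip = pending_target
            pending_target = None
        if kind == "branch" and taken:
            # The branch does not redirect immediately; the next
            # instruction (the delay slot) is executed first.
            pending_target = target
        ip = next_ip
    return executed


PROGRAM = [
    ("Branch", "branch", 4),   # 0
    ("Inst 1", "op", None),    # 1: branch delay slot, always executed
    ("Inst 2", "op", None),    # 2: first instruction of the next sequential block
    ("Halt", "halt", None),    # 3
    ("Inst D0", "op", None),   # 4: first instruction of the target block
    ("Halt", "halt", None),    # 5
]

print(run_delayed(PROGRAM, taken=True))
print(run_delayed(PROGRAM, taken=False))
```

Inst 1 appears in both traces, which is why it must be safe to execute whatever the outcome of the branch.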
Pipelines can be designed where the optimum number of delay slots is more than one. The more deeply pipelined a computer is, the more delay slots are generally required. Unfortunately, it gets harder and harder to find useful instructions to put in each additional slot, so many of them are filled with instructions which do nothing. This places large bubbles of NO OP instructions in the execution pipeline, thus reducing the speed advantage obtained by making a deep pipeline.
Another significant problem with this approach is that when a new computer system is designed for an existing instruction set with a new pipeline organization, and therefore a different number of branch delay slots, it cannot execute existing binaries. Programs must be recompiled in order to be executed.
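The compatibility problem can be demonstrated by running one binary on two hypothetical machines that differ only in their number of delay slots. This is a sketch: the function `execute`, the instruction names and the encoding are invented, every branch is treated as taken, and `delay_slots` is assumed to be at least one.

```python
def execute(binary, delay_slots):
    """Run a binary on a machine with the given number of branch delay slots.

    binary: list of (name, target); target is an index for a (taken)
            branch and None otherwise. The name "halt" stops execution.
    """
    executed = []
    ip = 0
    countdown = None  # delay slots still to execute before redirecting
    dest = None
    while ip < len(binary):
        name, target = binary[ip]
        if name == "halt":
            break
        executed.append(name)
        next_ip = ip + 1
        if countdown is not None:
            countdown -= 1
            if countdown == 0:
                next_ip, countdown = dest, None
        if target is not None:
            countdown, dest = delay_slots, target
        ip = next_ip
    return executed


# Binary compiled for a machine with ONE delay slot.
BINARY = [
    ("branch", 3),       # 0
    ("fill", None),      # 1: the single delay slot
    ("skipped", None),   # 2: must not execute when the branch is taken
    ("target", None),    # 3
    ("halt", None),      # 4
]

print(execute(BINARY, delay_slots=1))
print(execute(BINARY, delay_slots=2))
```

On the two-slot machine the instruction that the compiler intended to skip is executed, so the program's behaviour changes without any change to the binary.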
In an attempt to dispense with branch delay slots, one known system uses two instruction fetchers at the fetch stage of a pipeline, each instruction fetcher being capable of fetching and holding a sequence of instructions. One instruction fetcher has associated with it local decode circuitry which is arranged to detect branch instructions. It will be appreciated that this local decode circuitry is in addition to the normal decode stage of the pipeline. When a branch instruction is detected by the active fetcher it initialises the other instruction fetcher to start fetching instructions from the new block while the instructions up to the branch instruction of the first block continue to be passed into the pipeline for decoding and execution. Not only does this system require extra local decode circuitry to detect branch instructions prior to the normal decode stage of the pipeline, but it also involves speculative fetching of instructions from the memory, many of which may not be required.
EP-A-355069 (Evans & Sutherland Computer Corporation) defines a system in which the instruction to effect a branch is separated into two different parts. The set branch instruction indicates the target location for the branch and can be placed as near the beginning of the string of instructions as possible. Actual implementation of the branch is carried out later in response to a split bit located in a later instruction.
The provision of the target location for the branch with the set branch instruction provides an early indication of the fact that a memory access is going to be made (or is likely to be made) and provides the memory address (the target location) for that access. When the split bit causes the branch to be taken, and the time comes therefore to access that memory address, the system has had a chance to set up for the access, for example by bringing the necessary data into a local cache.
One problem associated with the system of EP-A-355069 is that the target location from which new instructions are fetched is reset after a split bit signal has been executed. This means that there cannot be multiple branches using the target location set up by a single set branch instruction. It is advantageous to allow for this situation and it is one object of the present invention to provide an improved system for implementing branches allowing for this.
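The limitation can be illustrated with a toy model built only from the behaviour described above. It does not reproduce the actual mechanism of EP-A-355069: the class `SplitBranchUnit`, its methods and the `reset_after_split` switch are hypothetical, and the non-resetting variant merely shows the situation that would be advantageous, not how any particular system achieves it.

```python
class SplitBranchUnit:
    """Toy model of a branch split into a set branch part and a split bit."""

    def __init__(self, reset_after_split):
        self.reset_after_split = reset_after_split
        self.target = None  # target location set up by a set branch instruction

    def set_branch(self, target):
        """Early part: record the target location for a later branch."""
        self.target = target

    def split(self, next_sequential, taken=True):
        """Later part: an instruction carrying the split bit executes.

        Returns the address of the next instruction to execute.
        """
        if self.target is None:
            raise RuntimeError("split bit executed with no target set up")
        next_ip = self.target if taken else next_sequential
        if self.reset_after_split:
            self.target = None  # described behaviour: target is reset
        return next_ip


# Behaviour as described: one set branch serves exactly one split.
unit = SplitBranchUnit(reset_after_split=True)
unit.set_branch(100)
print(unit.split(next_sequential=11))
try:
    unit.split(next_sequential=21)
except RuntimeError as err:
    print("second split failed:", err)

# Hypothetical variant in which the target survives several splits.
keeper = SplitBranchUnit(reset_after_split=False)
keeper.set_branch(100)
print(keeper.split(next_sequential=11), keeper.split(next_sequential=21))
```

In the resetting model a second branch to the same target location needs a second set branch instruction, which is the restriction identified above.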