Programs consist of blocks or strings of sequential instructions, which have a single entry point (the first instruction) and a single exit point (the last instruction). There can be a choice from a number of instruction blocks to be executed after any particular block. When there is more than one possible block, a condition must be used to determine which block to choose. The pattern of links between blocks is called the program's control or flow graph.
These blocks of instructions are packed together in memory. When there is no choice of subsequent block (block B), it can normally be placed immediately after the first block (block A). This means that there need not be any explicit change in control to get from block A to block B. Sometimes this is not possible, for instance, if more than one block has block B as a successor. All but one of these predecessors must indicate that the subsequent block will not be the next sequential block, but block B. These are unconditional branches. Some blocks have a choice of successor blocks. Clearly only one of the successors, for example block B, can be placed sequentially afterwards. The other block, block C, is indicated explicitly within block A. A conditional mechanism is used to determine which block is to be chosen. If the condition is met, then the chosen successor block is block C. If the condition is not met, then the chosen successor is block B. These are conditional branches.
Branches are well known in the art and are essential for a computer system to execute any program. Known computer systems contain a special register, the instruction pointer register, which provides an indication of the address of the next instruction to execute. This register is usually automatically incremented after an instruction executes, so that it now indicates the address of the next sequential instruction. Branch instructions are used to change this behaviour. These branch instructions specify an alternative address (the target location) for the next executable instruction. Conditional branch instructions also specify a condition which must be met for the alternative address to be used—otherwise the instruction pointer will be incremented as usual. These branch instructions thus define the end of a block of instructions.
In a non-pipelined computer system, the computer fetches, decodes and executes to completion one instruction, before moving on to the next instruction. However, in a pipelined system where fetch, decode and execution stages can all operate simultaneously on a stream of instructions, it is possible to fetch instructions which are not required. For instance, consider a system with a four stage instruction pipeline with fetch, decode, execute and write stages. The earliest that a branch instruction can be detected is in the decode stage, by which time the next sequential instruction in memory will have already been fetched. For an unconditional branch this must be thrown away, and new instructions fetched from the target location. For conditional branches it is more complicated. The condition must be evaluated to determine whether or not to change to the target location. This will occur in the execute stage, thus the sequentially fetched instruction must be stalled in the fetch stage, and only after the branch has been executed can the pipeline proceed. If the condition was true, then the sequentially fetched instruction must be ignored, and new instructions fetched from the target location. The first pipelining applied to any processor architecture is to issue instructions in advance, as this is one of the easiest speed-ups.
From the previous description, it is clear that the instruction after a branch instruction is always fetched, but is only sometimes required, and that therefore a pipeline bubble is created while determining what to do.
A branching architecture is known for example from EP-A-689131 wherein a branch is effected by the use of two separate instructions, a prepare to branch (PT) instruction (sometimes referred to herein as a set branch instruction) and an execute branch instruction (sometimes referred to herein as the effect branch instruction). The set branch instruction loads the destination address for the branch (referred to herein as the target address) into a target register. The effect branch instruction causes the processor control to transfer to the target address contained in the target register.
In a processor which comprises a program memory, instruction fetch circuitry and an execution unit, the transfer of the processor control can be handled in a number of ways. In one arrangement, two instruction fetch paths are provided, one providing instructions from the instant instruction sequence and the other providing instructions from the target address loaded by the branch set-up instruction. When the branch is effected at the effect branch instruction, the instructions loaded from the target address are switched over to supply the execution unit in place of those from the instant instruction sequence. Other implementations are possible and are discussed for example in the above-referenced EP-A-689131.
The advantage of such a so-called “split branch” arrangement is that it allows the set branch instruction to be moved earlier in the instruction stream. This means that the processor is informed of the branch destination (target address) sooner, and so is able to preload instructions starting from that target address so that by the time the effect branch instruction is taken, the instructions at the target are available to be executed. This is particularly useful in a pipelined architecture to avoid pipeline stalls which would otherwise occur while addresses were being fetched from a target address for a branch.
However, the effectiveness of implementation of the split-branch mechanism depends upon a compiler of the program to locate the set branch instructions at the best place in the instruction stream. There are a number of aims to optimise the placement of the set branch instructions.    1. In general, the earlier the set branch instructions are in the instruction stream, the more opportunity there is for the processor to preload branch target instructions, thus avoiding pipeline bubbles and speeding up execution.    2. For repeatedly executed branch instructions, such as those inside loops, it is possible to pull the set branch instruction completely outside of the loop. This reduces the number of times that they are executed and so improves code speed.    3. Branches which share the same destination address can share a target register, so only one per set branch instruction is necessary to set up the target register. This improves both code speed and code size.
However, there is a trade-off. Pulling the set branch instructions very early in an instruction stream may mean they are moved to a place where they are executed unnecessarily, because the effect branch instruction is never reached. That is, that particular branch is never taken because, for example, of intervening branches or conditions.
Also, the further the set branch instructions are from the branch instructions proper, the greater the pressure there is if there is a limited number of target registers in the processor. To utilise a limited number of target registers, which is sometimes a constrained resource in processors, it is necessary to reduce the distance between the set branch instruction and the effect branch instruction as far as possible in the instruction stream.
It is also important to make sure that a target register which has been loaded with a target address by a set branch instruction is not overwritten when the program is executed until the corresponding effect branch instruction has used the target address.