The need for increased prediction accuracy of branch instructions is well known in the art of processor design. The need has grown even greater with the increase of processor pipeline lengths, cache memory latencies, and superscalar instruction issue widths. Branch instruction prediction involves predicting the target address and, in the case of a conditional branch instruction, the direction, i.e., taken or not taken.
Typically, instructions are fetched from an instruction cache in relatively large blocks, e.g., 16 bytes at a time. Consequently, multiple branch instructions may be present in the fetched block of instructions. There is a need to accurately predict the presence of the branch instructions in the fetched block and to predict both their target addresses and their directions. This is challenging because the location of the branch instructions within the block is relatively random. This is true with fixed-length instructions, but is particularly true with instruction set architectures that permit instructions to be variable length, e.g., x86 or ARM. For example, an x86 branch instruction may be located at any byte offset within the block of instruction bytes fetched from the instruction cache.
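By way of illustration only, the relationship between a branch instruction address and its position in a fetched block may be sketched as follows. The sketch assumes the 16-byte block size given as an example above and a block-aligned fetch; the function names and the sample addresses are hypothetical and are not taken from any particular processor.

```python
FETCH_BLOCK_BYTES = 16  # example block size from the text; assumed to be a power of two


def locate_in_fetch_block(addr, block_bytes=FETCH_BLOCK_BYTES):
    """Return (block_base, byte_offset) for an instruction address.

    Assumes fetch blocks are aligned to block_bytes.
    """
    offset = addr & (block_bytes - 1)  # low-order bits select the byte within the block
    base = addr - offset               # remaining bits identify the block
    return base, offset


def branch_offsets_in_block(block_base, branch_addrs, block_bytes=FETCH_BLOCK_BYTES):
    """Return the sorted byte offsets of the branches that start inside one block."""
    return sorted(
        addr - block_base
        for addr in branch_addrs
        if block_base <= addr < block_base + block_bytes
    )


# Hypothetical addresses: two branches start in the block at 0x1000, one in the next block.
sample_branches = [0x1003, 0x100B, 0x1010]
print(locate_in_fetch_block(0x100B))
print(branch_offsets_in_block(0x1000, sample_branches))
```

With variable-length instructions, any offset from 0 through 15 is a possible starting byte of a branch, so a predictor cannot assume that branches fall only on a few aligned positions within the block.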