An instruction pipeline in a processor improves instruction execution throughput by dividing instruction processing into multiple pipeline stages, so that different instructions of an instruction stream can be processed in parallel, each at a different stage. Such pipelines often include separate units for fetching, decoding, mapping, and executing instructions, and then writing results to another unit, such as a register. An instruction fetch unit of the pipeline provides a stream of instructions to the next stage of the processor pipeline. Instruction fetch units generally use an instruction cache in order to keep the rest of the pipeline continuously supplied with instructions.
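The throughput benefit described above can be illustrated with a simple cycle count. The following is a minimal sketch that assumes an idealized pipeline in which every stage takes one cycle and no stalls occur; the function names and the five-stage figure in the usage note are illustrative assumptions, not taken from any particular processor.

```python
def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed when each instruction passes through every stage
    before the next instruction is allowed to start."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an ideal pipeline (assumption: one cycle per stage,
    no stalls). The first instruction takes num_stages cycles to fill the
    pipeline; each later instruction completes one cycle after the previous."""
    if num_instructions == 0:
        return 0
    return num_stages + num_instructions - 1
```

Under these assumptions, 100 instructions on a five-stage pipeline complete in 104 cycles, compared with 500 cycles when the stages are not overlapped.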
A branch instruction in an instruction stream may result in a pipeline stall if the processor waits until the branch is resolved in an execution stage of the pipeline before fetching the next instruction in an instruction fetching stage. A branch predictor may attempt to predict whether a conditional branch will be taken or not taken. In some implementations, a branch predictor uses branch target prediction to predict the target of a taken conditional or unconditional branch before the target is computed by decoding and executing the branch instruction itself. A branch target may be based on an offset from a computed address or an indirect reference through a register.
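One common way to predict whether a conditional branch will be taken is a table of two-bit saturating counters. The text above does not specify a prediction scheme, so the following is only an illustrative sketch of that well-known technique; the class name, table size, and indexing by the low bits of the branch address are assumptions made for the example.

```python
class TwoBitPredictor:
    """Hypothetical direction predictor built from 2-bit saturating counters.

    Counter values 0 and 1 predict "not taken"; values 2 and 3 predict "taken".
    """

    def __init__(self, num_entries: int = 1024) -> None:
        self.num_entries = num_entries
        # Assumption: every counter starts at 1 (weakly not taken).
        self.counters = [1] * num_entries

    def _index(self, branch_address: int) -> int:
        # Assumption: the table is indexed by the address modulo its size.
        return branch_address % self.num_entries

    def predict(self, branch_address: int) -> bool:
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_address)] >= 2

    def update(self, branch_address: int, taken: bool) -> None:
        """Move the counter toward the actual outcome once the branch resolves."""
        i = self._index(branch_address)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because each counter saturates, a single atypical outcome (for example, the final iteration of a loop) does not immediately flip a strongly established prediction.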
A branch target buffer (BTB) conventionally is a single small memory cache in a processor that stores branch information including predicted branch targets. Prediction involves comparing an instruction address against previously executed instruction addresses that have been stored in the BTB. Successful prediction usually saves processing time because the processor can look up the address for a next step of execution in the BTB rather than executing the steps otherwise needed to acquire the target address. Accordingly, the frequency with which a BTB generates a hit for the target address directly impacts the speed with which instructions can be executed by the processor. Because a BTB with more entries can track more branches and therefore tends to generate hits more often, the speed of execution is often directly related to the number of entries a BTB can store.
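The lookup described above can be sketched as a small direct-mapped table. This is a simplified illustration rather than a description of any actual BTB design: the class and method names, the direct-mapped organization, and the use of the full instruction address as the tag are all assumptions made for the example.

```python
from typing import List, Optional, Tuple


class BranchTargetBuffer:
    """Hypothetical direct-mapped BTB mapping branch addresses to targets."""

    def __init__(self, num_entries: int = 64) -> None:
        self.num_entries = num_entries
        # Each slot holds (tag, predicted_target), or None when empty.
        self.entries: List[Optional[Tuple[int, int]]] = [None] * num_entries
        self.lookups = 0
        self.hits = 0

    def _index(self, instruction_address: int) -> int:
        # Assumption: the slot is chosen by the address modulo the table size.
        return instruction_address % self.num_entries

    def lookup(self, instruction_address: int) -> Optional[int]:
        """Return the predicted target on a hit, or None on a miss."""
        self.lookups += 1
        entry = self.entries[self._index(instruction_address)]
        # Compare against the stored address of a previously executed branch.
        if entry is not None and entry[0] == instruction_address:
            self.hits += 1
            return entry[1]
        return None

    def insert(self, instruction_address: int, target: int) -> None:
        """Record a resolved branch, replacing whatever shared its slot."""
        slot = self._index(instruction_address)
        self.entries[slot] = (instruction_address, target)
```

In this sketch, two branches whose addresses map to the same slot evict one another, which is one reason a table with more entries tends to produce a higher hit rate.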