In conventional microprocessor devices, machine instructions are read from a central memory and then processed by an interpreter; in particular, each instruction is interpreted into a sequence of executable micro-operations.
To increase their performance, microprocessor devices may have a so-called “pipeline” structure. Here, one or more operations, e.g. the operation “interpret machine instruction”, may be subdivided into a plurality of partial steps (e.g. the partial steps “Instruction Fetch”, “Instruction Decoding”, “Address Generation”, “Operand Fetch”, “Execute”, “Store Result”, and/or “Update Program Counter”, etc.), each partial step being processed by a separate processing unit (“step”).
The individual processing units or steps may be connected with each other via appropriate latches, data paths, and control paths such that a plurality of different machine instructions can be processed sequentially and in parallel (at any given time, each step processes a different instruction).
An “Instruction Fetch” step may, for instance, continually load from the central memory, one after the other, machine instructions that are stored there in succession. In an “Instruction Decoding” step, an instruction received from the “Instruction Fetch” step is decoded while the “Instruction Fetch” step is already transferring the next instruction from the memory to the processor. An “Address Generation” step receives from the “Instruction Decoding” step the instruction part and the control information required for addressing (e.g. the type of addressing) and calculates the operand addresses, which are passed on to an “Operand Fetch” step. In parallel, the “Instruction Fetch” step is already loading a further machine instruction from the central memory, the “Instruction Decoding” step is decoding the instruction most recently loaded by the “Instruction Fetch” step, and so on.
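The overlap of the steps can be illustrated with a small model. This is a sketch under simplifying assumptions not stated in the text: the pipeline never stalls and every step takes exactly one clock cycle. The step names are those listed above; the function name `pipeline_schedule` is hypothetical.

```python
# Step names as listed in the text; an idealized one-cycle-per-step
# pipeline without stalls is assumed.
STEPS = [
    "Instruction Fetch",
    "Instruction Decoding",
    "Address Generation",
    "Operand Fetch",
    "Execute",
    "Store Result",
    "Update Program Counter",
]


def pipeline_schedule(num_instructions, steps=STEPS):
    """Return, for each clock cycle, a dict mapping step name -> instruction index.

    Instruction i enters the first step in cycle i and advances one step per
    cycle, so in cycle c the step with index s holds instruction c - s
    (if such an instruction exists).
    """
    total_cycles = num_instructions + len(steps) - 1
    schedule = []
    for cycle in range(total_cycles):
        occupancy = {}
        for s, name in enumerate(steps):
            instr = cycle - s
            if 0 <= instr < num_instructions:
                occupancy[name] = instr
        schedule.append(occupancy)
    return schedule


if __name__ == "__main__":
    for cycle, occupancy in enumerate(pipeline_schedule(3)):
        print(cycle, occupancy)
```

In this model, three instructions pass through the seven steps in 3 + 7 - 1 = 9 cycles instead of the 21 cycles needed when each instruction completes all steps before the next one starts. In cycle 2, for example, instruction 2 is being fetched, instruction 1 decoded, and instruction 0 is in “Address Generation”.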
The pipelining concept requires a special architecture designed so that the individual steps can actually process different instructions in parallel while influencing each other as little as possible.
Since, as explained above, the “Instruction Fetch” step as a rule fetches the instructions sequentially from the central memory, irrelevant instructions that are not to be executed at the moment may be loaded into the pipeline in the case of (conditional or unconditional) jump instructions or branches and/or subroutine calls, which results in a substantial performance loss.
Unconditional jumps (and subroutine calls) may be detected early by means of specific control mechanisms, whereupon the “Instruction Fetch” step may be caused to continue at the new position in the program even before the PC (Program Counter) has been updated accordingly by the “Update Program Counter” step.
In contrast, the target of a conditional jump can be determined only once the respective condition has been evaluated, e.g. only by the “Execute” step. Depending on the outcome of the condition, the entire content of the pipeline may then have to be discarded.
This may, for instance, be prevented by stopping the pipeline mechanism as soon as the “Instruction Decoding” step recognizes a conditional jump instruction. The pipeline is released only once the target address of the jump has been determined or the PC has been updated. This procedure, which results in a “gap” in the pipeline and thus in a performance loss, is referred to as “Interlocking”.
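The cost of interlocking can be estimated with a simple cycle count. This is a minimal sketch under assumed timing: each step takes one cycle, and after a conditional jump has been fetched, the next instruction is fetched only in the cycle after the jump has passed the step in which its condition is resolved. With the steps numbered from 0 in the order listed above, “Execute” has index 4. The function name and the trace encoding (`"cjump"` for a conditional jump, anything else for an ordinary instruction) are hypothetical.

```python
def cycles_with_interlocking(trace, num_steps=7, resolve_step=4):
    """Total cycles needed to run `trace` through an idealized, interlocked pipeline.

    trace: sequence of strings; "cjump" marks a conditional jump.
    resolve_step: index of the step that evaluates the jump condition
    (assumed 4, i.e. "Execute" when "Instruction Fetch" is 0).
    """
    if not trace:
        return 0
    fetch_cycle = 0  # cycle in which the current instruction is fetched
    for kind in trace[:-1]:
        # A conditional jump blocks fetching until it has passed resolve_step,
        # which leaves a gap of resolve_step idle fetch cycles.
        fetch_cycle += (resolve_step + 1) if kind == "cjump" else 1
    # The last instruction leaves the pipeline num_steps cycles after its fetch.
    return fetch_cycle + num_steps


if __name__ == "__main__":
    print(cycles_with_interlocking(["op", "op", "op"]))
    print(cycles_with_interlocking(["op", "cjump", "op"]))
```

Under these assumptions, three ordinary instructions need 9 cycles, whereas inserting a conditional jump in the middle raises the count to 13: each interlocked conditional jump opens a gap of four cycles.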
In order to avoid or minimize the performance loss caused by “Interlocking”, so-called static or dynamic branch prediction methods may be used.
These methods attempt to predict whether a conditional jump is likely to be taken (prediction: “taken”) or not (prediction: “not taken”).
A simple example is provided by jump instructions whose target is specified relative to the PC (i.e. which use PC-relative addressing): if the displacement of such a jump instruction is negative (“backward” jump), it may be assumed that the jump closes a loop. Since a loop is more likely to be repeated than exited, it may be predicted for such a jump instruction that the jump will be taken (prediction: “taken”).
If a jump is predicted to be taken, the “Instruction Fetch” step can be caused to depart from the sequential order when loading the machine instructions from the central memory.
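The static rule for PC-relative jumps, together with its effect on the “Instruction Fetch” step, can be sketched as follows. The text only states that backward jumps are predicted “taken”; predicting forward jumps “not taken” is an added assumption (the common “backward taken, forward not taken” heuristic). The function names are hypothetical, and the encoding, with the target computed as `pc + displacement`, is also assumed: real instruction sets differ in whether the displacement is relative to the jump itself or to the following instruction.

```python
def predict_taken(displacement):
    """Static prediction for a PC-relative conditional jump.

    A negative displacement (backward jump) is assumed to close a loop and
    is predicted "taken"; a forward jump is predicted "not taken" (assumption).
    """
    return displacement < 0


def next_fetch_address(pc, instr_length, displacement):
    """Address at which the "Instruction Fetch" step continues after the jump at pc.

    Hypothetical encoding: jump target = pc + displacement.
    """
    if predict_taken(displacement):
        # Depart from the sequential order: fetch from the predicted target.
        return pc + displacement
    # Otherwise continue with the next sequential instruction.
    return pc + instr_length


if __name__ == "__main__":
    print(hex(next_fetch_address(0x100, 4, -16)))  # backward jump: loop start
    print(hex(next_fetch_address(0x100, 4, 8)))    # forward jump: fall through
```

If the prediction proves wrong once the condition has been evaluated, the instructions fetched along the predicted path still have to be discarded. The prediction only avoids the interlocking gap in the cases where it is correct.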
To increase the hit rate of the predictions, a table (a so-called jump target cache) may be used, in which the most recently calculated target addresses are entered for as many of the jumps already taken in a program as possible. The table is managed by the processor and contains the “history” of the instructions (“Branch History”).
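A jump target cache of this kind can be modelled as a bounded table keyed by the address of the jump instruction. The following is a minimal sketch, not the table organization of any particular processor. The class and method names, the capacity, and the replacement policy (evict the least recently updated entry when the table is full) are hypothetical choices made for illustration.

```python
from collections import OrderedDict


class JumpTargetCache:
    """Minimal jump target cache sketch.

    Maps the address of a jump instruction to the target address most
    recently calculated for it. When the table is full, the least recently
    updated entry is evicted (assumed replacement policy).
    """

    def __init__(self, capacity=64):
        self.capacity = capacity
        self._entries = OrderedDict()  # jump address -> last calculated target

    def lookup(self, jump_pc):
        """Return the predicted target for the jump at jump_pc, or None if unknown."""
        return self._entries.get(jump_pc)

    def update(self, jump_pc, target):
        """Record the target calculated for a jump that was actually taken."""
        if jump_pc in self._entries:
            # Mark the entry as most recently updated before overwriting it.
            self._entries.move_to_end(jump_pc)
        self._entries[jump_pc] = target
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the oldest entry


if __name__ == "__main__":
    cache = JumpTargetCache(capacity=2)
    cache.update(0x10, 0x80)
    print(cache.lookup(0x10), cache.lookup(0x20))
```

During instruction fetch, a hit in the table lets the “Instruction Fetch” step continue at the stored target without waiting for the target address to be recalculated. A miss falls back to sequential fetching or to a static rule.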
Different static and dynamic branch prediction methods are, for instance, described in the book “Computer Architecture: A Quantitative Approach” by Hennessy and Patterson.