In modern computer architectures, a trace cache is often used to reduce the branch penalty cycles caused by mis-predicted branch instructions and to decouple the latency associated with unconditional jumps.
A trace cache typically stores traces, that is, sequences of instructions to be executed one after another in a pipelined instruction execution architecture. Different traces may correspond to different possible instruction sequences, which may or may not be executed depending on the outcomes of certain instructions, such as conditional branch instructions or unconditional jump instructions.
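To make the notion of a trace concrete, the following is a minimal sketch, not the method of this application, of how an executed instruction stream might be cut into traces and stored under a key made of the trace's start address plus the outcomes of the branches inside it. The limits `MAX_TRACE_LEN` and `MAX_BRANCHES`, the record layout `DynInstr`, and the function `build_traces` are hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical limits; the text does not fix a trace length or branch count.
MAX_TRACE_LEN = 16
MAX_BRANCHES = 3

# Hypothetical dynamic instruction record: (address, is_branch, taken).
DynInstr = Tuple[int, bool, bool]
# A trace is identified by its start address and the outcomes of its branches.
TraceKey = Tuple[int, Tuple[bool, ...]]


def build_traces(stream: List[DynInstr]) -> Dict[TraceKey, List[int]]:
    """Cut an executed instruction stream into traces.

    A trace ends when it holds MAX_TRACE_LEN instructions or MAX_BRANCHES branches.
    """
    traces: Dict[TraceKey, List[int]] = {}
    addrs: List[int] = []
    outcomes: List[bool] = []
    for addr, is_branch, taken in stream:
        addrs.append(addr)
        if is_branch:
            outcomes.append(taken)
        if len(addrs) == MAX_TRACE_LEN or len(outcomes) == MAX_BRANCHES:
            traces[(addrs[0], tuple(outcomes))] = addrs
            addrs, outcomes = [], []
    if addrs:  # flush a final, partial trace
        traces[(addrs[0], tuple(outcomes))] = addrs
    return traces


# Two runs start at the same address; the branch at 0x104 goes different ways.
run_a = [(0x100, False, False), (0x104, True, True), (0x200, False, False)]
run_b = [(0x100, False, False), (0x104, True, False), (0x108, False, False)]
cache = {**build_traces(run_a), **build_traces(run_b)}
for (start, path), trace in cache.items():
    print(hex(start), path, [hex(a) for a in trace])
```

The two runs share a start address but yield two distinct traces, one per branch outcome, which is the sense in which different traces correspond to different possible instruction sequences.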
A conditional branch instruction is a computer instruction with two possible outcomes: branch or don't branch. When the outcome is to branch, the instruction architecture abandons the current instruction sequence and continues with a different instruction sequence. When the outcome is not to branch, the instruction architecture stays on the current instruction sequence path.
An unconditional jump instruction, by contrast, has only one outcome: whenever it is executed, the instruction architecture jumps to the new instruction sequence associated with the jump instruction.
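The difference between the two instruction types comes down to how the address of the next instruction is chosen. The sketch below illustrates this under simplifying assumptions: a fixed 4-byte instruction width (`INSTR_BYTES`) and a hypothetical `Instr` record whose `kind` field is one of `"alu"`, `"branch"`, or `"jump"`. Neither is taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

INSTR_BYTES = 4  # hypothetical fixed instruction width


@dataclass
class Instr:
    kind: str                     # "alu", "branch", or "jump" (hypothetical encoding)
    target: Optional[int] = None  # destination address for "branch" and "jump"


def next_pc(pc: int, instr: Instr, taken: bool = False) -> int:
    """Address of the next instruction after executing `instr` at `pc`."""
    if instr.kind == "jump" or (instr.kind == "branch" and taken):
        # Unconditional jump, or conditional branch whose outcome is "branch":
        # abandon the current sequence and continue at the target.
        assert instr.target is not None
        return instr.target
    # Any other instruction, or a branch whose outcome is "don't branch":
    # stay on the current sequence.
    return pc + INSTR_BYTES


branch = Instr("branch", target=0x200)
print(hex(next_pc(0x100, branch, taken=True)))           # 0x200
print(hex(next_pc(0x100, branch, taken=False)))          # 0x104
print(hex(next_pc(0x100, Instr("jump", target=0x300))))  # 0x300
```

Only the conditional branch depends on `taken`; the jump ignores it, which is why a jump never needs a prediction of its direction, only knowledge of its target.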
In either case, conditional branch or unconditional jump, delays may occur in the instruction execution pipeline if the computer architecture must go further back in the pipeline to access the next sequence of instructions to branch or jump to. These delays effectively stall the instruction execution pipeline while it waits for the next correct instruction to be loaded into its instruction register.
In a typical instruction pipeline within a computer architecture, an instruction cache fetches instructions from memory. The instruction cache may feed individual instructions into an instruction register, or it may feed a trace cache that builds up traces of instructions. Once an instruction is loaded into an instruction register, it is decoded and then executed using the associated data loaded into a data register for that instruction. The result of the executed instruction is written back to a register.
A typical instruction pipeline in a computer architecture may use a branch predictor to predict the outcome of a branch instruction based on a trace history built up in the trace cache. Prediction accuracies of 90% or better may be achieved. However, whenever a branch prediction is incorrect, the pipeline incurs additional delays and stalls.
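The text does not specify which predictor is used. As a stand-in, the sketch below uses a common textbook scheme, a table of 2-bit saturating counters indexed by the branch address, to show how accuracy near the figure quoted above can arise and where the remaining mis-predictions come from. The table size (`table_bits`), the initial counter value, and the loop-shaped outcome pattern are all assumptions for illustration.

```python
from typing import List


class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by low branch-address bits.

    Counter values 0-1 predict "don't branch"; values 2-3 predict "branch".
    """

    def __init__(self, table_bits: int = 10) -> None:
        self.mask = (1 << table_bits) - 1
        self.table = [1] * (1 << table_bits)  # every entry starts weakly not-taken

    def predict(self, pc: int) -> bool:
        return self.table[pc & self.mask] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = pc & self.mask
        if taken:
            self.table[i] = min(self.table[i] + 1, 3)
        else:
            self.table[i] = max(self.table[i] - 1, 0)


def accuracy(pred: TwoBitPredictor, pc: int, outcomes: List[bool]) -> float:
    """Fraction of outcomes predicted correctly, training the predictor as it goes."""
    hits = 0
    for taken in outcomes:
        if pred.predict(pc) == taken:
            hits += 1
        pred.update(pc, taken)
    return hits / len(outcomes)


# A loop-closing branch: taken 9 times, then not taken once, for 10 loop runs.
outcomes = ([True] * 9 + [False]) * 10
print(accuracy(TwoBitPredictor(), 0x400, outcomes))  # 0.89
```

Of the 11 misses, one is the cold start and ten are the loop exits, where the counter confidently predicts "branch" and is wrong. Those loop-exit misses are exactly the cases in which the pipeline stalls.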
Research on trace caches has focused on implementation details such as how to construct continuous traces, how to use single or multiple branch predictors to improve trace cache performance, and which fill algorithms to use for loading the trace cache. Alternatively, instead of using multiple branch predictors, traces for multiple branch paths may be constructed in the trace cache.
The circuit delay associated with a branch mis-prediction may then be reduced by going back only to the trace cache to access the correct trace, instead of incurring additional delay by going back to the instruction cache or to memory. The only delay incurred is then that of re-accessing the trace cache after the mis-prediction. Therefore, by constructing parallel branches of traces in the trace cache, the circuit delay from making the branch decision to fetching the next instruction may be reduced. However, instruction execution pipeline stalls may still occur with such a configuration whenever the branch prediction is incorrect.
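The cost argument in the preceding paragraph can be sketched as a small model. The class `DualPathTraceCache`, the function `recovery_cycles`, and the latencies `TRACE_CACHE_REFILL` and `ICACHE_REFILL` are hypothetical; the cycle counts are placeholders chosen only so that re-accessing the trace cache is cheaper than going back to the instruction cache, as the text assumes.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical recovery latencies in cycles; real values are design-specific.
TRACE_CACHE_REFILL = 3
ICACHE_REFILL = 12

# Key: (address of the branch, outcome of that branch).
PathKey = Tuple[int, bool]


class DualPathTraceCache:
    """Hypothetical trace cache holding the trace for both outcomes of a branch."""

    def __init__(self) -> None:
        self.traces: Dict[PathKey, List[int]] = {}

    def fill(self, branch_pc: int, taken_trace: List[int],
             fallthrough_trace: List[int]) -> None:
        self.traces[(branch_pc, True)] = taken_trace
        self.traces[(branch_pc, False)] = fallthrough_trace

    def lookup(self, branch_pc: int, taken: bool) -> Optional[List[int]]:
        return self.traces.get((branch_pc, taken))


def recovery_cycles(tc: DualPathTraceCache, branch_pc: int,
                    predicted: bool, actual: bool) -> int:
    """Stall cycles spent fetching the correct path once the branch resolves."""
    if predicted == actual:
        return 0  # correct prediction: no recovery needed
    if tc.lookup(branch_pc, actual) is not None:
        return TRACE_CACHE_REFILL  # correct trace already built: re-access trace cache
    return ICACHE_REFILL  # fall back to the instruction cache (or memory)


tc = DualPathTraceCache()
tc.fill(0x40, taken_trace=[0x80, 0x84, 0x88], fallthrough_trace=[0x44, 0x48])
print(recovery_cycles(tc, 0x40, predicted=True, actual=False))  # 3
print(recovery_cycles(tc, 0x50, predicted=True, actual=False))  # 12: no trace stored for 0x50
print(recovery_cycles(tc, 0x40, predicted=True, actual=True))   # 0
```

Storing both paths shrinks the mis-prediction cost from `ICACHE_REFILL` to `TRACE_CACHE_REFILL`, but it does not make that cost zero, which is why stalls can still occur in this configuration.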
It is desirable to further reduce the chance of delays and stalls occurring in the instruction execution pipeline of a computer architecture. It is also desirable to eliminate the branch predictor altogether, so that only correct instructions or traces are loaded into the instruction execution pipeline.
Further limitations and disadvantages of conventional and traditional approaches will become apparent to one of skill in the art, through comparison of such systems with embodiments of the present invention as set forth in the remainder of the present application with reference to the drawings.