The present invention provides a new trace assembly paradigm for processing circuits.
FIG. 1 is a block diagram illustrating the process of program execution in a conventional processor. Program execution may include three stages: front end 110, execution 120 and memory 130. The front-end stage 110 performs instruction pre-processing. Front end processing is designed with the goal of supplying valid decoded instructions to an execution core with low latency and high bandwidth. Front-end processing can include instruction prediction, decoding and renaming.
As the name implies, the execution stage 120 performs instruction execution. The execution stage 120 typically communicates with a memory 130 to operate upon data stored therein.
The front end stage 110 may include a trace cache (not shown) to reduce the latency of instruction decoding and to increase front end bandwidth. A trace cache is a circuit that assembles sequences of dynamically executed instructions into logical units called “traces.” The program instructions may have been assembled into a trace from non-contiguous regions of an external memory space but, when they are assembled in a trace, the instructions appear in program order. Typically, a trace may begin with an instruction of any type. The trace may end when one of number of predetermined trace end conditions occurs, such as a trace size limit, a maximum number of conditional branches occurs or an indirect branch or return instruction occurs.
Prior art traces are defined by an architecture having a single entry point but possibly many exit points. This architecture, however, causes traces to exhibit instruction redundancy. Consider, by way of example, the following code sequence:
If (cond)                A        
B
This simple code sequence produces two possible traces: 1) B and 2) AB. When assembled in a trace cache, each trace may be stored independently of the other. Thus, the B instruction would be stored twice in the trace cache. This redundancy lowers system performance. Further, because traces may start on any instruction, the B instruction also may be recorded in multiple traces due to instruction alignment discrepancies that may occur. This instruction redundancy limits the bandwidth of front-end processing.
Accordingly, there is a need in the art for a front-end stage that reduces instruction redundancy in traces.