One or more aspects of the invention relate to instruction grouping in a processor, and more particularly, to a method and apparatus for tracing instruction grouping.
Advances in information technology place ever higher demands on a processor's execution capability. To achieve higher processing capability, processors have gradually evolved from early in-order execution to the out-of-order execution (also referred to as OoOE) that is commonly adopted today.
In the in-order execution scheme, a processor first prefetches and decodes an instruction, then reads the operand(s) from memory according to the decoded instruction. If the operands are currently available, the instruction is dispatched to an appropriate functional unit for execution. After execution completes, the functional unit writes the execution results back into a register file. However, if one or more operands are unavailable in the current clock cycle (generally because the processor is still fetching them from memory), the processor stalls until those operands become available.
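The stall behavior of the in-order scheme can be illustrated with a minimal sketch. It is a toy model, not a description of any particular processor: the `Instr` record, its `ready_at` field (the cycle in which all operands are assumed to become available), and the one-instruction-per-cycle issue rate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    ready_at: int  # hypothetical: first cycle in which all operands are available


def run_in_order(program):
    """Issue instructions strictly in program order, stalling on missing operands.

    Returns a dict mapping instruction name to its issue cycle, and the
    total number of cycles spent stalled.
    """
    cycle = 0
    stalls = 0
    issued = {}
    for instr in program:
        if instr.ready_at > cycle:
            # Operands are not available yet: the whole pipeline waits.
            stalls += instr.ready_at - cycle
            cycle = instr.ready_at
        issued[instr.name] = cycle
        cycle += 1  # assumption: one instruction issued per cycle
    return issued, stalls


# A waits three cycles for its operands; B and C are ready immediately
# but cannot be issued ahead of A.
program = [Instr("A", 3), Instr("B", 0), Instr("C", 0)]
issued, stalls = run_in_order(program)
```

In this model, instructions B and C are held up behind A even though their own operands are ready, which is the waste that the following paragraphs address.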
Because the in-order execution scheme makes a processor waste clock cycles while waiting, many high-performance processors adopt an out-of-order execution scheme to make use of these otherwise wasted cycles.
In particular, in the out-of-order execution scheme, instructions are first prefetched and arranged in an instruction sequence. When an instruction A in the sequence temporarily cannot be executed because its operand(s) are unavailable, the processor analyzes a subsequent instruction B. If the execution of instruction B does not depend on the execution result of the preceding instruction A, the processor sends instruction B to an appropriate functional unit for execution. Execution results are also written into a queue so that the original instruction order can be restored.
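The out-of-order behavior described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not the mechanism of any specific processor: the `Instr` record, its `ready_at` and `deps` fields, the single-issue rule, and the one-cycle execution latency are all hypothetical. Completed results are tracked in `completed_at`, and instructions are retired from the head of the program sequence to model the queue that restores the original order.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    ready_at: int      # hypothetical: first cycle in which external operands are available
    deps: tuple = ()   # names of earlier instructions whose results this one needs


def run_out_of_order(program):
    """Issue the oldest ready instruction each cycle; retire in program order."""
    completed_at = {}   # name -> cycle in which the instruction executed
    issue_order = []
    retire_order = []
    pending = list(program)
    head = 0            # index of the oldest instruction not yet retired
    cycle = 0
    while head < len(program):
        # Scan pending instructions in program order and issue the first
        # one whose operands and dependencies are satisfied.
        for instr in pending:
            operands_ready = instr.ready_at <= cycle
            deps_done = all(
                d in completed_at and completed_at[d] < cycle for d in instr.deps
            )
            if operands_ready and deps_done:
                completed_at[instr.name] = cycle
                issue_order.append(instr.name)
                pending.remove(instr)
                break  # assumption: at most one instruction issued per cycle
        # Retire from the head of the queue so results become visible
        # in the original instruction order.
        while head < len(program) and program[head].name in completed_at:
            retire_order.append(program[head].name)
            head += 1
        cycle += 1
    return issue_order, retire_order, cycle


# A waits for operands; B is independent of A; C needs A's result.
program = [Instr("A", 3), Instr("B", 0), Instr("C", 0, deps=("A",))]
issue_order, retire_order, total_cycles = run_out_of_order(program)
```

Here B executes while A is still waiting for its operands, C waits because it depends on A, and retirement still follows the original sequence A, B, C. The sketch assumes every name in `deps` refers to an instruction present in the program; otherwise the loop would not terminate.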