Processors have evolved throughout recent decades by becoming smaller in size, more sophisticated in design and exhibiting faster performance. Such an evolution has resulted for various reasons, one of which is portability of systems incorporating processors. Portability introduces demands on processors such as smaller size, reduced power and efficient performance.
A processor (such as a microprocessor) processes instructions according to an instruction set architecture. The processing comprises fetching, decoding, and executing the instructions. Some instruction set architectures define a programming model where fetching, decoding, executing, and any other functions for processing an instruction are apparently performed in strict order, beginning after the functions for all prior instructions have completed, and completing before any functions of a successor instruction have begun. Such an instruction set architecture provides a programming model where instructions are executed in program order.
Some processors process instructions in various combinations of overlapped (or non-overlapped), parallel (or serial), and speculative (or non-speculative) manners, for example using pipelined functional units, superscalar issue, and out-of-order execution. Thus, some processors are enabled to execute instructions and access memory in an order that differs from the program order of the programming model. Nevertheless, the processors are constrained to produce results consistent with results that would be produced by processing instructions entirely in program order.
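By way of illustration only, the consistency constraint described above can be sketched in a few lines of Python. The three-operand add format, the register names, and the `execute` helper are assumptions introduced for this sketch and do not correspond to any particular instruction set architecture.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical operation format: (dest, src_a, src_b) means dest = src_a + src_b.
Add = Tuple[str, str, str]


def execute(program: Sequence[Add], order: Sequence[int],
            regs: Dict[str, int]) -> Dict[str, int]:
    """Execute the operations of `program` in the given `order`.

    Returns the resulting register state; the input state is not modified.
    """
    state = dict(regs)
    for index in order:
        dest, a, b = program[index]
        state[dest] = state[a] + state[b]
    return state


program = [
    ("r3", "r1", "r2"),  # 0: r3 = r1 + r2
    ("r4", "r1", "r1"),  # 1: r4 = r1 + r1 (independent of operation 0)
    ("r5", "r3", "r4"),  # 2: r5 = r3 + r4 (depends on operations 0 and 1)
]
initial = {"r1": 1, "r2": 2, "r3": 0, "r4": 0, "r5": 0}

in_order = execute(program, [0, 1, 2], initial)
reordered = execute(program, [1, 0, 2], initial)   # swaps independent operations
violating = execute(program, [2, 0, 1], initial)   # runs the dependent operation first
```

In this sketch, swapping the two independent operations yields the same final state as program order, whereas executing the dependent operation first does not; only the former reordering would be permissible under the constraint.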
In some instruction set architectures, instructions are characterized as being either sequential or non-sequential, i.e. specifying a change in control flow (such as a branch). Processing after a sequential instruction implicitly continues with a next instruction that is contiguous with the sequential instruction, while processing after a change in control flow instruction optionally occurs with either the contiguous next instruction or with another next instruction (frequently non-contiguous) as specified by the control flow instruction.
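The distinction between sequential and non-sequential instructions can be sketched, for example, as a next-address selection. The `Instruction` fields and the `next_address` function below are hypothetical names chosen for illustration, and a byte-addressed instruction stream is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instruction:
    address: int                          # address of the first byte
    length: int                           # size in bytes
    branch_target: Optional[int] = None   # None marks a sequential instruction


def next_address(instr: Instruction, taken: bool = False) -> int:
    """Return the address at which processing continues after `instr`."""
    fall_through = instr.address + instr.length  # contiguous next instruction
    if instr.branch_target is None:
        # Sequential instruction: processing implicitly continues contiguously.
        return fall_through
    # Control flow instruction: continue at the target if taken,
    # otherwise at the contiguous next instruction.
    return instr.branch_target if taken else fall_through


add = Instruction(address=0x100, length=2)
branch = Instruction(address=0x102, length=2, branch_target=0x200)
```

Here `next_address(add)` is the contiguous address 0x102, while `next_address(branch, taken=True)` is the non-contiguous target 0x200 and `next_address(branch, taken=False)` is the contiguous address 0x104.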
Applications of processors are, for example, in personal computers (PCs), workstations, networking equipment and portable devices. Examples of portable devices include laptops, which are portable PCs, and hand-held devices.
Due to the wide use of code based on the x86 instruction set, particularly by software programmers who have become well accustomed to this code and are not likely to readily adapt to another code, backward compatibility of code is key in the architecture of a new processor. That is, the user of a newly-designed processor must enjoy the ability to use the same code utilized in a previous processor design without experiencing any problems.
In trace-based processor architectures, different trace types are used to significantly optimize execution by the back end, or execution unit, of the processor. Traces are generally built by the front end or trace unit (or instruction processing unit) of a processor.
Different types of traces might include a basic block trace, a multi-block trace or a microcode trace. A multi-block trace is made of one or more basic block traces, one or more multi-block traces or a combination thereof. A microcode trace is used when, for example, a sequence of instructions is either complex or rare. U.S. patent application Ser. No. 11/781,937, entitled “A Trace Unit with a Decoder, A Basic Block Builder, and A Multi-Block Builder” and filed on Jul. 23, 2007, the disclosure of which is incorporated by reference as though set forth in full, presents further details of such traces.
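As a minimal sketch of the composition rule stated above, the trace types may be modeled as follows. The `TraceKind`, `Trace`, and `build_multi_block` names are hypothetical, and the simple concatenation of operations is an assumption made for illustration; it is not a description of the builders in the referenced application.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TraceKind(Enum):
    BASIC_BLOCK = "basic block"
    MULTI_BLOCK = "multi-block"
    MICROCODE = "microcode"


@dataclass
class Trace:
    kind: TraceKind
    operations: List[str] = field(default_factory=list)


def build_multi_block(parts: List[Trace]) -> Trace:
    """Combine basic block and/or multi-block traces into a multi-block trace.

    Assumption for illustration: operations are simply concatenated.
    """
    if not parts:
        raise ValueError("a multi-block trace needs at least one constituent")
    allowed = (TraceKind.BASIC_BLOCK, TraceKind.MULTI_BLOCK)
    for part in parts:
        if part.kind not in allowed:
            raise ValueError(f"cannot combine a {part.kind.value} trace")
    operations = [op for part in parts for op in part.operations]
    return Trace(TraceKind.MULTI_BLOCK, operations)


bb1 = Trace(TraceKind.BASIC_BLOCK, ["load", "add"])
bb2 = Trace(TraceKind.BASIC_BLOCK, ["cmp", "branch"])
mb = build_multi_block([bb1, bb2])
nested = build_multi_block([mb, bb1])
```

In this sketch a multi-block trace may itself serve as a constituent of a further multi-block trace, whereas a microcode trace is rejected as a constituent.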
A trace, in some trace-based architectures, includes operations that do not correspond to instructions in the instructions' original program order. That is, knowledge of the original program order of the instructions is lost in a trace. Moreover, an instruction may result in multiple operations. Additionally, there is no instruction boundary and the operations of a trace do not have clear relative age or order between each other (corresponding to the original instruction program order).
In prior art techniques, when a problem with a trace is detected, because there is a correspondence between instructions and operations, the relative age of the operation with the problem is used to roll back the architectural state of the processor to that which it was prior to the abort. However, in a trace-based architecture, where there is no correspondence between the instructions and corresponding operations and there is no clear instruction boundary, the problem cannot be resolved using the age of the operation because the order of the operations does not represent the original program order.
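The following is a simplified sketch of age-based rollback and of why it fails for reordered operations. It assumes, hypothetically, one saved state per instruction and an age equal to the instruction's index in program order; the function names and the `source_instr` annotation are introduced for illustration only and are not taken from any prior art design.

```python
from typing import Dict, List

State = Dict[str, int]


def rollback_by_age(checkpoints: List[State], faulting_ages: List[int]) -> State:
    """Return the state to restore, given the ages of the faulting operations.

    checkpoints[i] is assumed to hold the architectural state as it was
    before instruction i executed; the oldest (smallest) age wins.
    """
    if not faulting_ages:
        raise ValueError("no faulting operation")
    oldest = min(faulting_ages)
    return dict(checkpoints[oldest])


def position_tracks_program_order(source_instr: List[int]) -> bool:
    """Check whether operation position is a valid stand-in for age.

    source_instr[k] is the (hypothetical) program-order index of the
    instruction that produced the k-th operation.
    """
    return all(a <= b for a, b in zip(source_instr, source_instr[1:]))


checkpoints = [{"r1": 0}, {"r1": 5}, {"r1": 7}]
restored = rollback_by_age(checkpoints, [2, 1])

conventional = [0, 0, 1, 2]   # operations remain in instruction order
trace_like = [1, 0, 2, 0]     # operations reordered within a trace
```

With the conventional ordering, the position of an operation tracks the age of its instruction, so the oldest faulting operation identifies the restore point. With the trace-like ordering, position no longer tracks age, and in an actual trace the `source_instr` annotation assumed here is not available at all.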
In prior art processors, when an abort is encountered, only the relative age of the pending abort is considered. However, where traces include operations that do not represent the original program order, simply considering the relative age of the pending abort falls short of resolving aborts effectively. In trace-based architectures, if an abort, or a problem, applies to more than one operation representing two or more instructions in the same trace, the abort is not currently handled efficiently, as there is no clear operation-to-instruction correspondence, instruction order correspondence, or instruction boundary.
In the foregoing trace-based architectures, a trace can experience multiple abort triggers (due to problems with different instructions contained within the trace), while traditional non-trace-based processors will only recognize a single abort trigger for a single instruction.
In light of the foregoing, there is a need for a trace-based processor having a trace unit (or front end) and an execution unit (or back end) for efficiently managing problems (or aborts) related to one or more traces while minimizing performance impact.