Over recent decades, processors have evolved by becoming smaller, more sophisticated in design, and faster. This evolution has several causes, one of which is the portability of systems that incorporate processors. Portability places demands on processors such as smaller size, reduced power consumption, and efficient performance.
Processors are used, for example, in personal computers (PCs), workstations, networking equipment, and portable devices. Examples of portable devices include laptops, which are portable PCs, and hand-held devices.
A processor (such as a microprocessor) processes instructions according to an instruction set architecture. Processing comprises fetching, decoding, and executing instructions. Some instruction set architectures define a programming model in which fetching, decoding, executing, and any other functions for processing an instruction appear to be performed in strict order: they begin after the functions for all prior instructions have completed, and they complete before any function of a successor instruction has begun. Such an instruction set architecture provides a programming model in which instructions are executed in program order.
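As an illustration only, the in-order programming model described above can be sketched as a toy interpreter in which every function (fetch, decode, execute) for one instruction finishes before any function of the next instruction starts. The three-opcode instruction format (`addi`, `add`, `mul`) and the helper names are hypothetical and are not part of any real instruction set architecture; a real processor only has to *appear* to behave this way.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Decoded:
    op: str   # hypothetical opcode: "addi", "add", or "mul"
    rd: int   # destination register index
    ra: int   # first source register index
    b: int    # immediate (for "addi") or second source register index

def fetch(program: List[tuple], pc: int) -> tuple:
    """Fetch the raw instruction at index pc."""
    return program[pc]

def decode(raw: tuple) -> Decoded:
    """Decode a raw (op, rd, ra, b) tuple into its fields."""
    return Decoded(*raw)

def execute(d: Decoded, regs: List[int]) -> None:
    """Execute one decoded instruction, updating the register file in place."""
    if d.op == "addi":
        regs[d.rd] = regs[d.ra] + d.b
    elif d.op == "add":
        regs[d.rd] = regs[d.ra] + regs[d.b]
    elif d.op == "mul":
        regs[d.rd] = regs[d.ra] * regs[d.b]
    else:
        raise ValueError(f"unknown opcode {d.op!r}")

def run_in_order(program: List[tuple], num_regs: int = 4) -> Tuple[List[int], List[Tuple[int, str]]]:
    """Process instructions strictly in program order: all functions for
    instruction pc complete before any function of instruction pc + 1 begins."""
    regs = [0] * num_regs
    events: List[Tuple[int, str]] = []  # (pc, function) in the order performed
    for pc in range(len(program)):
        raw = fetch(program, pc)
        events.append((pc, "fetch"))
        d = decode(raw)
        events.append((pc, "decode"))
        execute(d, regs)
        events.append((pc, "execute"))
    return regs, events
```

For example, `run_in_order([("addi", 1, 0, 5), ("addi", 2, 0, 3), ("mul", 3, 1, 2)])` leaves register 3 holding 15, and the recorded events never interleave functions of different instructions.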
Because code based on the x86 instruction set is so widely used, particularly by software programmers who have become well accustomed to this instruction set and are unlikely to readily adapt to another, backward compatibility of code is a key requirement in the architecture of a new processor. That is, users of a newly designed processor must be able to run the same code used on previous processor designs without experiencing any problems.
In trace-based processor architectures, different trace types are used to significantly optimize execution by the back end, or execution unit, of the processor. Traces are generally built by the front end, or trace unit (also called the instruction processing unit), of a processor, which performs certain functions, such as decoding, to build traces of operations.
Different types of traces might include a basic block trace, a multi-block trace or a microcode trace. A multi-block trace is made of one or more basic block traces, one or more multi-block traces or a combination thereof. A microcode trace is used when, for example, a sequence of instructions is either complex or rare. U.S. patent application Ser. No. 11/781,937, entitled “A Trace Unit with a Decoder, A Basic Block Builder, and A Multi-Block Builder” and filed on Jul. 23, 2007, the disclosure of which is incorporated herein by reference as though set forth in full, presents further details of such traces.
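The composition rules for trace types stated above (a multi-block trace is made of basic block traces, multi-block traces, or both, while a microcode trace stands on its own) can be modeled as a small recursive data structure. The following is a minimal sketch under that assumption; the class names, the string representation of operations, and the helper functions are hypothetical and do not describe the referenced application's actual trace format.

```python
from __future__ import annotations
from dataclasses import dataclass
from typing import List, Union

@dataclass
class BasicBlockTrace:
    """Operations decoded from a single basic block (hypothetical representation)."""
    ops: List[str]

@dataclass
class MicrocodeTrace:
    """Operations supplied for a complex or rare instruction sequence."""
    ops: List[str]

@dataclass
class MultiBlockTrace:
    """A trace built from basic block traces, multi-block traces, or a combination."""
    parts: List[Union[BasicBlockTrace, MultiBlockTrace]]

    def __post_init__(self) -> None:
        if not self.parts:
            raise ValueError("a multi-block trace needs at least one component trace")
        if not all(isinstance(p, (BasicBlockTrace, MultiBlockTrace)) for p in self.parts):
            raise TypeError("components must be basic block or multi-block traces")

Trace = Union[BasicBlockTrace, MultiBlockTrace, MicrocodeTrace]

def operations(trace: Trace) -> List[str]:
    """Return the operations of a trace, expanding nested multi-block traces."""
    if isinstance(trace, MultiBlockTrace):
        result: List[str] = []
        for part in trace.parts:
            result.extend(operations(part))
        return result
    return list(trace.ops)

def basic_block_count(trace: Trace) -> int:
    """Count the basic block traces contained, directly or nested, in a trace."""
    if isinstance(trace, BasicBlockTrace):
        return 1
    if isinstance(trace, MultiBlockTrace):
        return sum(basic_block_count(p) for p in trace.parts)
    return 0
```

Here, nesting a two-block multi-block trace inside another multi-block trace with a third basic block yields a trace whose `basic_block_count` is 3, while placing a `MicrocodeTrace` inside a `MultiBlockTrace` is rejected.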
In some trace-based architectures, a trace includes operations that do not correspond to the original program order of the instructions. That is, knowledge of the original program order of the instructions is lost in a trace. Moreover, a single instruction may result in multiple operations. Additionally, a trace has no instruction boundaries, and the operations of a trace do not have a clear relative age or order with respect to one another (corresponding to the original instruction program order).
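To make the loss of instruction boundaries concrete, the sketch below expands each instruction into one or more operations and keeps only the operations. The decode table and mnemonics are hypothetical, chosen so that two different instruction sequences produce an identical trace; once the trace is built, nothing in it identifies which instruction each operation came from. The `schedule` helper stands in, as an assumption, for any reordering a trace builder might apply.

```python
from typing import Dict, List

# Hypothetical decode table: each instruction expands into one or more
# operations. The mnemonics and expansions are illustrative only.
DECODE: Dict[str, List[str]] = {
    "push": ["sub_sp", "store"],
    "dec_sp": ["sub_sp"],
    "st": ["store"],
    "add_mem": ["load", "add", "store"],
}

def build_trace(instructions: List[str]) -> List[str]:
    """Flatten instructions into a trace of operations.

    Only the operations are kept; no field records which instruction each
    operation came from, so instruction boundaries are not represented.
    """
    ops: List[str] = []
    for insn in instructions:
        ops.extend(DECODE[insn])
    return ops

def schedule(ops: List[str], order: List[int]) -> List[str]:
    """Reorder trace operations according to a permutation of their indices."""
    if sorted(order) != list(range(len(ops))):
        raise ValueError("order must be a permutation of the operation indices")
    return [ops[i] for i in order]
```

For instance, `build_trace(["push"])` and `build_trace(["dec_sp", "st"])` return the same list, so the original instruction count and boundaries cannot be recovered from the trace alone.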
Some processors process instructions in various combinations of overlapped (or non-overlapped), parallel (or serial), and speculative (or non-speculative) manners, for example using pipelined functional units, superscalar issue, and out-of-order execution. Thus, some processors are enabled to execute instructions and access memory in an order that differs from the program order of the programming model. Nevertheless, the processors are constrained to produce results consistent with results that would be produced by processing instructions entirely in program order.
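One common, generic way to reconcile out-of-order execution with results consistent with program order is a reorder buffer: results may be produced in any order, but they update architectural state only in program order. The sketch below illustrates that textbook technique; it is not presented as the mechanism of this disclosure, and it omits recovery from mis-speculation. The class and method names are hypothetical.

```python
from typing import Dict, List, Optional

class ReorderBuffer:
    """Generic reorder-buffer sketch: results may complete in any order, but
    they update architectural state strictly in program order."""

    def __init__(self) -> None:
        self.dest: List[str] = []              # destination register per entry, in program order
        self.value: List[Optional[int]] = []   # result, or None while not yet completed
        self.head = 0                          # oldest entry not yet committed
        self.arch_regs: Dict[str, int] = {}    # committed (architectural) register state

    def allocate(self, dest: str) -> int:
        """Allocate an entry in program order; the returned tag is its sequence number."""
        self.dest.append(dest)
        self.value.append(None)
        return len(self.dest) - 1

    def complete(self, tag: int, value: int) -> None:
        """Record an execution result; may be called in any order."""
        self.value[tag] = value

    def commit(self) -> List[int]:
        """Retire completed entries starting from the oldest, stopping at the
        first entry that has not completed. Returns the tags retired."""
        retired: List[int] = []
        while self.head < len(self.dest) and self.value[self.head] is not None:
            self.arch_regs[self.dest[self.head]] = self.value[self.head]
            retired.append(self.head)
            self.head += 1
        return retired
```

If the two youngest of three entries complete first, `commit()` retires nothing until the oldest entry completes; then all three retire in order, so a later write to the same register correctly overrides an earlier one.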
In some cases, executing memory-accessing instructions, such as load and store instructions, out of order is complex because their data dependencies arise from dynamically computed addresses (such as register-indirect accesses). Evaluating those dependencies requires at least issuing and partially executing the memory-accessing instructions so that their addresses are known.
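The point about dynamically computed addresses can be illustrated with a short sketch: whether a load depends on an earlier store is unknown until both addresses have been evaluated from their base registers, and two accesses through different registers may still alias. The `MemOp` type and helpers are hypothetical, and the sketch assumes, for simplicity, equal-sized aligned accesses so that a dependency reduces to address equality.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class MemOp:
    kind: str                    # "load" or "store"
    base: str                    # base register for register-indirect addressing
    offset: int                  # constant displacement
    addr: Optional[int] = None   # unknown until the op is issued and its address computed

def compute_address(op: MemOp, regs: Dict[str, int]) -> None:
    """Partially execute the op: evaluate base + offset."""
    op.addr = regs[op.base] + op.offset

def load_depends_on_store(load: MemOp, store: MemOp) -> Optional[bool]:
    """Return True or False once both addresses are known, or None if the
    dependency cannot yet be determined. Assumes equal-sized aligned accesses."""
    if load.addr is None or store.addr is None:
        return None
    return load.addr == store.addr
```

With `r1 = 100` and `r2 = 96`, a store through `r1 + 0` and a load through `r2 + 4` look independent from their register names, yet both resolve to address 100 once computed.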
In some cases, executing memory-accessing instructions out of order is expensive and/or complex because of the mechanisms needed to maintain uncommitted results (of, for example, store instructions) in ways that enable forwarding (to, for example, load instructions). Some processors allow a large number of outstanding out-of-order instructions; however, they require large, expensive, and slow associative data structures. Other processors use complicated techniques to enable forwarding of uncommitted results.
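The forwarding of uncommitted store results to loads can be sketched as a store queue that, for each load, searches for the youngest older store to the same address. The linear search below stands in for the associative lookup that hardware performs in parallel, and its cost grows with the number of outstanding stores, which is the scaling problem noted above. This is a generic illustration, not the approach of this disclosure; the class, the sequence-number scheme, and the assumption that unwritten memory reads as 0 are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

class StoreQueue:
    """Holds uncommitted stores in program order and forwards their data to
    younger loads. Stores are assumed to be added in program order."""

    def __init__(self) -> None:
        self.stores: List[Tuple[int, int, int]] = []  # (seq, addr, data), oldest first

    def add_store(self, seq: int, addr: int, data: int) -> None:
        self.stores.append((seq, addr, data))

    def load(self, seq: int, addr: int, memory: Dict[int, int]) -> int:
        """Value seen by a load with sequence number seq: data of the youngest
        older store to the same address, else committed memory (0 if unwritten)."""
        best: Optional[Tuple[int, int]] = None  # (store seq, data)
        for s_seq, s_addr, s_data in self.stores:  # stands in for a parallel CAM search
            if s_seq < seq and s_addr == addr and (best is None or s_seq > best[0]):
                best = (s_seq, s_data)
        return best[1] if best is not None else memory.get(addr, 0)

    def commit_oldest(self, memory: Dict[int, int]) -> None:
        """Write the oldest uncommitted store to memory and remove it from the queue."""
        _, addr, data = self.stores.pop(0)
        memory[addr] = data
```

A load is thus satisfied from the queue when an older matching store is still uncommitted, and from memory otherwise; committing the oldest store moves its data into memory without changing what later loads observe.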
In some instruction set architectures, instructions are characterized as either sequential or non-sequential, i.e., specifying a change in control flow (such as a branch). Processing after a sequential instruction implicitly continues with the next instruction that is contiguous with the sequential instruction, while processing after a change-in-control-flow instruction continues with either the contiguous next instruction or another next instruction (frequently non-contiguous) specified by the control flow instruction.
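The distinction between sequential and change-in-control-flow instructions amounts to how the address of the next instruction is chosen. A minimal sketch follows, assuming a hypothetical variable-length instruction record with an optional branch target; the outcome (`taken`) is supplied by the caller rather than computed, and it is ignored for sequential instructions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Instruction:
    addr: int                     # address of this instruction
    length: int                   # length in bytes (variable-length encoding assumed)
    is_branch: bool = False       # True for a change-in-control-flow instruction
    target: Optional[int] = None  # address specified by the branch, if any

def next_pc(insn: Instruction, taken: bool = False) -> int:
    """Address of the instruction processed after insn."""
    fall_through = insn.addr + insn.length  # the contiguous next instruction
    if not insn.is_branch:
        return fall_through                 # sequential: always contiguous
    if taken:
        if insn.target is None:
            raise ValueError("taken branch has no target")
        return insn.target                  # frequently non-contiguous
    return fall_through                     # not taken: contiguous
```

For example, a 2-byte branch at `0x1003` targeting `0x2000` continues at `0x2000` when taken and at `0x1005` when not taken.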
As such PCs decrease in size while increasing in speed, they require higher-performance designs. Speculative execution of traces is well suited to this need; however, it is also desirable to further increase processor performance by executing memory operations more efficiently.
In light of the foregoing, there is a need for a processor that efficiently and speculatively executes traces and efficiently executes memory operations, thereby improving system performance while using reduced hardware and consuming reduced power.