A processor (such as a microprocessor) processes instructions according to an architecture of the processor, the instructions having a format defined by an instruction set architecture portion of the architecture. The processing comprises fetching, decoding, and executing the instructions. Some processors directly execute instructions, whereas other processors translate instructions into internal operations (sometimes called micro-operations) and execute operations that perform an equivalent function to the instructions. In processors that translate instructions, the instructions are conceptually considered to underlie the internal operations.
Some processor architectures comprise flags (sometimes called status flags) that record the status of results associated with some instructions, and the flags also control aspects of execution of some instructions. For example, an instruction performs an add operation, modifying a carry flag to indicate whether there was a carry out of the result. A subsequent instruction performs an add-with-carry operation that uses the carry flag as a carry input to an addition calculation. In some instruction set architectures, additional flags indicate other types of status, such as whether a calculated result is negative, zero, or positive. In some instruction set architectures, branch instructions utilize a function of zero or more flags (sometimes called a condition) to determine program flow (whether to branch to a given instruction location, or to fall through to a following instruction). Some processors implement mechanisms to provide flags for an X86-compatible instruction set architecture (for example, see U.S. Pat. No. 5,632,023 issued to White, et al.). In the X86 architecture, flags include Z (zero), C (carry), S (sign, indicating a negative result), and O (overflow). The PowerPC architecture has multiple sets of flags, each set being a field of a condition register, each set including LT (negative), GT (positive), EQ (zero), and SO (summary overflow) indications.
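The interaction between an add operation, an add-with-carry operation, and a flag-based branch condition can be illustrated with a minimal sketch. The sketch assumes a 32-bit data width, and the names `add32`, `add64`, `branch_taken`, and the condition names are hypothetical, chosen for illustration only; they do not correspond to any particular instruction set architecture.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def add32(a, b, carry_in=0):
    """Add two 32-bit values plus a carry input; return (result, flags)."""
    a &= MASK32
    b &= MASK32
    total = a + b + carry_in
    result = total & MASK32
    flags = {
        # Carry out of bit 31.
        "carry": int(total > MASK32),
        "zero": int(result == 0),
        "negative": int(bool(result & SIGN32)),
        # Signed overflow: operands share a sign that the result lacks.
        "overflow": int(bool(~(a ^ b) & (a ^ result) & SIGN32)),
    }
    return result, flags


def add64(a, b):
    """64-bit add built from an add followed by an add-with-carry."""
    lo, flags_lo = add32(a & MASK32, b & MASK32)
    # The carry flag of the first add is the carry input of the second.
    hi, flags_hi = add32(a >> 32, b >> 32, carry_in=flags_lo["carry"])
    # flags_hi describes the upper word only (for example, its zero flag).
    return (hi << 32) | lo, flags_hi


# Hypothetical conditions, each a function of zero or more flags.
CONDITIONS = {
    "always": lambda f: True,
    "equal": lambda f: f["zero"] == 1,
    "carry": lambda f: f["carry"] == 1,
    "less_signed": lambda f: f["negative"] != f["overflow"],
}


def branch_taken(condition, flags):
    """Return True to branch, False to fall through."""
    return CONDITIONS[condition](flags)
```

In this sketch, adding 1 to 0xFFFFFFFF with `add32` wraps to zero and sets both the carry and zero flags, and `add64` propagates that carry into the upper word through the second addition.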
Some instruction set architectures (such as an X86-compatible instruction set architecture) comprise complex instructions. Some microprocessor implementations comprise translation hardware to convert instructions (including complex instructions) into sequences of one or more relatively simpler operations, referred to as micro-operations. Additionally, certain implementations store sequences of micro-operations that correspond to one or more instructions in a cache, such as a trace cache. For example, Intel's Pentium 4 microprocessor, as described by Hinton, et al. (in “The Microarchitecture of the Pentium 4 Processor”, Intel Technology Journal, Q1, 2001), has a trace cache.
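As a rough illustration of translation and caching, the following sketch expands an instruction with a memory destination into load, operate, and store micro-operations, and keeps the translated sequence in a simple cache. Everything here is an assumption made for illustration: the tuple encoding, the `translate` function, the `TraceCache` class, and indexing by starting address are hypothetical, and a real trace cache is considerably more elaborate.

```python
def translate(instruction):
    """Translate one (mnemonic, dest, src) instruction into micro-operations."""
    mnemonic, dest, src = instruction
    if dest.startswith("["):
        # Memory destination: split into load, operate, store.
        return [
            ("load", "tmp", dest),
            (mnemonic, "tmp", src),
            ("store", dest, "tmp"),
        ]
    # Register-only instruction: a single micro-operation suffices.
    return [(mnemonic, dest, src)]


class TraceCache:
    """Toy cache of micro-operation sequences, keyed by starting address."""

    def __init__(self):
        self._traces = {}
        self.translations = 0  # counts how often translation actually ran

    def fetch(self, start_address, instructions):
        if start_address not in self._traces:
            self.translations += 1
            self._traces[start_address] = [
                uop for ins in instructions for uop in translate(ins)
            ]
        return self._traces[start_address]
```

Fetching the same starting address a second time returns the stored micro-operations without translating again, which is the benefit such a cache is intended to provide.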
Furthermore, it has been proposed to optimize micro-operations that correspond to a trace, such as by combining, reordering, or eliminating micro-operations. For example, see “Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors” by Friendly, et al., in Proceedings of the 31st Annual ACM/IEEE International Symposium on Microarchitecture, pages 173-181.
Information regarding efficient implementation of processor execution logic to support ternary operations in place of pairs of binary operations (such as “X←A op1 B op2 C” instead of “Z←A op1 B; X←Z op2 C”), may be found in “Proof of correctness of high-performance 3-1 interlock collapsing ALUs”, J. E. Phillips, et al., IBM Journal of Research and Development, Vol. 37, No. 1, January 1993, from http://www.research.ibm.com/journal/rd/371/ibmrd3701C.pdf. Additional information may also be found in “High Performance Execution Engines for Instruction Level Parallel Processors”, by J. E. Phillips, Technical Thesis, copyright 1996, from http://ce.et.tudelft.nl/publicationfiles/17—31_thesis.ps.
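The equivalence between a dependent pair of binary operations and a single ternary operation can be sketched as follows. This is a functional illustration only, not the hardware technique of the cited references. The `BinaryOp` and `TernaryOp` records, the `collapse` function, and the small operation table are hypothetical. The sketch also assumes 32-bit register values and that the intermediate result Z is not needed once the pair is collapsed.

```python
from typing import NamedTuple, Optional

MASK32 = 0xFFFFFFFF

# Hypothetical two-input ALU operations on 32-bit values.
ALU = {
    "add": lambda x, y: (x + y) & MASK32,
    "sub": lambda x, y: (x - y) & MASK32,
    "and": lambda x, y: x & y,
    "or": lambda x, y: x | y,
    "xor": lambda x, y: x ^ y,
}


class BinaryOp(NamedTuple):
    dest: str
    op: str
    src1: str
    src2: str


class TernaryOp(NamedTuple):
    dest: str
    op1: str
    op2: str
    src1: str
    src2: str
    src3: str


def collapse(first: BinaryOp, second: BinaryOp) -> Optional[TernaryOp]:
    """Merge Z <- A op1 B; X <- Z op2 C into X <- A op1 B op2 C.

    Returns None unless the second operation reads the first result as its
    first source only. Assumes Z is dead after the pair.
    """
    if second.src1 != first.dest or second.src2 == first.dest:
        return None
    return TernaryOp(second.dest, first.op, second.op,
                     first.src1, first.src2, second.src2)


def run_binary(regs, op):
    regs[op.dest] = ALU[op.op](regs[op.src1], regs[op.src2])


def run_ternary(regs, op):
    # Evaluated left to right: (src1 op1 src2) op2 src3.
    partial = ALU[op.op1](regs[op.src1], regs[op.src2])
    regs[op.dest] = ALU[op.op2](partial, regs[op.src3])
```

Running the two binary operations on one copy of the register state and the collapsed ternary operation on another leaves the same value in X. The ternary form does so without writing Z and without the second operation waiting on the first.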
All of the foregoing patents and references are hereby incorporated by reference for all purposes.