Many contemporary microprocessors include a micro-architecture that is distinct from their architecture, or macroarchitecture. One characteristic of such a microprocessor is that it includes an instruction translator that translates macroinstructions (e.g., x86 instructions) of the microprocessor's instruction set architecture into one or more microinstructions, or micro-operations, of the micro-architecture instruction set. When the instruction translator encounters a macroinstruction that must be translated into more micro-operations than the instruction translator can generate per clock cycle, the instruction translator generates a prolog of micro-operations. The remaining micro-operations needed to implement the macroinstruction are fetched from a microcode read-only memory (ROM). The sequence of micro-operations fetched from the microcode ROM is referred to herein as the “microcode tail.”

The micro-operations of the prolog generated by the translator can be customized for the form of the macroinstruction. The most common customization is to generate a different prolog for a memory form of a macroinstruction than for a register form of the macroinstruction. For a memory-based form, the translator generates a load instruction to load the source operand into a temporary register of the microprocessor; whereas, for a register-based form, the translator generates a move instruction to move the contents of the source register to the temporary register. The problem lies in the microcode tail, which is shared by both forms. For the memory form, a store micro-operation is needed to store the result to memory; whereas, for the register form, the result needs to be moved to the destination register.
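By way of illustration only, the prolog customization described above may be sketched as follows. The sketch is a simplified software model, not a description of any actual translator; the names `MicroOp` and `generate_prolog`, the opcode strings `load` and `move`, and the register name `temp` are hypothetical and chosen solely for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MicroOp:
    """Hypothetical micro-operation: an opcode, a destination, and a source."""
    opcode: str
    dst: str
    src: str


def generate_prolog(source_operand: str, is_memory_form: bool) -> List[MicroOp]:
    """Return the prolog micro-operations for one form of a macroinstruction.

    Assumption: the prolog consists of a single micro-operation that places
    the source operand in a temporary register named "temp".
    """
    if is_memory_form:
        # Memory form: load the source operand from memory into the temporary.
        return [MicroOp("load", "temp", source_operand)]
    # Register form: move the source register into the temporary.
    return [MicroOp("move", "temp", source_operand)]


memory_prolog = generate_prolog("[mem]", is_memory_form=True)
register_prolog = generate_prolog("eax", is_memory_form=False)
```

In this model both prologs leave the source operand in the same temporary register, so the micro-operations that follow can be common to both forms up to the point at which the result must be written.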
Conventionally, the microcode tail would include a conditional branch instruction that branches either to a tail for the register-based form or to a tail for the memory-based form. However, conditional branch instructions can be costly in terms of performance.
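By way of illustration only, a shared microcode tail that selects between the two forms with a conditional branch may be modeled as follows. The function `run_branching_tail`, the dictionary keys `temp`, `memory`, and `dest_reg`, and the increment used as the shared computation are all hypothetical placeholders; the sketch merely shows where the branch falls in the sequence.

```python
from typing import Dict


def run_branching_tail(state: Dict[str, int], is_memory_form: bool) -> int:
    """Model a shared microcode tail; return how many conditional branches it ran.

    Assumption: state["temp"] already holds the source operand, as a prolog
    would have left it, and the shared computation is a placeholder increment.
    """
    branches_executed = 0

    # Shared portion of the tail: identical for both forms.
    result = state["temp"] + 1

    # The tail must test the instruction form before writing the result.
    branches_executed += 1
    if is_memory_form:
        state["memory"] = result    # store micro-operation: result to memory
    else:
        state["dest_reg"] = result  # move micro-operation: result to register

    return branches_executed


memory_state = {"temp": 41}
memory_branches = run_branching_tail(memory_state, is_memory_form=True)

register_state = {"temp": 41}
register_branches = run_branching_tail(register_state, is_memory_form=False)
```

In this model every execution of the tail incurs the conditional branch regardless of which form is being executed, which is the cost referred to above.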