1. Field of the Invention
This invention relates to microprocessors and, more particularly, to loop control optimization of microcoded instructions.
2. Description of the Related Art
Computer system processors that employ the x86 architecture include certain instructions within the x86 instruction set that are quite complex, specifying multiple operations to be performed. For example, the PUSHA instruction specifies that each of the x86 registers be pushed onto a stack defined by the value in the ESP register. The corresponding operations are a store operation for each register, and decrements of the ESP register between each store operation to generate the address for the next store operation. Often, complex instructions are classified as MROM instructions. MROM instructions are transmitted to a microcode instruction unit, or MROM unit, within the microprocessor, which decodes the complex MROM instruction and dispatches two or more simpler fast-path instructions for execution by the microprocessor. The simpler fast-path instructions corresponding to the MROM instruction are typically stored in a read-only memory (ROM) within the microcode instruction unit. The microcode instruction unit determines an address within the ROM at which the simpler fast-path instructions are stored, and transfers the fast-path instructions out of the ROM beginning at that address. Multiple clock cycles may be used to transfer the entire set of fast-path instructions corresponding to the MROM instruction. The entire set of fast-path instructions that effect the function of an MROM instruction is called a microcode sequence. Each MROM instruction may correspond to a particular number of fast-path instructions dissimilar from the number of fast-path instructions corresponding to other MROM instructions. Additionally, the number of fast-path instructions corresponding to a particular MROM instruction may vary according to the addressing mode of the instruction, the operand values, and/or the options included with the instruction. The microcode unit issues the fast-path instructions into the instruction-processing pipeline of the microprocessor. 
The fast-path instructions are thereafter executed in a similar fashion to other instructions. It is noted that the fast-path instructions may be instructions defined within the instruction set, or may be custom instructions defined for the particular microprocessor.
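By way of illustration only, the expansion of PUSHA into a microcode sequence described above may be sketched as follows. The micro-op names (`SUB_ESP`, `STORE`), the temporary `TMP_ORIG_ESP`, and the dispatch width of three micro-ops per clock are assumptions made for this sketch and are not taken from any particular microprocessor.

```python
# Illustrative sketch only: micro-op names and dispatch width are hypothetical.
# Register order follows the architectural push order of the 32-bit PUSHA (PUSHAD).
PUSHA_ORDER = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def expand_pusha(operand_bytes=4):
    """Return an illustrative microcode sequence for PUSHA as (op, arg) tuples."""
    sequence = []
    for reg in PUSHA_ORDER:
        # Decrement ESP to form the address for the next store operation.
        sequence.append(("SUB_ESP", operand_bytes))
        # PUSHA stores the value ESP held before the instruction began,
        # so the sketch names a temporary copy for that slot.
        source = "TMP_ORIG_ESP" if reg == "ESP" else reg
        sequence.append(("STORE", source))
    return sequence


def dispatch_groups(sequence, width=3):
    """Split a sequence into per-clock groups, assuming `width` micro-ops leave the ROM per clock."""
    return [sequence[i:i + width] for i in range(0, len(sequence), width)]


if __name__ == "__main__":
    for clock, group in enumerate(dispatch_groups(expand_pusha())):
        print(clock, group)
```

Under these assumptions the sequence contains sixteen micro-ops and occupies six clock cycles of ROM output, consistent with the observation that multiple clock cycles may be used to transfer a microcode sequence.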
Conversely, less complex instructions are decoded by hardware decode units within the microprocessor, without intervention by the microcode unit. The terms “directly-decoded instruction” and “fast-path instruction” will be used herein to refer to instructions which are decoded and executed by the microprocessor without the aid of a microcode unit. As opposed to MROM instructions which are reduced to simpler instructions which may be handled by the microprocessor, fast-path instructions are decoded and executed via hardware decode and functional units included within the microprocessor.
Fast-path instructions that implement an MROM instruction may include branch instructions. For example, a string instruction may include a loop of instructions. A microcode loop is one or more instructions that are repetitively executed a specific number of times. The specific number of iterations is called a loop count or string count. A microcode loop typically includes a branch instruction and a decrement instruction. With each iteration of the loop, the string count is decremented and a branch instruction tests the string count for a termination condition. If the termination condition is false, the branch instruction branches to the top of the loop and the instructions of the microcode loop are executed again. Termination conditions may include the string count being equal to zero and a flag being asserted or unasserted.
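For purposes of illustration, the decrement-and-branch structure of a microcode loop may be modeled as below. The function name `run_microcode_loop`, the `body` callback, and the `flag_terminates` parameter are hypothetical names introduced for this sketch. The guard for a zero count is an assumption added so that the sketch terminates.

```python
def run_microcode_loop(count, body, flag_terminates=None):
    """Model a microcode loop; return (iterations executed, remaining count).

    body(i) stands in for the loop's instructions and returns a flag value.
    If flag_terminates is not None, the loop also ends when the flag equals it.
    """
    if count <= 0:
        return 0, count  # assumption: a zero count executes no iterations

    iterations = 0
    while True:
        flag = body(iterations)  # instructions of the microcode loop
        iterations += 1
        count -= 1               # decrement instruction
        # The branch instruction tests the termination condition.
        terminate = count == 0 or (
            flag_terminates is not None and flag == flag_terminates
        )
        if terminate:
            break                # branch not taken: fall out of the loop
        # branch taken: execution continues at the top of the loop
    return iterations, count


if __name__ == "__main__":
    print(run_microcode_loop(5, lambda i: False))
    print(run_microcode_loop(5, lambda i: i == 2, flag_terminates=True))
```

The first call ends when the string count reaches zero. The second ends early because the flag is asserted on the third iteration, leaving a nonzero count.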
Computer system processors that employ the x86 architecture also include string instructions designed to allow data structures, such as alphanumeric character strings, for example, to be moved to and from memory. Examples of string instructions in the x86 architecture are MOVS (move string) and CMPS (compare string). The MOVS instruction loads data from a memory location specified by index register ESI, increments/decrements ESI, stores the loaded data to a memory location specified by EDI and increments/decrements EDI. When executed, the string instructions described above may perform a single iteration.
The string count, or count value, determines the number of iterations of the string instruction to perform. If longer strings or groups of data must be transferred, a “repeat” string instruction may be used. In such instructions, the repeat prefix creates a repeating string instruction that iterates a number of times controlled by the string count. Typically, the ECX register (or the rCX register in 64-bit machines) stores the number of iterations to repeat the string instruction. Accordingly, on each iteration of MOVS, register ECX may be decremented and a termination condition tested. A direction flag (DF) indicates whether the index registers (ESI and EDI) are incremented or decremented. By incrementing/decrementing the index registers, the string instruction operates on a series of sequential data. For example, MOVS can move a block of data from one memory location to another memory location. The size of the block is determined by the string count stored in register ECX.
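As a simplified illustration, the behavior of a repeated MOVS may be modeled in software as follows. Memory is represented by a `bytearray`, the registers are plain integers, and the function names `movs` and `rep_movs` are hypothetical. Segment registers, address-size wrapping, and faults are deliberately omitted from this sketch.

```python
def movs(mem, esi, edi, df=0, size=1):
    """One MOVS iteration: copy `size` bytes from [ESI] to [EDI], then step both indexes."""
    mem[edi:edi + size] = mem[esi:esi + size]
    step = -size if df else size  # DF set: decrement; DF clear: increment
    return esi + step, edi + step


def rep_movs(mem, esi, edi, ecx, df=0, size=1):
    """Repeat MOVS until the string count in ECX reaches zero."""
    while ecx != 0:               # termination condition: string count equal to zero
        esi, edi = movs(mem, esi, edi, df, size)
        ecx -= 1                  # ECX is decremented on each iteration
    return esi, edi, ecx


if __name__ == "__main__":
    memory = bytearray(b"hello.....")
    print(rep_movs(memory, 0, 5, 5), bytes(memory))
```

In this sketch a count of five moves a five-byte block, and the index registers finish one element past the end of the source and destination blocks.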
The repeat string instructions are microcoded instructions. Thus, when a repeat string instruction is executed, the microcode sequence controller may dispatch microinstructions that implement the functionality of the x86 REP-prefixed instruction. This may be accomplished by using a loop of microcode instructions including a single microcode entrypoint. The microcode sequencing hardware may place the MROM unit into a continuous unconditional loop such that the microcode sequence controller continuously dispatches microcode instructions to the execution core until a termination condition indication is received from the execution core. One or more of the microcode instructions may test the termination condition of the loop. As described above, the termination condition may be based on the value of the ECX register and possibly the state of the zero flag, depending on the type of repeat prefix used. The ECX value may be decremented each iteration by one of the microcode instructions. However, by the time the termination indication is received, multiple excess microcode instructions may have been dispatched that will not be executed. The excess microcode instructions must be cancelled and flushed from the instruction pipeline, thereby causing a branch-misprediction penalty. If REP prefixes are used frequently, the branch-misprediction penalties may be significant.
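The cost of the unconditional loop may be illustrated with a simple cycle-level model. The model assumes that one loop iteration of `ops_per_iteration` microcode instructions is dispatched per clock, and that the termination indication reaches the sequence controller `feedback_latency` clocks after the final iteration is dispatched. Both parameters and the function name are hypothetical values chosen for this sketch, not measurements of any microprocessor.

```python
def simulate_unconditional_dispatch(ecx, ops_per_iteration=3, feedback_latency=4):
    """Return (useful, excess) microcode instruction counts for one repeat string instruction.

    Assumes ecx >= 1 and one loop iteration dispatched per clock.
    """
    if ecx < 1 or feedback_latency < 1:
        raise ValueError("sketch assumes ecx >= 1 and feedback_latency >= 1")

    dispatched_iterations = 0
    stop_cycle = None  # clock at which the termination indication arrives
    cycle = 0
    while stop_cycle is None or cycle < stop_cycle:
        dispatched_iterations += 1  # the sequencer dispatches unconditionally
        if dispatched_iterations == ecx:
            # The final useful iteration was just dispatched; its termination
            # indication returns from the execution core after the latency.
            stop_cycle = cycle + feedback_latency
        cycle += 1

    useful = ecx * ops_per_iteration
    excess = (dispatched_iterations - ecx) * ops_per_iteration
    return useful, excess


if __name__ == "__main__":
    for count in (1, 10, 100):
        print(count, simulate_unconditional_dispatch(count))
```

Under these assumptions the number of excess microcode instructions is independent of the string count, so the flush penalty is proportionally largest for short strings.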