The Intel Architecture (IA-32) instruction set (also commonly referred to as the x86 architecture) includes a string move (MOVS) macroinstruction (referred to in the Intel Software Developer's Manual as the “move data from string to string” instruction). The MOVS macroinstruction moves a byte (8 bits), word (16 bits), doubleword (32 bits), or quadword (64 bits) from a source memory location to a destination memory location. A repeat (REP) prefix may be added to the MOVS macroinstruction to repeat it multiple times, thereby moving a sequence of bytes, words, doublewords, or quadwords.
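As an illustration, the semantics of a single (unprefixed) MOVS can be sketched in C. Beyond what is described above, the sketch relies on two facts from the x86 architecture: with 32-bit addressing, the source address is held in ESI and the destination address in EDI, and the direction flag (DF) selects whether both advance or retreat by the element size after each move. The `cpu_state` structure, the flat memory array, and `movs_once` are hypothetical modeling constructs, not part of any real API; segmentation and 64-bit addressing are ignored.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical architectural state for this sketch: a flat byte array
   stands in for memory, and esi/edi are offsets into it. */
typedef struct {
    uint8_t  *mem;   /* flat model of memory */
    uint32_t  esi;   /* source offset */
    uint32_t  edi;   /* destination offset */
    int       df;    /* direction flag: 0 = increment, 1 = decrement */
} cpu_state;

/* One MOVS: copy a single element of `size` bytes (1, 2, 4, or 8)
   from [ESI] to [EDI], then step both offsets by `size`. */
static void movs_once(cpu_state *s, unsigned size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    memmove(s->mem + s->edi, s->mem + s->esi, size);
    if (s->df) {
        s->esi -= size;   /* DF = 1: addresses decrease */
        s->edi -= size;
    } else {
        s->esi += size;   /* DF = 0: addresses increase */
        s->edi += size;
    }
}
```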
A program that includes a REP MOVS macroinstruction must also include a prior instruction that loads the ECX register with a count specifying the number of times the MOVS macroinstruction is to be repeated. That is, the ECX register specifies the size of the string to be moved, expressed as the number of bytes, words, doublewords, or quadwords to be moved from the source memory location to the destination memory location. Details of the MOVS macroinstruction are provided on pages 3-656 to 3-659 of the IA-32 Intel Architecture Software Developer's Manual, Volume 2A: Instruction Set Reference, A-M, and details of the REP prefix are provided on pages 4-211 to 4-215 of the IA-32 Intel Architecture Software Developer's Manual, Volume 2B: Instruction Set Reference, N-Z, all of which are hereby incorporated by reference in their entirety for all purposes.
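The effect of the REP prefix with the count in ECX can likewise be sketched as a simple C model. The sketch assumes DF = 0 (ascending addresses) and uses ordinary pointers for the source and destination; the function name `rep_movs` and its signature are illustrative assumptions. Copying one element at a time, in order, models the architectural definition of REP MOVS as a sequence of individual MOVS operations, and a zero count moves nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of REP MOVS with DF = 0: repeat a single-element move while the
   count in *ecx is nonzero, decrementing the count after each element.
   Returns the total number of bytes moved (count * element size). */
static size_t rep_movs(uint8_t *dst, const uint8_t *src,
                       uint32_t *ecx, unsigned size)
{
    size_t moved = 0;

    assert(size == 1 || size == 2 || size == 4 || size == 8);
    while (*ecx != 0) {
        memmove(dst, src, size);  /* one MOVS: a single element */
        dst += size;              /* EDI advances by the element size */
        src += size;              /* ESI advances by the element size */
        moved += size;
        (*ecx)--;                 /* REP decrements ECX per element */
    }
    return moved;                 /* ECX is 0 on completion */
}
```

For example, a count of 2 with doubleword elements (size 4) moves 8 bytes and leaves ECX at 0.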
Many modern microprocessors have an instruction translator that converts macroinstructions, such as x86 macroinstructions, into one or more microinstructions that execute within the microprocessor's microarchitecture. Once the microprocessor has executed all of the constituent microinstructions, it has accomplished the semantics of the macroinstruction. For many of the more common macroinstructions, the instruction translator itself generates the microinstruction sequence that accomplishes the macroinstruction. Additionally, a microinstruction ROM is coupled to the instruction translator. The microinstruction ROM stores microinstruction sequences that accomplish the semantics of more complex or less frequently occurring x86 macroinstructions, which reduces the complexity of the instruction translator. Thus, a given microinstruction sequence may be either generated by the instruction translator or output by the microinstruction ROM, depending on the design criteria of the microprocessor.
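The division of labor between the instruction translator and the microinstruction ROM can be pictured as a per-macroinstruction dispatch decision. In the sketch below, the enumerations and the function `uop_source_for` are hypothetical, and which macroinstructions fall on which side is an illustrative design choice; only the placement of REP MOVS in the ROM follows this document's description of string moves.

```c
#include <assert.h>

/* Hypothetical macroinstruction classes used only for this sketch. */
typedef enum {
    MACRO_ADD,       /* simple, frequent */
    MACRO_MOV,       /* simple, frequent */
    MACRO_REP_MOVS   /* repetitive; implemented as a ROM routine */
} macro_op;

/* Where the microinstruction sequence for a macroinstruction comes from. */
typedef enum {
    FROM_TRANSLATOR,     /* generated directly by the instruction translator */
    FROM_MICROCODE_ROM   /* output by the microinstruction ROM */
} uop_source;

/* Decide which unit supplies the microinstruction sequence. */
static uop_source uop_source_for(macro_op op)
{
    switch (op) {
    case MACRO_ADD:
    case MACRO_MOV:
        return FROM_TRANSLATOR;
    case MACRO_REP_MOVS:
        return FROM_MICROCODE_ROM;
    }
    assert(0 && "unknown macroinstruction");
    return FROM_MICROCODE_ROM;
}
```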
Because string move instructions involve repetitive operations, they are typically accomplished by a sequence of microinstructions within the microinstruction ROM that executes in a loop. The count value in the ECX register determines the number of times the microinstructions in the loop are executed: the loop body decrements the ECX register value, and a conditional branch microinstruction at the end of the loop branches back to the top of the loop unless the current ECX value indicates that the count is exhausted. Loops are efficient in terms of the number of microinstructions that must be stored. However, their execution performance is relatively poor because the conditional branch microinstructions require a relatively large number of clock cycles to execute. Furthermore, if the microprocessor mispredicts the branch outcome, the penalty to recover from the misprediction is relatively large, particularly in deeply pipelined microprocessors. Additionally, the presence of a branch microinstruction may prevent the microprocessor from performing some microinstruction optimizations that it could otherwise perform. Therefore, what is needed is a way to improve the performance of REP MOVS macroinstructions.
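The shape of such a loop, and why its branch cost scales with the count, can be sketched as a byte-granular C model (element size 1, DF = 0). The function `rep_movsb_microcode_loop` and the `loop_stats` counters are hypothetical; the model does not represent any real microcode, only a bottom-tested loop with one conditional branch per iteration, and it counts how many of those branches are taken.

```c
#include <assert.h>
#include <stdint.h>

/* Counters for this sketch: loop passes, conditional branches executed,
   and how many of those branches were taken back to the loop top. */
typedef struct {
    uint32_t iterations;
    uint32_t branches;
    uint32_t taken;
} loop_stats;

/* Byte-granular model of a microcode loop for REP MOVSB with DF = 0.
   Each pass moves one element, decrements the count, and ends with a
   conditional branch on the count. */
static loop_stats rep_movsb_microcode_loop(uint8_t *dst, const uint8_t *src,
                                           uint32_t ecx)
{
    loop_stats st = {0, 0, 0};

    if (ecx == 0)             /* zero count: the loop body never runs */
        return st;            /* (this entry check is not counted) */
    do {
        *dst++ = *src++;      /* load + store of one element */
        ecx--;                /* loop body decrements the count */
        st.iterations++;
        st.branches++;        /* conditional branch at the loop bottom */
        if (ecx != 0)
            st.taken++;       /* branch taken: back to the loop top */
    } while (ecx != 0);       /* not taken on the last pass: loop exits */

    assert(st.taken + 1 == st.branches);  /* every branch but the last is taken */
    return st;
}
```

For a count of N, the model executes N conditional branches, N - 1 of them taken; a predictor that has learned the taken pattern may mispredict the final, fall-through branch.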