1. Field of the Invention
This invention relates in general to the field of instruction execution in computer systems, and more particularly to a method and apparatus for improving the performance of repeat string operations.
2. Description of the Related Art
Byte manipulation and string manipulation have always been important in computer processing. A primary application is in the area of text processing, which is the management of sequences of bytes that contain the alphanumeric codes for characters, i.e., character strings. In text processing it is essential to have program sequences for moving and comparing character strings, and for inserting strings into and deleting them from other strings. Moreover, it is often necessary to search a string for a given substring or to replace a substring with a different substring. Other applications requiring string manipulation include array processing, code conversion, and searching for keys in a file system.
To better understand string manipulation in microprocessors, the discussion below will employ the nomenclature of an x86 microprocessor. However, those skilled in the art will appreciate that use of x86 registers and macro instructions is for illustrative purposes only. Other processors or architectures may be easily substituted for this illustration.
String operations are used in microprocessors to move data from one location, the source address, to another location, the destination address. An x86 microprocessor provides a number of registers which are used to calculate: 1) the address of a byte or word which will be manipulated, i.e., the source address; 2) the address of a byte or word to which the source string will be moved, i.e., the destination address; and 3) the number of times the string operation must be repeated to manipulate the entire string. In a protected mode memory model, the source address for a string is found by adding the contents of the data segment base register DS to the contents of the source index register SI. The destination address for a string is found by adding the contents of the extra segment base register ES to the contents of the destination index register DI. Once a string operation is performed at a first source/destination address, the contents of SI and DI can be incremented or decremented, as specified by the programmer, and the operation repeated. By placing the string operation and the increment/decrement steps within a loop, an entire string can be manipulated or transferred. The number of times the string instruction must be repeated is stored in the general purpose architectural count register CX (ECX in its 32-bit form).
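The addressing and looping just described can be sketched in a few lines of Python. This is only an illustrative model, not the behavior of any particular processor: the function name `rep_movsb`, the flat `bytearray` standing in for memory, and the `decrement` parameter (playing the role of the programmer-selected direction) are all assumptions made for the example, and register wraparound is ignored.

```python
def rep_movsb(memory, ds_base, si, es_base, di, cx, decrement=False):
    """Model a byte-wise string move repeated cx times (hypothetical helper)."""
    step = -1 if decrement else 1
    while cx != 0:
        src = ds_base + si          # source address: DS base + SI
        dst = es_base + di          # destination address: ES base + DI
        memory[dst] = memory[src]   # the string operation itself (one byte)
        si += step                  # advance the source index
        di += step                  # advance the destination index
        cx -= 1                     # one fewer repetition remaining
    return si, di, cx


# Usage: copy five bytes from DS:0 to ES:0 in a small model memory.
memory = bytearray(64)
memory[0x10:0x15] = b"HELLO"
final_si, final_di, final_cx = rep_movsb(
    memory, ds_base=0x10, si=0, es_base=0x20, di=0, cx=5
)
```

After the call, the five bytes appear at the destination, both index values have advanced by five, and the count is zero.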
In an x86 microprocessor, all repeat string operations (i.e., REP INS, REP MOVS, REP OUTS, REP LODS, REP STOS, REPE CMPS, REPE SCAS, and REPNE SCAS) repeat a specified string instruction a number of times equal to the number in the architectural count register ECX or until the indicated condition of the zero flag (ZF) is no longer met. To begin a repeat string operation, the contents of register ECX are first loaded into a temporary count register (CNT). After each successful iteration of the string operation, the temporary count register CNT is decremented. When the value in the CNT register reaches zero, or when the indicated ZF condition is no longer met, the architectural count register ECX is updated with the contents of the temporary count register.
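The role of the temporary count register can be illustrated with a simplified model of REPNE SCAS operating on bytes. The function name `repne_scasb`, the flat `bytearray` memory, and the forward-only scan are assumptions made for this sketch; for simplicity the model also reports ZF as clear when the initial count is zero.

```python
def repne_scasb(memory, es_base, edi, ecx, al):
    """Scan bytes at ES:EDI for the value al (hypothetical model)."""
    cnt = ecx                   # load temporary count: LD CNT, ECX
    zf = False
    while cnt != 0:
        zf = memory[es_base + edi] == al   # the compare sets ZF on a match
        edi += 1                # advance the destination index
        cnt -= 1                # decrement CNT after each iteration
        if zf:                  # REPNE continues only while ZF is clear
            break
    ecx = cnt                   # on exit, update ECX from CNT
    return edi, ecx, zf


# Usage: look for a zero byte in a seven-byte buffer.
buffer = bytearray(b"abc\x00def")
scan_edi, scan_ecx, scan_zf = repne_scasb(buffer, es_base=0, edi=0, ecx=7, al=0)
```

In this run the zero byte is found on the fourth iteration, so the scan exits with three repetitions still remaining in the count.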
An x86 microprocessor includes a translate stage which converts the repeat string macro instruction to a sequence of micro instructions. This sequence includes a count initialization instruction (LD CNT, ECX) followed by a subsequence of micro instructions that direct the microprocessor to perform the first iteration of the prescribed string operation. The translate stage then continues to repeatedly generate the same subsequence of micro instructions until execution logic in a later pipeline stage signals the translate stage that 1) the number of generated subsequences is equal to the number of required iterations, 2) the prescribed ZF condition is no longer met, or 3) an exception has occurred. The translate stage then generates an exit subsequence of micro instructions that directs the microprocessor to update architectural count register ECX with the final value of CNT upon exit.
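The shape of the generated micro instruction stream can be sketched as a Python generator. Only the count initialization instruction `LD CNT, ECX` comes from the description above; the per-iteration micro instructions, the exit instruction `LD ECX, CNT`, and the `should_stop` callback that stands in for the signal from the execution logic are hypothetical names chosen for illustration. The sketch also ignores any pipeline delay between generation and the stop signal.

```python
def translate_rep_string(iteration_micro_ops, should_stop):
    """Yield micro instructions for one repeat string macro instruction."""
    yield "LD CNT, ECX"             # count initialization instruction
    generated = 0
    while True:
        for op in iteration_micro_ops:
            yield op                # one iteration's subsequence
        generated += 1
        if should_stop(generated):  # stand-in for the execution logic signal
            break
    yield "LD ECX, CNT"             # hypothetical exit subsequence


# Usage: three iterations of a two-instruction move subsequence.
move_ops = ["LOAD TMP, [DS:ESI]", "STORE [ES:EDI], TMP"]  # illustrative names
ops = list(translate_rep_string(move_ops, lambda n: n >= 3))
```

For three iterations of a two-instruction subsequence, the stream holds eight micro instructions, the first of which only initializes the count.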
Unfortunately, one significant disadvantage of the above method for performing repeat string operations is that the first micro instruction (LD CNT, ECX) generated by the translate stage is pure overhead. This micro instruction is required whether the string operation is performed once or many times, yet its presence adds delay to the microprocessor pipeline and thus reduces efficiency.
Therefore, what is needed is a technique for operating a microprocessor that reduces or eliminates the delay associated with the first micro instruction that initiates the execution of a repeat string instruction.