1. Field of the Invention
This invention relates in general to the field of instruction execution in computer systems, and more particularly to a method and apparatus for improving the performance of executing macro instructions having a result whose destination is in memory.
2. Description of the Related Art
A microprocessor is an integrated circuit that contains the entire central processing unit (CPU) of a computer on a single chip. It is the heart and brain inside every personal computer.
FIG. 1 illustrates the three major parts of a CPU 100. The register set 102 stores intermediate data used during execution of macro instructions. The arithmetic logic unit (ALU) 104 performs the required micro operations, or micro instructions, for executing the macro instructions. The control unit 106 supervises the transfer of information among the registers and instructs the ALU as to which operation to perform.
The CPU 100 is shown connected to a memory 108 via a bus 110. The memory 108 provides storage for programs to be executed on the CPU 100, as well as for the raw data that needs to be processed and the results of the processing. Also shown connected to the CPU 100 via the bus 110 is I/O control 112. The I/O control provides the physical interface between a user and the CPU 100.
The CPU 100 performs a variety of functions dictated by the type of macro instructions that are incorporated into the computer. Such macro instructions may include: data transfer instructions for moving data, addresses, and other operands into register or memory locations; arithmetic instructions (e.g., add, subtract, multiply, divide); branch instructions for controlling the sequence of instruction execution within the CPU 100; logic instructions (e.g., and, or, not, xor); and shift and rotate instructions. These instructions provide a language interface between a programmer and the CPU 100. They allow the programmer to command the CPU 100 to perform particular tasks in a specified order. The collection of instructions that may be executed on a particular CPU is termed its instruction set.
For convenience, instruction sets typically allow a programmer to define operations for execution on the CPU 100 at a higher level than is actually performed within the CPU 100. The instructions used by a programmer are thus called macro instructions. The macro instructions are retrieved by the CPU 100 from the memory 108, and are decoded by the control unit 106 into a sequence of micro instructions. The sequence of micro instructions is then provided to the ALU 104 for execution. The result of the execution may then be placed in the register set 102 or stored in the memory 108.
One example of a macro instruction that may be written by a programmer is ADD [Mem],AX. This instruction tells the CPU 100 to take the data in a register AX 114 (within the register set 102), add the data to a value at memory location Mem 116 (within the memory 108), and store the result in the memory location Mem 116. While the instruction ADD [Mem],AX may be written as a single macro instruction by a programmer, the instruction must be decoded by the control unit 106 into a sequence of micro instructions for execution by the ALU 104. A micro instruction sequence that performs the operation requested by the macro instruction ADD [Mem],AX is shown below:
LOAD [Mem]
ADD [Mem],AX
STORE [Mem]
The above micro instruction sequence first loads the data that is stored in the memory 108, at location Mem 116, into a temporary register (not shown) within the register set 102 of the CPU 100. One skilled in the art is aware that the ALU 104 can operate only on data within the register set 102, and cannot perform direct operations on data outside of the CPU 100. The data in register AX 114 is then added to the data in the temporary register. The result of the addition is then transferred, or stored, back into the memory 108 at location Mem 116.
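The load, add, and store steps just described can be modeled in a few lines of Python. This is only an illustrative sketch: the dictionaries `memory` and `registers`, the `TEMP` register name, the helper functions, and the starting values 10 and 5 are all assumptions made for the example and do not describe any actual hardware.

```python
# Hypothetical software model of the three-step micro instruction sequence.
# All names and values here are illustrative assumptions.

memory = {"Mem": 10}               # stands in for memory 108, location Mem 116
registers = {"AX": 5, "TEMP": 0}   # stands in for register set 102 (AX 114 plus a temporary register)


def load(addr):
    """Micro instruction 1: copy the memory operand into the temporary register."""
    registers["TEMP"] = memory[addr]


def add(src):
    """Micro instruction 2: the ALU adds a register to the temporary register."""
    registers["TEMP"] = registers["TEMP"] + registers[src]


def store(addr):
    """Micro instruction 3: write the temporary register back to memory."""
    memory[addr] = registers["TEMP"]


def execute_add_mem_ax(addr="Mem"):
    """Run the full sequence for the macro instruction ADD [Mem],AX."""
    load(addr)
    add("AX")
    store(addr)
    return memory[addr]
```

In this model the addition never touches `memory` directly; it operates only on `registers`, mirroring the constraint that the ALU works only on data held in the register set.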
If each micro instruction within the CPU 100 is assumed to require a minimum of one clock cycle to execute, then the CPU 100 would require at least three clock cycles to execute the ADD [Mem],AX instruction. In many processors manufactured today, more than three clock cycles would be required to execute the illustrative macro instruction because one or more of the micro instructions require more than one clock cycle to execute. As the number of micro instructions required to perform a macro instruction increases, and as the number of clock cycles required to perform each micro instruction increases, the time required to execute the macro instruction on the CPU 100 increases.
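The cycle arithmetic in this paragraph can be written out as a short calculation. The sketch below assumes the micro instructions execute strictly one after another with no overlap; the function name `macro_cycles` and the multi-cycle figures in the second case are hypothetical and chosen only for illustration.

```python
def macro_cycles(cycles_per_micro):
    """Total clock cycles for a macro instruction, assuming its micro
    instructions run back to back with no overlap (an assumption)."""
    return sum(cycles_per_micro)


# LOAD, ADD, STORE at the one-cycle minimum each.
minimum_case = macro_cycles([1, 1, 1])

# Hypothetical case in which each memory access takes two cycles.
slower_case = macro_cycles([2, 1, 2])
```

Under these assumptions the minimum case comes to three cycles, and the total grows with either the length of the sequence or the cost of any single micro instruction.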
The above illustration provides a general overview of the difference between macro instructions which are written by a programmer, and micro instructions which are executed within a microprocessor. More specifically, it is shown that a single macro instruction which has as one of its operands a location in memory generates a sequence of micro instructions which are executed within the microprocessor. Memory must first be accessed to load an operand, the operation must be performed, and memory must be accessed a second time to store the result. One skilled in the art should readily appreciate that this sequence of micro instructions is applicable, not just to an ADD instruction, but to many instructions that have an operand whose location is in memory, and where the destination of the result is also at the location in memory.
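To illustrate how the same load, operate, and store pattern might apply to instructions other than ADD, the following Python sketch parameterizes the middle step. The `OPS` table, the function `read_modify_write`, and the operation names it accepts are assumptions introduced for this example only.

```python
import operator

# Hypothetical table mapping an operation name to the ALU operation it performs.
OPS = {
    "ADD": operator.add,
    "SUB": operator.sub,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
}


def read_modify_write(memory, addr, op, reg_value):
    """Model a macro instruction whose source and destination are the same memory location."""
    temp = memory[addr]               # first memory access: load the operand
    temp = OPS[op](temp, reg_value)   # perform the operation inside the CPU
    memory[addr] = temp               # second memory access: store the result
    return temp
```

For example, calling `read_modify_write(mem, "Mem", "AND", 10)` on `mem = {"Mem": 12}` loads 12, combines it with 10, and writes the result back to the same location, accessing memory twice just as the ADD sequence does.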