Data processing operations in a computer are typically carried out in a microprocessor. Generally, the microprocessor, which supervises and implements various data processing tasks for the computer, contains hardware components for processing instructions and data. Instructions together with data are typically stored in a computer memory subsystem, which may include Read Only Memory (ROM), Random Access Memory (RAM), hard disk drives, or other devices. The memory subsystem is typically physically separate from the microprocessor, although copies of instructions and data are temporarily stored on the microprocessor during program execution.
An instruction is a group of bits that tells the microprocessor to perform a specific operation. One part of an instruction is an operation code, or opcode. The opcode is a group of bits that specifies the operation to be performed by the microprocessor. For example, operations such as adding, subtracting, branching program execution, or storing a value to memory may be specified in the opcode. The remainder of the instruction typically identifies the data sources for the operation, called operands. Operands may be specified within the instruction itself, in a register of the microprocessor, or in a memory location.
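By way of illustration only, the division of an instruction into an opcode and operand fields may be sketched as follows. The 16-bit layout, the opcode values, and the `decode` function are hypothetical and chosen for brevity; they do not correspond to the x86 encoding or to any particular microprocessor.

```python
# Hypothetical 16-bit instruction layout (an assumption for illustration only):
#   bits [15:12] opcode, bits [11:8] register operand, bits [7:0] immediate operand.
OPCODES = {0x1: "ADD", 0x2: "SUB", 0x3: "BRANCH", 0x4: "STORE"}


def decode(word):
    """Split a 16-bit instruction word into an opcode name and two operand fields."""
    opcode = (word >> 12) & 0xF   # top four bits select the operation
    reg = (word >> 8) & 0xF       # operand held in a register
    imm = word & 0xFF             # operand carried within the instruction itself
    return OPCODES.get(opcode, "UNKNOWN"), reg, imm


print(decode(0x1305))  # ('ADD', 3, 5)
```

In this sketch the immediate field models an operand specified within the instruction itself, and the register field models an operand held in a register of the microprocessor.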
The architecture of a microprocessor includes the instruction set of the microprocessor as well as the set of resources, such as registers and memory address space, usable by the various instructions of the instruction set. Many modern microprocessors have both a macroarchitecture and a microarchitecture. In particular, many microprocessors that execute instructions specified by the Intel Architecture, which is also commonly referred to as the IA-32 or x86 architecture, have both a macroarchitecture and microarchitecture. The macroarchitecture is the user-visible architecture, i.e., the instruction set and resources that programmers may use. A macroinstruction is an instruction in the macroarchitecture instruction set. The macroarchitectures of some older popular processors, such as the x86 architecture, include very complex instructions. In contrast, the microarchitecture includes a microinstruction set and the set of resources usable by the various instructions of the microinstruction set. The microinstruction set typically includes much simpler instructions than the macroinstruction set and is typically not user-visible, although some microprocessors may make the microinstruction set as well as the macroinstruction set visible to the user. The execution units of the microprocessor actually execute microinstructions rather than macroinstructions.
The microprocessor includes an instruction translator that translates each macroinstruction into one or more microinstructions that are executed by the execution units, depending on the macroinstruction opcode and operands. The width of the instruction translator, i.e., the number of microinstructions the translator can generate per clock cycle, is a design decision that involves competing interests. On the one hand, the narrower the instruction translator is, the smaller and potentially less complex it can be, which is beneficial in terms of cost, silicon real estate, speed, and thermal requirements. On the other hand, the wider the instruction translator is, the greater its ability to provide a sufficient rate of microinstructions to keep the execution units utilized, which is a concern in superscalar, out-of-order execution microprocessor designs.
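The effect of translator width on the rate at which microinstructions are supplied may be sketched with a deliberately idealized model. The function `translate_cycles`, the sample stream, and the assumption that microinstructions pack perfectly across macroinstruction boundaries are all hypothetical simplifications, not a description of any actual translator.

```python
def translate_cycles(uop_counts, width):
    """Clock cycles needed to emit every microinstruction at `width` per cycle.

    Idealized assumption: microinstructions from consecutive macroinstructions
    pack perfectly into each cycle, so only the total count matters.
    """
    total = sum(uop_counts)
    return -(-total // width)  # ceiling division


# Made-up stream: number of microinstructions each macroinstruction expands to.
stream = [1, 2, 1, 3, 1]

for width in (1, 2, 3, 4):
    cycles = translate_cycles(stream, width)
    print(f"width={width}: {cycles} cycles, {sum(stream) / cycles:.2f} microinstructions/cycle")
```

Under these assumptions a wider translator supplies microinstructions in fewer cycles, which is the benefit weighed against the size, cost, and power of the wider design.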
Many modern microprocessors that have separate macroarchitectures and microarchitectures also include a microinstruction ROM in addition to the instruction translator. The microinstruction ROM is typically used to handle more complex and infrequently used macroinstructions that require a relatively large number of microinstructions to perform the operation specified by the associated macroinstruction. The microinstruction ROM includes sequences of microinstructions associated with individual macroinstructions. When the instruction translator encounters certain macroinstructions, it transfers control to a microinstruction sequence in the microinstruction ROM rather than, or in addition to, generating microinstructions, and when the microinstruction sequence completes, control is transferred back to the instruction translator. However, there may be a performance penalty associated with transferring control to a microinstruction ROM sequence relative to the instruction translator simply generating the microinstructions required to perform the associated macroinstruction operation. For example, bubbles may be introduced into the execution pipeline because the fetch unit is not supplying microinstructions at a sufficient rate to keep the execution units utilized. On the other hand, the width of the instruction translator limits the number of microinstructions it can generate each clock cycle, and the microinstruction ROM can be expanded economically to handle macroinstructions requiring more microinstructions than the instruction translator is designed to generate in a given clock cycle.
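The tradeoff between direct translation and a microinstruction ROM sequence may be sketched as follows. The width of three, the two-cycle penalty, the macroinstruction names `SIMPLE_OP` and `COMPLEX_OP`, their microinstruction sequences, and the assumption that the ROM delivers microinstructions at the translator's width are all hypothetical values chosen for illustration.

```python
TRANSLATOR_WIDTH = 3   # assumed: microinstructions the translator emits per clock
ROM_TRAP_PENALTY = 2   # assumed: bubble cycles incurred when control moves to the ROM

# Hypothetical macroinstructions and the microinstructions each one requires.
UOP_SEQUENCES = {
    "SIMPLE_OP": ["alu"],
    "COMPLEX_OP": ["ld", "alu", "alu", "st", "alu"],
}


def translate(macro):
    """Return (microinstructions, source, cycles) for one macroinstruction."""
    seq = UOP_SEQUENCES[macro]
    if len(seq) <= TRANSLATOR_WIDTH:
        # Fits within the translator width: generated directly in one cycle.
        return seq, "translator", 1
    # Too many microinstructions for one cycle: use the ROM sequence and pay
    # the penalty. ROM delivery at TRANSLATOR_WIDTH per cycle is an assumption.
    rom_cycles = -(-len(seq) // TRANSLATOR_WIDTH)  # ceiling division
    return seq, "rom", ROM_TRAP_PENALTY + rom_cycles


for name in UOP_SEQUENCES:
    uops, source, cycles = translate(name)
    print(f"{name}: {len(uops)} microinstructions via {source} in {cycles} cycles")
```

In this model the five-microinstruction sequence takes four cycles through the ROM, two of which are the assumed penalty, whereas a macroinstruction that fits within the translator width completes translation in one.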
An example of a macroinstruction that requires a relatively large number of microinstructions is a macroinstruction that performs read/modify/write operations on an operand in memory. Macroinstructions that perform read/modify/write operations on an operand in memory are referred to herein as LdAluSt macroinstructions because they include a memory load operation to get the operand from memory into the microprocessor, an ALU operation to modify the memory operand, and a memory store operation to write the modified result back to its original location in memory. Each of the constituent load, ALU, and store operations may require one or more microinstructions to perform the respective operation. An example of a LdAluSt macroinstruction is an x86 ADD [mem], EAX instruction. This instruction loads the operand from the memory location specified by the [mem] address into the microprocessor, adds the memory operand to the value in the EAX register, and stores the resultant sum of the addition operation in the memory location specified by the [mem] address.
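The decomposition of ADD [mem], EAX into its constituent load, ALU, and store operations may be sketched as follows. The microinstruction names (`ld`, `add`, `st`), the temporary register `t0`, and the helper functions are hypothetical; the sketch assumes exactly one microinstruction per constituent operation and ignores condition flags, whereas an actual design may use more microinstructions for each operation.

```python
def expand_add_mem_reg(addr, reg):
    """Expand ADD [addr], reg into load, ALU, and store microinstructions."""
    return [
        ("ld", "t0", addr),        # load: bring the memory operand into temporary t0
        ("add", "t0", "t0", reg),  # ALU: t0 = t0 + reg
        ("st", addr, "t0"),        # store: write the sum back to the same location
    ]


def run(uops, regs, mem):
    """Execute the microinstructions against a register file and a memory (both dicts)."""
    for op in uops:
        if op[0] == "ld":
            regs[op[1]] = mem[op[2]]
        elif op[0] == "add":
            regs[op[1]] = (regs[op[2]] + regs[op[3]]) & 0xFFFFFFFF  # wrap to 32 bits
        elif op[0] == "st":
            mem[op[1]] = regs[op[2]]


regs = {"EAX": 5}
mem = {0x1000: 37}
run(expand_add_mem_reg(0x1000, "EAX"), regs, mem)
print(mem[0x1000])  # 42
```

The memory location is both read and written, while EAX is left unchanged, which is the read/modify/write behavior described above.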
If the number of microinstructions that must be generated to perform the LdAluSt macroinstruction is greater than the width of the instruction translator, then either the microinstruction ROM must be employed or the instruction translator must generate the microinstructions over multiple clock cycles, which would likely increase the complexity of the instruction translator significantly. However, because LdAluSt macroinstructions are frequently used in many programs, it is desirable to avoid branching to a microinstruction ROM sequence to execute all or a portion of a LdAluSt macroinstruction.
Therefore, what is needed is a microprocessor that executes LdAluSt macroinstructions with high performance and that includes a relatively fast, small, and low-power instruction translator.