1. Field of the Invention
This invention relates to the field of processors and, more particularly, to microcode instruction mechanisms within processors and the generation of entry points to microcode memory in processors.
2. Description of the Related Art
Superscalar processors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term “clock cycle” refers to an interval of time accorded to various stages of an instruction processing pipeline within the processor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term “instruction processing pipeline” is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
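The interplay between clock edges and pipeline stages can be illustrated with a small software model. The following is a minimal sketch that assumes a four-stage pipeline (fetch, decode, execute, writeback) with one storage device per stage. The stage names and the function `run_pipeline` are hypothetical. On every simulated clock edge, each stage's storage device captures the value held by the preceding stage.

```python
STAGES = ("fetch", "decode", "execute", "writeback")

def run_pipeline(instructions, cycles):
    """Return, for each clock cycle, which instruction occupies each stage."""
    latches = [None] * len(STAGES)   # one storage device (latch) per stage
    pending = list(instructions)     # instructions not yet fetched
    history = []
    for _ in range(cycles):
        # On the clock edge, every latch captures the value held by the latch
        # before it, and the fetch latch captures the next instruction.
        latches = [pending.pop(0) if pending else None] + latches[:-1]
        history.append(dict(zip(STAGES, latches)))
    return history
```

After four cycles with three instructions, `i0` reaches writeback while `i1` and `i2` are still in the execute and decode stages. Each stage works on a different instruction during the same clock cycle.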
Less complex instructions are typically decoded directly by hardware decode units within the processor. Complex instructions, by contrast, are often classified as microcoded instructions. Microcoded instructions are transmitted to a microcode instruction unit within the microprocessor, which decodes the complex microcoded instruction and produces two or more simpler microcode instructions for execution by the microprocessor. The simpler microcode instructions corresponding to the microcoded instruction are typically stored in a read-only memory (ROM) within the microcode unit. Thus, microcoded instructions are often referred to as MROM instructions. The terms “directly-decoded instruction,” “fastpath instruction,” and “non-complex instruction” may be used interchangeably herein to refer to instructions which are decoded and executed by the processor without the aid of a microcode instruction unit. Unlike MROM instructions, which are reduced to simpler instructions that the microprocessor can handle, directly-decoded instructions are decoded and executed by hardware decode units and functional units included within the microprocessor.
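The split between fastpath and MROM handling can be sketched as a simple dispatch step. In this sketch, the table `MROM_ROUTINES`, the instruction name `COMPLEX_OP`, and its micro-operation names are all hypothetical placeholders. A fastpath instruction passes through unchanged. An MROM instruction is replaced by the stored sequence of simpler instructions.

```python
# Hypothetical microcode ROM contents: each MROM instruction expands to two or
# more simpler microcode instructions. Names are illustrative only.
MROM_ROUTINES = {
    "COMPLEX_OP": ["uop_load", "uop_compute", "uop_store"],
}

def dispatch(instruction):
    """Route an instruction to the fastpath decoders or the microcode unit."""
    routine = MROM_ROUTINES.get(instruction)
    if routine is None:
        # Fastpath: hardware decode units handle the instruction directly.
        return "fastpath", [instruction]
    # MROM: the microcode unit supplies the stored sequence of simpler ops.
    return "mrom", list(routine)
```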
Instructions processed in a processor are encoded as a sequence of ones and zeros. For some processor architectures, instructions may be encoded in a fixed length, such as a certain number of bytes. For other architectures, such as the x86 architecture, the length of instructions may vary. The x86 microprocessor architecture is one example of an architecture having complex instructions that may be implemented in microcode. The x86 microprocessor architecture specifies a variable length instruction set (i.e. an instruction set in which various instructions employ differing numbers of bytes to specify that instruction). For example, the 80386 and later versions of x86 microprocessors employ between 1 and 15 bytes to specify a particular instruction. Instructions have an opcode, which may be 1–2 bytes, and additional bytes may be added to specify addressing modes, operands, and additional details regarding the instruction to be executed.
A generic format illustrative of the x86 instruction set is shown in FIG. 1A. As illustrated in the figure, an x86 instruction may include from one to four optional prefix bytes, followed by an operation code (opcode) field, an optional addressing mode (ModR/M) byte, an optional scale-index-base (SIB) byte, an optional displacement field, and an optional immediate data field.
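Because prefix bytes come first and are optional, a decoder has to find where the prefixes end and the opcode field begins. The following is a minimal sketch of that scan. It assumes the standard legacy prefix byte values and applies the four-prefix limit given above. The function name `split_prefixes` is hypothetical, and the sketch ignores any later prefix extensions.

```python
# Legacy x86 prefix byte values.
LEGACY_PREFIXES = {
    0xF0, 0xF2, 0xF3,                    # LOCK, REPNE, REP/REPE
    0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65,  # CS, SS, DS, ES, FS, GS overrides
    0x66, 0x67,                          # operand-size, address-size overrides
}
MAX_PREFIXES = 4  # one to four optional prefix bytes, per the format above

def split_prefixes(instr: bytes):
    """Return (prefix_bytes, remainder); the remainder begins with the opcode."""
    i = 0
    while i < len(instr) and i < MAX_PREFIXES and instr[i] in LEGACY_PREFIXES:
        i += 1
    return instr[:i], instr[i:]
```

For example, `split_prefixes(bytes([0x66, 0xF3, 0xA5]))` separates an operand-size prefix and a repeat prefix from the string-move opcode 0xA5.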
The opcode field defines the basic operation for a particular instruction. The default operation of a particular opcode may be modified by one or more prefix bytes. For example, a prefix byte may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, to instruct the processor to repeat a string operation a number of times, or to specify a different basic operation. The prefix bytes may contain one or more prefix byte codes. The opcode field follows the prefix bytes, if any, and may be one or two bytes in length. The addressing mode (ModR/M) byte specifies the registers used as well as memory addressing modes. The scale-index-base (SIB) byte is used only in 32-bit base-relative addressing using scale and index factors. A base field of the SIB byte specifies which register contains the base value for the address calculation, and an index field specifies which register contains the index value. A scale field specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value. The next instruction field is the optional displacement field, which may be from one to four bytes in length. The displacement field contains a constant used in address calculations. The optional immediate field, which may also be from one to four bytes in length, contains a constant used as an instruction operand.
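The ModR/M and SIB bytes share the same 2-bit/3-bit/3-bit layout. For the ModR/M byte the fields are (mod, reg, r/m). For the SIB byte they are (scale, index, base). The address arithmetic described above is base + index × 2^scale + displacement. The following sketch decodes the fields and performs that arithmetic. It assumes 32-bit addressing and registers listed in x86 encoding order. It handles the "no index" encoding (index = 100b) but, to stay short, ignores the special case in which a base field of 101b with mod = 00 selects a 32-bit displacement instead of a base register. The function names are hypothetical.

```python
def split_fields(byte):
    """Split a ModR/M or SIB byte into its (2-bit, 3-bit, 3-bit) fields.

    ModR/M -> (mod, reg, r/m); SIB -> (scale, index, base).
    """
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111

def sib_effective_address(sib, regs, disp=0):
    """Compute base + index * 2**scale + displacement, wrapped to 32 bits.

    regs holds the eight 32-bit registers in encoding order:
    EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI.
    """
    scale, index, base = split_fields(sib)
    # An index field of 100b (the ESP encoding) means "no index register".
    index_value = 0 if index == 0b100 else regs[index]
    return (regs[base] + (index_value << scale) + disp) & 0xFFFFFFFF

regs = [0] * 8
regs[3] = 0x1000  # EBX
regs[1] = 5       # ECX
# SIB 0x8B = scale 10b (x4), index 001b (ECX), base 011b (EBX)
print(hex(sib_effective_address(0x8B, regs, disp=8)))  # 0x101c
```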
Referring now to FIG. 1B, several different variable byte-length x86 instruction formats are shown. The shortest x86 instruction is only one byte long, and comprises a single opcode byte as shown in format (a). For certain instructions, the byte containing the opcode field also contains a register field as shown in formats (b), (c) and (e). Format (j) shows an instruction with two opcode bytes. An optional ModR/M byte follows opcode bytes in formats (d), (f), (h), and (j). Immediate data follows opcode bytes in formats (e), (g), (i), and (k), and follows a ModR/M byte in formats (f) and (h). FIG. 1C illustrates several possible addressing mode formats (a)–(h). Formats (c), (d), (e), (g), and (h) contain ModR/M bytes with offset (i.e., displacement) information. An SIB byte is used in formats (f), (g), and (h).
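The addressing mode formats of FIG. 1C differ in whether an SIB byte and a displacement follow the ModR/M byte, and the mod and r/m fields determine which case applies. The following is a minimal sketch of that rule for 32-bit addressing only. The function name `addressing_layout` is hypothetical, and 16-bit addressing (which uses different rules) is not covered.

```python
from typing import Optional, Tuple

def addressing_layout(modrm: int, sib: Optional[int] = None) -> Tuple[bool, int]:
    """Return (uses_sib, displacement_bytes) for 32-bit addressing."""
    mod, rm = (modrm >> 6) & 0b11, modrm & 0b111
    if mod == 0b11:
        return False, 0           # register operand: no SIB, no displacement
    uses_sib = rm == 0b100        # r/m = 100b means an SIB byte follows
    if mod == 0b01:
        return uses_sib, 1        # 8-bit displacement
    if mod == 0b10:
        return uses_sib, 4        # 32-bit displacement
    # mod == 00
    if rm == 0b101:
        return False, 4           # displacement-only addressing
    if uses_sib:
        if sib is None:
            raise ValueError("SIB byte needed to size the displacement")
        # SIB base = 101b with mod = 00 selects a 32-bit displacement.
        return True, 4 if (sib & 0b111) == 0b101 else 0
    return False, 0
```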
Certain instructions within the x86 instruction set are quite complex, specifying multiple operations to be performed. For example, the PUSHA instruction specifies that each of the eight general-purpose x86 registers be pushed onto a stack defined by the value in the ESP register. The corresponding operations are a store operation for each register, with a decrement of the ESP register between successive store operations to generate the address for the next store operation.
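The decomposition of PUSHA into simpler operations can be sketched as a micro-operation list. This sketch assumes the 32-bit form (PUSHAD) and its architectural push order, in which the value stored for ESP is the value ESP held before the instruction began. The micro-operation tuples, the scratch register `TMP`, and the function `expand_pusha` are hypothetical and do not represent any actual microcode encoding.

```python
# Architectural push order for PUSHAD (32-bit PUSHA).
PUSHA_ORDER = ("EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI")

def expand_pusha():
    """Return a hypothetical micro-op sequence implementing PUSHAD."""
    # Copy ESP first so the value stored for ESP is the pre-instruction one.
    uops = [("MOV", "TMP", "ESP")]            # TMP: hypothetical scratch reg
    for reg in PUSHA_ORDER:
        uops.append(("SUB", "ESP", 4))        # decrement ESP by one dword
        uops.append(("STORE", "[ESP]", "TMP" if reg == "ESP" else reg))
    return uops
```

The result has 17 operations: one copy, then eight decrement/store pairs. A single x86 instruction therefore stands in for many simpler operations.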
Different instructions may require differing numbers of microcode instructions to effectuate their corresponding functions. Additionally, the number of microcode instructions corresponding to a particular MROM instruction may vary according to the addressing mode of the instruction, the operand values, and/or the options included with the instruction. The microcode instruction unit issues the microcode instructions into the instruction processing pipeline of the microprocessor. The microcode instructions are thereafter executed in a similar fashion to other instructions. It is noted that the microcode instructions may be instructions defined within the instruction set, or may be custom instructions defined for the particular microprocessor.
A processor may decode or partially decode an instruction encoding to determine whether an instruction is a fastpath instruction or an MROM instruction. If the instruction is an MROM instruction, the processor's microcode instruction unit determines the address within the processor's microcode ROM at which the corresponding microcode instructions are stored. The microcode routines that implement MROM instructions are typically stored in a sequentially addressed ROM, and the microcode instruction unit maps or translates some or all of the instruction encoding to the microcode ROM address at which the corresponding microcode routine begins. This mapping may be performed by a lookup table, content-addressable memory, combinatorial logic, or any other mechanism for translating the MROM instruction encoding to a ROM address. For example, microcode may be stored in a 3K-entry ROM. The microcode unit may map an MROM instruction encoding to a 12-bit ROM address in the range 0x000–0xBFF according to where the microcode routine for that MROM instruction begins. The ROM address is sent to an address decoder for the ROM, which selects the addressed ROM entry. The microcode instruction at the selected ROM entry is read out of the ROM to be executed. The ROM address may then be incremented to select the next microcode instruction in the routine. Alternatively, some microcode instructions may indicate a jump to a non-sequential address in the microcode ROM. Multiple clock cycles may be used to transfer the entire set of microcode instructions within the ROM that correspond to the MROM instruction.
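The following sketch models entry point generation with a lookup table, followed by sequential reads with an occasional jump. It assumes a 3K-entry ROM addressed by 12 bits, as in the example above. The tables `ENTRY_POINTS` and `ROM`, their addresses and contents, and the `JMP`/`END` markers are all hypothetical. 0x60 is used only because it is the one-byte x86 opcode for PUSHA; the routine stored for it here is invented.

```python
ROM_SIZE = 3 * 1024          # 3K entries -> addresses 0x000-0xBFF
ADDRESS_BITS = 12            # 2**12 = 4096 addresses cover the ROM

# Hypothetical entry-point lookup table: opcode -> routine start address.
ENTRY_POINTS = {0x60: 0x100}

# Hypothetical ROM image: address -> microcode instruction. ("JMP", target)
# moves to a non-sequential address; ("END",) terminates the routine.
ROM = {
    0x100: ("UOP", "first"),
    0x101: ("JMP", 0x1F0),
    0x1F0: ("UOP", "second"),
    0x1F1: ("END",),
}

def entry_point(opcode):
    """Map an MROM opcode to the ROM address where its routine begins."""
    addr = ENTRY_POINTS[opcode]
    if not 0 <= addr < ROM_SIZE:
        raise ValueError(f"entry point {addr:#x} lies outside the ROM")
    return addr

def run_routine(opcode, rom=ROM):
    """Read microcode instructions from the entry point until END."""
    addr = entry_point(opcode)
    issued = []
    while True:
        uop = rom[addr]
        if uop[0] == "END":
            return issued
        if uop[0] == "JMP":
            addr = uop[1]          # jump: non-sequential next address
        else:
            issued.append(uop)
            addr += 1              # sequential: increment the ROM address
```

In this model the dictionary lookup in `entry_point` stands in for the mapping hardware. A table, a content-addressable memory, or combinatorial logic could fill the same role.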
The process of determining the address in a microcode ROM to begin execution of a microcode routine to implement an MROM instruction is referred to as microcode entry point generation. As discussed above, microcode entry point generation involves mapping an MROM instruction encoding to a microcode ROM address. At higher clock frequencies, this mapping process may be difficult to complete in one clock cycle. Thus, microcode entry point generation may introduce stalls in the processing pipeline if additional clock cycles are required to map the MROM instruction to a microcode ROM address.