1. Field of the Invention
This invention relates to microprocessors and, more particularly, to decoding variable-length instructions within a microprocessor.
2. Description of the Relevant Art
The number of software applications written for the x86 instruction set is quite large. As a result, despite the introduction of newer and more advanced instruction sets, microprocessor designers have continued to design microprocessors capable of executing the x86 instruction set.
The x86 instruction set is relatively complex and is characterized by a plurality of variable-length instructions. A generic format illustrative of the x86 instruction set is shown in FIG. 1. As illustrated in the figure, an x86 instruction consists of from one to five optional prefix bytes 102, followed by an operation code (opcode) field 104, an optional addressing mode (Mod R/M) byte 106, an optional scale-index-base (SIB) byte 108, an optional displacement field 110, and an optional immediate data field 112.
The opcode field 104 defines the basic operation for a particular instruction. The default operation of a particular opcode may be modified by one or more prefix bytes 102. For example, one of prefix bytes 102 may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, or to instruct the processor to repeat a string operation a number of times. The opcode field 104 follows prefix bytes 102, if present, and may be one or two bytes in length. The addressing mode (Mod R/M) byte 106 specifies the registers used as well as memory addressing modes. The scale-index-base (SIB) byte 108 is used only in 32-bit base-relative addressing using scale and index factors. A base field within SIB byte 108 specifies which register contains the base value for the address calculation, and an index field within SIB byte 108 specifies which register contains the index value. A scale field within SIB byte 108 specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value. The next instruction field is a displacement field 110, which is optional and may be from one to four bytes in length. Displacement field 110 contains a constant used in address calculations. The optional immediate field 112, which may also be from one to four bytes in length, contains a constant used as an instruction operand. The shortest x86 instructions are only one byte long, and comprise a single opcode byte. The 80286 sets a maximum length for an instruction at 10 bytes, while the 80386 and 80486 both allow instruction lengths of up to 15 bytes.
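By way of illustration only, the address calculation described for SIB byte 108 may be sketched as follows. The function name `effective_address` and its parameters are hypothetical and take register contents rather than register numbers; the two-bit width of the scale field and the wrap to 32 bits are assumptions drawn from the 32-bit x86 encoding rather than from the text above.

```python
def effective_address(base: int, index: int, scale: int, displacement: int = 0) -> int:
    """Compute base + index * 2**scale + displacement, wrapped to 32 bits.

    base and index are the contents of the registers selected by the base
    and index fields; scale is the raw value of the scale field.
    """
    # Assumption: the scale field is two bits wide, so it encodes 0..3
    # (multipliers of 1, 2, 4 and 8).
    if not 0 <= scale <= 3:
        raise ValueError("scale field must be in the range 0..3")
    # Multiplying by a power of two is a left shift of the index value.
    return (base + (index << scale) + displacement) & 0xFFFFFFFF
```

For example, `effective_address(0x1000, 4, 2, 8)` multiplies the index value 4 by four, adds the displacement 8 and the base 0x1000, and yields 0x1018.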
The complexity of the x86 instruction set poses many difficulties in implementing high performance x86-compatible microprocessors. In particular, the variable length of x86 instructions makes decoding instructions difficult. Decoding instructions typically involves determining the boundaries of an instruction and then identifying each field within the instruction, e.g., the opcode and operand fields. Decoding typically takes place once the instruction is fetched from the instruction cache before execution.
One method for determining the boundaries of instructions involves generating a number of predecode bits for each instruction byte read from main memory. The predecode bits provide information about the instruction byte they are associated with. For example, an asserted predecode start bit indicates that the associated instruction byte is the first byte of an instruction. Similarly, an asserted predecode end bit indicates that the associated instruction byte is the last byte of an instruction. Once the predecode bits for a particular instruction byte are calculated, they are stored together with the instruction byte in an instruction cache. When a "fetch" is performed, i.e., a number of instruction bytes are read from the instruction cache, the associated start and end bits are also read. The start and end bits may then be used to generate valid masks for the individual instructions within the fetch. A valid mask is a series of bits in which each bit corresponds to a particular instruction byte. Valid mask bits associated with the first byte of an instruction, the last byte of the instruction, and all bytes in between the first and last bytes of the instruction are asserted. All other valid mask bits are not asserted. Turning now to FIG. 2, an exemplary valid mask is shown. The figure illustrates a portion of a fetch 120 and its associated start and end bits 122 and 124. Assuming a valid mask 126 for instruction B 128 is to be generated, start and end bits 122 and 124 would be used to generate the mask. Valid mask 126 could then be used to mask off all bytes within fetch 120 that are not part of instruction B 128.
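The generation of a valid mask from start and end bits may be sketched, purely as an illustration, in the following manner. The function names `valid_mask` and `extract_instruction`, and the list-of-bits representation, are hypothetical; an actual implementation would use combinational logic rather than a software loop.

```python
def valid_mask(start_bits, end_bits, first_byte):
    """Return the valid mask for the instruction that starts at first_byte.

    start_bits and end_bits hold one predecode bit per fetched byte.
    """
    if not start_bits[first_byte]:
        raise ValueError("first_byte is not marked as the start of an instruction")
    mask = [0] * len(start_bits)
    for position in range(first_byte, len(start_bits)):
        mask[position] = 1          # byte belongs to the instruction
        if end_bits[position]:      # last byte reached; stop asserting bits
            break
    # If no end bit is found, the instruction continues past this fetch
    # and the mask stays asserted through the final fetched byte.
    return mask


def extract_instruction(fetch, mask):
    """Mask off every fetched byte whose valid mask bit is not asserted."""
    return bytes(byte for byte, bit in zip(fetch, mask) if bit)
```

With start bits `[1, 0, 1, 0, 0, 1]` and end bits `[0, 1, 0, 0, 1, 1]`, which describe a two-byte instruction followed by a three-byte instruction and a one-byte instruction, `valid_mask(start_bits, end_bits, 2)` returns `[0, 0, 1, 1, 1, 0]`.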
Once the boundaries of an instruction have been determined, the fields within the instruction, e.g., the opcode and operand fields, may be identified. Once again, the variable length of x86 instructions complicates the identification process. In addition, the optional prefix bytes within an x86 instruction create further complications. For example, in some instructions the opcode will begin with the first byte of the instruction, while in others it may begin with the second, third, or fourth byte.
To perform the difficult task of decoding x86 instructions, a number of cascaded levels of logic are typically used. Thus, decoding may require a number of clock cycles and may create a significant delay before any instructions are available to the functional stages of the microprocessor's pipeline. As microprocessors increase the number of instructions they are able to execute per clock cycle, instruction decoding may become a performance limiting factor. Therefore, a mechanism for simplifying the complexity and time required for instruction decoding is needed.
The problems outlined above are in large part solved by a microprocessor capable of storing both variable- and fixed-length instructions in a cache. In one embodiment, the microprocessor comprises an instruction cache and a predecoder. The instruction cache may have a fixed-length instruction storage array and a variable-length instruction storage array. The predecoder is coupled to the instruction cache and is configured to predecode variable-length instructions into fixed-length instructions by padding the variable-length instructions with constants. Advantageously, storing fixed-length versions of instructions may reduce decode and alignment times. However, fetching instructions after branch instructions that are "taken" may be difficult because each stored fixed-length instruction may be shifted as a result of the padding constants. Storing variable-length instructions in addition to the fixed-length instructions may address this problem by allowing the instruction cache to output variable-length instructions when it would be difficult or impossible to determine which fixed-length instruction to output.
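The shifting caused by the padding constants may be illustrated with a brief sketch. The helper `start_offsets`, the instruction lengths, and the padded length of four bytes are hypothetical values chosen only for the example.

```python
def start_offsets(lengths):
    """Return the byte offset at which each instruction starts."""
    offsets, position = [], 0
    for length in lengths:
        offsets.append(position)
        position += length
    return offsets


# Hypothetical example: three instructions of 1, 3 and 2 bytes,
# each padded to an assumed fixed length of 4 bytes.
variable_lengths = [1, 3, 2]
fixed_length = 4
variable_starts = start_offsets(variable_lengths)                      # [0, 1, 4]
fixed_starts = start_offsets([fixed_length] * len(variable_lengths))   # [0, 4, 8]
```

In this example a taken branch that targets byte 4 of the original code refers to the third instruction, whereas byte 4 of the padded storage holds the second instruction. The target address therefore cannot be applied directly to the fixed-length storage.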
In another embodiment, the instruction cache may be configured to output variable-length instructions in response to receiving a requested address generated by a "taken" branch instruction. Conversely, non-branch instructions and branch instructions that are "not taken" simply fetch the next instruction in the program sequence. Thus, address offsets caused by padding are less likely to be a problem, and fixed-length instructions may be output by the instruction cache. In some embodiments, a multiplexer may be used to select between the fixed-length and variable-length instructions.
In yet another embodiment, the predecoder may be configured to predecode the variable-length instructions into one of a predetermined number of groups of fixed-length instructions. The fixed-length instructions within each group may have the same length and/or the same instruction fields.
A method for predecoding variable-length instructions is also contemplated. In one embodiment the method comprises receiving a variable-length instruction and storing it in a variable-length instruction storage array. The variable-length instruction is also predecoded by creating a fixed-length instruction from the variable-length instruction. The fixed-length instruction is created by padding the variable-length instruction with copies of a constant until the variable-length instruction reaches a predetermined length. Once the fixed-length instruction is formed, it is stored in a fixed-length instruction storage array. The constants may be added at the end of the variable-length instruction, or to each instruction field within the variable-length instruction. The fixed-length instructions generated may all have the same length, or they may each be padded to one of several predetermined lengths.
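The two padding options described in the method may be sketched as follows, by way of example only. The padding constant `PAD`, the predetermined lengths, the field names, and the field widths are all hypothetical; the text does not specify any of these values.

```python
PAD = 0x00  # hypothetical padding constant; no particular value is specified


def pad_to_fixed(instruction: bytes, allowed_lengths=(4, 8, 16), pad=PAD) -> bytes:
    """Pad at the end, up to the smallest predetermined length that fits."""
    for length in sorted(allowed_lengths):
        if len(instruction) <= length:
            return instruction + bytes([pad]) * (length - len(instruction))
    raise ValueError("instruction is longer than every predetermined length")


def pad_fields(fields: dict, widths: dict, pad=PAD) -> bytes:
    """Pad each field to its own width so every field sits at a known offset."""
    result = b""
    for name, width in widths.items():
        value = fields.get(name, b"")   # an absent optional field is all padding
        if len(value) > width:
            raise ValueError(f"field {name!r} exceeds its width of {width} bytes")
        result += value + bytes([pad]) * (width - len(value))
    return result


# Hypothetical field widths, in bytes, for one group of fixed-length instructions.
FIELD_WIDTHS = {"prefix": 5, "opcode": 2, "modrm": 1, "sib": 1,
                "displacement": 4, "immediate": 4}
```

Under these assumed widths, `pad_fields({"opcode": b"\x01", "modrm": b"\xd8"}, FIELD_WIDTHS)` produces a 17-byte instruction in which the opcode always begins at byte 5 and the Mod R/M byte always sits at byte 7, regardless of how many prefix bytes were present.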
A microprocessor configured to execute variable-length instructions is also contemplated. In one embodiment, the microprocessor may comprise a predecoder, an instruction cache, and a plurality of decode units. The predecoder and instruction cache may be configured as previously disclosed. The decode units may be coupled to receive fetched instructions from the instruction cache. One or more decode units may be optimized to receive fixed-length instructions, while the remaining decode units may be optimized for variable-length instructions. Decoding variable-length instructions may take extra clock cycles when compared with decoding fixed-length instructions.
In yet another embodiment, the instruction cache may be configured with a plurality of sub-arrays, wherein each sub-array is configured to store fixed-length instructions having a particular length and/or particular fields.