1. Field of the Invention
This invention relates to microprocessors and, more particularly, to determining the length of variable length microprocessor instructions.
2. Description of the Relevant Art
Superscalar microprocessors are capable of attaining performance characteristics which surpass those of conventional scalar processors by allowing the concurrent execution of multiple instructions. Due to the widespread acceptance of the x86 family of microprocessors, efforts have been undertaken by microprocessor manufacturers to develop superscalar microprocessors which execute x86 instructions. Such superscalar microprocessors achieve relatively high performance characteristics while advantageously maintaining backwards compatibility with the vast amount of existing software developed for previous microprocessor generations such as the 8086, 80286, 80386, and 80486.
The x86 instruction set is relatively complex and is characterized by a plurality of variable byte length instructions. A generic format illustrative of the x86 instruction set is shown in FIG. 1. As illustrated in the figure, an x86 instruction consists of one to five optional prefix bytes 102, followed by an operation code (opcode) field 104, an optional addressing mode (Mod R/M) byte 106, an optional scale-index-base (SIB) byte 108, an optional displacement field 110, and an optional immediate data field 112.
The opcode field 104 defines the basic operation for a particular instruction. The default operation of a particular opcode may be modified by one or more prefix bytes. For example, a prefix byte may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, or to instruct the processor to repeat a string operation a number of times. The opcode field 104 follows the prefix bytes 102, if any, and may be one or two bytes in length.
The addressing mode (Mod R/M) byte 106 specifies the registers used as well as memory addressing modes. The scale-index-base (SIB) byte 108 is used only in 32-bit base-relative addressing using scale and index factors. A base field of the SIB byte specifies which register contains the base value for the address calculation, and an index field specifies which register contains the index value. A scale field specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value.
The next instruction field is the optional displacement field 110, which may be from one to four bytes in length. The displacement field 110 contains a constant used in address calculations. The optional immediate field 112, which may also be from one to four bytes in length, contains a constant used as an instruction operand.
The shortest x86 instructions are only one byte long and comprise a single opcode byte. The 80286 limits the maximum instruction length to 10 bytes, while the 80386 and 80486 both allow instruction lengths of up to 15 bytes.
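By way of illustration only, the relationship between the instruction fields and the overall instruction length may be sketched in Python as follows. The per-field byte ranges and the 15-byte limit are taken from the description above; the names `FIELD_RANGES`, `MAX_LENGTH`, and `instruction_length` are hypothetical and introduced solely for this sketch, which sums field sizes and does not decode actual instruction bytes.

```python
# Limit stated above for the 80386 and 80486.
MAX_LENGTH = 15

# (minimum, maximum) byte count of each field, per the description above.
# Optional fields have a minimum of zero.
FIELD_RANGES = {
    "prefix": (0, 5),
    "opcode": (1, 2),
    "modrm": (0, 1),
    "sib": (0, 1),
    "displacement": (0, 4),
    "immediate": (0, 4),
}


def instruction_length(**field_bytes):
    """Return the total length in bytes for the given per-field byte counts.

    Fields that are not supplied take their minimum size.
    """
    unknown = set(field_bytes) - set(FIELD_RANGES)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    total = 0
    for name, (low, high) in FIELD_RANGES.items():
        count = field_bytes.get(name, low)
        if not low <= count <= high:
            raise ValueError(f"{name} must be {low}..{high} bytes, got {count}")
        total += count

    # The field maxima sum to 17 bytes, so the overall limit is checked separately.
    if total > MAX_LENGTH:
        raise ValueError(f"{total} bytes exceeds the {MAX_LENGTH}-byte limit")
    return total
```

For instance, `instruction_length()` describes the shortest case, a single opcode byte, while `instruction_length(prefix=1, modrm=1, sib=1, displacement=4, immediate=4)` describes a 12-byte instruction.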
The complexity of the x86 instruction set poses many difficulties in implementing high performance x86 compatible superscalar microprocessors. One difficulty arises from the fact that instructions must be scanned and aligned before they can be properly decoded by the parallel-coupled instruction decoders used in such processors. Unlike most RISC instruction formats, the x86 instruction set consists of variable byte length instructions, so the start bytes of successive instructions within a line are not necessarily equally spaced, and the number of instructions per line is not fixed. As a result, employment of simple, fixed-length shifting logic cannot by itself solve the problem of instruction alignment.
Instead of simple shifting logic, x86 compatible microprocessors typically use instruction scanning mechanisms to generate a start bit and an end bit for each instruction byte as the bytes are stored in the instruction cache. These start and end bits are then used to generate a valid mask for each instruction. A valid mask is a series of bits in which each consecutive bit corresponds to a particular byte of instruction information. For a particular instruction fetch, the valid mask bits associated with the first byte of the instruction, the last byte of the instruction, and all bytes in between the first and last bytes of the instruction are asserted. All other bits in the valid mask are not asserted. For example, given the following 8-byte instruction cache line, the following valid mask would be generated for a fetch of instruction B:
______________________________________
byte →          0  1  2  3  4  5  6  7
______________________________________
cache line      A  A  B  B  B  B  C  C
______________________________________
bit →           0  1  2  3  4  5  6  7
______________________________________
end bits        0  1  0  0  0  1  0  0
start bits      0  0  1  0  0  0  1  0
valid mask      0  0  1  1  1  1  0  0
______________________________________
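The generation of the valid mask in the example above may be sketched, purely for illustration, as the following Python routine. The function name `valid_mask` and its parameters are hypothetical. The sketch models the behavior in software and is not intended to represent the cascaded gate logic of an actual scanning mechanism. It assumes that a fetched instruction which does not end within the line is valid through the last byte of the line.

```python
def valid_mask(start_bits, end_bits, fetch_start):
    """Return the valid mask for the instruction beginning at fetch_start.

    Mask bits are asserted from the start byte through the first byte at or
    after it whose end bit is set. If no end bit is found (assumption: the
    instruction continues into the next line), the mask runs to the end.
    """
    if len(start_bits) != len(end_bits):
        raise ValueError("start and end bit vectors must be the same length")
    if not start_bits[fetch_start]:
        raise ValueError("fetch_start is not marked as the start of an instruction")

    mask = [0] * len(start_bits)
    for position in range(fetch_start, len(end_bits)):
        mask[position] = 1
        if end_bits[position]:
            break
    return mask


# Bit vectors from the example table; instruction B starts at byte 2.
start_bits = [0, 0, 1, 0, 0, 0, 1, 0]
end_bits = [0, 1, 0, 0, 0, 1, 0, 0]
mask_b = valid_mask(start_bits, end_bits, fetch_start=2)
```

With the bit vectors from the table, `mask_b` equals `[0, 0, 1, 1, 1, 1, 0, 0]`, matching the valid mask row shown for instruction B.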
Once a valid mask is calculated for a particular instruction fetch, it may then be used to mask off the unwanted bytes that are not part of the particular instruction. In the example above, the valid mask for the fetch of instruction B could be used to mask off the unwanted end bytes of instruction A and the unwanted beginning bytes of instruction C. This masking is typically performed in an instruction alignment unit.
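As a hedged illustration of the masking step described above, the following Python sketch zeroes every byte of a cache line whose valid mask bit is not asserted. The function name `apply_valid_mask` and the byte values in the example line are hypothetical placeholders, not actual x86 encodings, and the sketch does not represent any particular instruction alignment unit.

```python
def apply_valid_mask(line, mask):
    """Zero every byte of the cache line whose valid mask bit is not asserted."""
    if len(line) != len(mask):
        raise ValueError("line and mask must be the same length")
    # Each mask bit is widened to a full byte (0xFF or 0x00) and ANDed in.
    return bytes(byte & (0xFF if bit else 0x00) for byte, bit in zip(line, mask))


# Placeholder bytes standing in for instructions A, B, and C of the example.
line = bytes([0xA0, 0xA1, 0xB0, 0xB1, 0xB2, 0xB3, 0xC0, 0xC1])
mask_for_b = [0, 0, 1, 1, 1, 1, 0, 0]
masked = apply_valid_mask(line, mask_for_b)
```

Here `masked` retains only the four bytes of instruction B in bytes 2 through 5; the trailing bytes of instruction A and the leading bytes of instruction C are cleared.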
Unfortunately, the process of generating a valid mask and then masking off the undesired bytes is complicated and requires a large number of cascaded logic gates. In contrast, if the actual length of each instruction were known, then simple shifting logic could be used to align the instructions. While scanning logic has been proposed to dynamically find the boundaries of instructions during the decode stage of the pipeline, such solutions typically require the decode pipeline stage of the processor to be implemented with a relatively large number of cascaded levels of logic gates and/or the allocation of several clock cycles to perform the scanning operation. This correspondingly limits the maximum overall clock frequency of the superscalar microprocessor. For these reasons, a fast method for determining the length of variable length instructions that does not add clock cycles to the decode stage is needed.
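To illustrate why known instruction lengths would permit alignment by simple shifting, the following Python sketch treats a cache line as a single wide value and extracts each instruction with a shift and a length-derived mask. The function name `align_by_length`, its parameters, and the example byte values are assumptions made for this sketch; it is a software model only and is not the method of any particular processor.

```python
def align_by_length(line, lengths, first_offset=0):
    """Extract successive instructions from a line using only their lengths.

    The line is treated as one wide integer with byte 0 in the least
    significant position. Each instruction is shifted down to position 0
    and truncated to its length; the next offset is a running sum.
    """
    if first_offset + sum(lengths) > len(line):
        raise ValueError("instructions extend past the end of the line")

    word = int.from_bytes(line, "little")
    aligned = []
    offset = first_offset
    for length in lengths:
        shifted = word >> (8 * offset)            # move the start byte to position 0
        field = shifted & ((1 << (8 * length)) - 1)  # keep exactly `length` bytes
        aligned.append(field.to_bytes(length, "little"))
        offset += length
    return aligned


# Placeholder line holding a 2-byte, a 4-byte, and a 2-byte instruction.
example_line = bytes(range(0x10, 0x18))
instructions = align_by_length(example_line, [2, 4, 2])
```

In this example, `instructions` holds the byte groups 0-1, 2-5, and 6-7 of the line, each obtained without consulting start bits, end bits, or a valid mask.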