1. Field of the Invention
The invention relates to the field of microprocessors and more particularly to a microprocessor with an increased register space, three-operand addressing, and predicated execution of instructions.
2. Description of the Relevant Art
The vast amount of software developed for prior x86 type microprocessor generations (i.e., the 8086/8, 80286, 80386, and 80486) places considerable pressure on manufacturers of microprocessors to maintain compatibility with previous generations. Compatibility is maintained by ensuring that the new products execute the instruction set of the previous generations. Maintaining software compatibility, however, has forced many architectural compromises in newer products. In order to retain the functions of earlier products, hardware has often been simply modified or extended in order to increase capability and performance.
The x86 instruction set is relatively complex and is characterized by a plurality of variable byte length instructions. A generic format illustrative of the x86 instruction set is shown in FIG. 1. As illustrated in the figure, an x86 instruction consists of one to five optional prefix bytes 102, followed by an operation code (opcode) field 104, an optional addressing mode (Mod R/M) byte 106, an optional scale-index-base (SIB) byte 108, an optional displacement field 110, and an optional immediate data field 112.
The opcode field 104 defines the basic operation for a particular instruction. The default operation of a particular opcode may be modified by one or more prefix bytes. For example, a prefix byte may be used to change the address or operand size for an instruction, to override the default segment used in memory addressing, or to instruct the processor to repeat a string operation a number of times. The opcode field 104 follows the prefix bytes 102, if any, and may be one or two bytes in length. The addressing mode (Mod R/M) byte 106 specifies the registers used as well as memory addressing modes. The scale-index-base (SIB) byte 108 is used only in 32-bit base-relative addressing using scale and index factors. A base field of the SIB byte specifies which register contains the base value for the address calculation, and an index field specifies which register contains the index value. A scale field specifies the power of two by which the index value will be multiplied before being added, along with any displacement, to the base value. The next instruction field is the optional displacement field 110, which may be from one to four bytes in length. The displacement field 110 contains a constant used in address calculations. The optional immediate field 112, which may also be from one to four bytes in length, contains a constant used as an instruction operand. The shortest x86 instructions are only one byte long, and comprise a single opcode byte. The 80286 sets a maximum length for an instruction at 10 bytes, while the 80386 and 80486 both allow instruction lengths of up to 15 bytes.
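The address calculation described above (base value, plus index value multiplied by a power of two, plus displacement) can be illustrated with a minimal sketch. The function name `effective_address`, the 32-bit wraparound, and the assumption that the scale field is two bits wide (values 0 through 3) are illustrative choices and are not taken from the text above.

```python
MASK32 = 0xFFFFFFFF  # assumption: addresses wrap at 32 bits


def effective_address(base, index, scale, displacement=0):
    """Return base + index * 2**scale + displacement, wrapped to 32 bits.

    `scale` is the exponent held in the SIB scale field (assumed 0..3),
    not the multiplier itself.
    """
    if not 0 <= scale <= 3:
        raise ValueError("scale is assumed to be a two-bit field (0..3)")
    return (base + (index << scale) + displacement) & MASK32


# Base 0x1000, index 4 scaled by 2**2, displacement 8.
print(hex(effective_address(0x1000, 4, 2, 8)))  # prints 0x1018
```

In this sketch the base and index arguments stand for the contents of the registers selected by the base and index fields, not for the register numbers themselves.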
FIGS. 2 and 3 illustrate the internal fields of the Mod R/M byte and the SIB byte, respectively. References to a particular register of the x86 architecture may appear within the REG/OP or the R/M field of the Mod R/M byte, or within the index field and base field of the SIB byte. (A register address may alternatively be implied by an opcode.) Thus, there are four possible references to a register in an x86 instruction. The REG/OP and R/M fields in the Mod R/M byte can specify the source and destination registers, and the base and index fields in the SIB byte can specify the base and index registers used in operand address calculations for memory accesses.
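As an illustration of the four register references discussed above, the following sketch splits a Mod R/M byte and an SIB byte into their fields. The bit positions used (a two-bit field in bits 7-6, followed by two three-bit fields in bits 5-3 and 2-0) follow the conventional x86 encoding and are stated here as an assumption, since FIGS. 2 and 3 are not reproduced in this section. The names `ModRM`, `SIB`, `decode_modrm`, and `decode_sib` are hypothetical.

```python
from typing import NamedTuple


class ModRM(NamedTuple):
    mod: int     # bits 7-6: addressing mode
    reg_op: int  # bits 5-3: register number or opcode extension
    rm: int      # bits 2-0: register number or memory operand selector


class SIB(NamedTuple):
    scale: int   # bits 7-6: power of two applied to the index value
    index: int   # bits 5-3: index register number
    base: int    # bits 2-0: base register number


def decode_modrm(byte):
    """Split a Mod R/M byte into its three fields."""
    return ModRM((byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111)


def decode_sib(byte):
    """Split an SIB byte into its three fields."""
    return SIB((byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111)


print(decode_modrm(0xD8))  # ModRM(mod=3, reg_op=3, rm=0)
print(decode_sib(0x98))    # SIB(scale=2, index=3, base=0)
```

Each register field is three bits wide, which is why only eight registers can be named by any one field.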
Significant deficiencies exist in the x86 architecture. A first deficiency of the x86 architecture is the small number of general purpose registers. Typical RISC processors have at least thirty-two general purpose registers, as opposed to eight for the x86. A larger register set allows more operands to be kept in the faster-access register file, rather than having to fetch them from memory. Modern compilers are also able to take advantage of a larger number of registers to expose greater instruction level parallelism for increased superscalar execution performance. In addition to the limited number of x86 registers, use of them by the compiler is complicated by the fact that most have special implicit uses in various instructions. Expanding the number of registers would alleviate these limitations.
Another limitation of the existing x86 instruction set is the inability to predicate execution of instructions. Predicated execution refers to a situation in which an instruction is executed if and only if a predicate condition is met, wherein the condition to be evaluated is part of the instruction itself. Predicated execution of instructions can increase performance of highly pipelined microprocessor architectures by minimizing branch misprediction and its attendant performance penalties. In a pipelined microprocessor architecture, the microprocessor is preparing and executing multiple instructions in each clock cycle. As an example, a simplistic microprocessor pipeline might include four stages: fetch, decode, execute, and writeback. During any given clock cycle, the microprocessor is fetching a first instruction from an instruction cache, decoding a second instruction, executing a third instruction, and writing back the results of a previously executed fourth instruction to a register file or a cache memory. To keep the pipeline filled, the processor must determine which instructions are most likely to be executed following the instruction that is currently executing. This determination is less than precise because computer programs typically do not execute instructions in a linear or otherwise predictable manner. Instead, a typical computer program includes at least one decision step in which the result of the decision step determines which instruction will execute next. Prior to the actual execution of such a decision step, the microprocessor must attempt to predict which step will be executed after the decision step. When the processor mispredicts (i.e., when the instruction predicted by the processor to be executed after the decision step turns out not to be the correct instruction), a performance penalty is paid in a pipelined processor because the pipeline must be cleared, resulting in the occurrence of one or more no-op cycles.
A no-op cycle, for purposes of this disclosure, refers to a processor clock cycle during which no instruction is executed by the processor. As an example using the four stage pipeline proposed earlier, it is possible that the condition represented by a decision step is not fully evaluated until the fourth or writeback stage. In such an embodiment, misprediction requires that the instructions in the previous three stages of the pipeline be cleared. The performance penalty for misprediction increases as the number of stages in the pipeline increases. Accordingly, it is desirable to minimize the occurrence of misprediction in a pipelined processor without placing any significant restrictions on the ability of systems and applications programmers to insert decision steps in their code.
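The relationship between pipeline depth and misprediction cost can be made concrete with a deliberately simplified cycle-count model. It assumes one instruction completes per cycle once the pipeline is full, and that each misprediction discards the instructions in every stage ahead of the stage where the decision is resolved. The function name `total_cycles` and its parameters are hypothetical, and the model ignores cache misses, stalls, and superscalar issue.

```python
def total_cycles(instructions, mispredictions, pipeline_depth=4, resolve_stage=4):
    """Estimate cycles for a run under a simplified pipeline model.

    pipeline_depth: number of stages (4 = fetch, decode, execute, writeback).
    resolve_stage:  1-based stage at which a decision step is fully evaluated.
    """
    if not 1 <= resolve_stage <= pipeline_depth:
        raise ValueError("resolve_stage must lie within the pipeline")
    fill = pipeline_depth - 1               # cycles before the first result
    penalty_per_miss = resolve_stage - 1    # earlier stages that are cleared
    return instructions + fill + mispredictions * penalty_per_miss


# Four-stage pipeline, decision resolved at writeback: 3 cycles lost per miss.
print(total_cycles(100, 0))         # prints 103
print(total_cycles(100, 10))        # prints 133
# Eight-stage pipeline resolving at the last stage: 7 cycles lost per miss.
print(total_cycles(100, 10, 8, 8))  # prints 177
```

Under these assumptions the same ten mispredictions cost 30 cycles in the four stage pipeline and 70 cycles in the eight stage pipeline, consistent with the observation that the penalty grows with pipeline depth.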
Still another limitation of the x86 architecture and instruction set is the inability to incorporate three register operands in a single instruction. The x86 instruction set allows, at most, two register operands to be used in a given instruction. In an instruction referencing two register operands, one of the register operands must serve as both a source operand and a target or destination operand. In certain applications, it would be advantageous to permit an instruction in which, for example, the contents of first and second source operands were manipulated and stored in a third register.
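The difference between the two-operand and three-operand forms can be sketched with a toy register file. The helper names (`mov`, `add_two_operand`, `add_three_operand`) are hypothetical, and the three-operand form shown is only an illustration of the idea, not an encoding defined by this disclosure.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit registers


def mov(regs, dst, src):
    """Copy src into dst."""
    regs[dst] = regs[src]


def add_two_operand(regs, dst, src):
    """Two-operand add: dst is both a source and the destination."""
    regs[dst] = (regs[dst] + regs[src]) & MASK32


def add_three_operand(regs, dst, src1, src2):
    """Hypothetical three-operand add: both sources are left unchanged."""
    regs[dst] = (regs[src1] + regs[src2]) & MASK32


# Goal: ecx = eax + ebx while keeping eax and ebx intact.
two = {"eax": 5, "ebx": 7, "ecx": 0}
mov(two, "ecx", "eax")              # extra copy to protect eax
add_two_operand(two, "ecx", "ebx")  # ecx = ecx + ebx

three = {"eax": 5, "ebx": 7, "ecx": 0}
add_three_operand(three, "ecx", "eax", "ebx")  # single instruction

print(two == three)  # prints True
```

Both sequences leave the same register contents, but the two-operand form needs an additional copy whenever neither source value may be overwritten.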
Accordingly, it would be advantageous to implement a microprocessor capable of operating in a mode compatible with pre-existing software and that further permitted instruction set extensions that increased the effective number of addressable registers, permitted predicated execution of any instruction, and allowed a three register operand addressing mode.