1. Field of the Invention
This invention relates to the field of superscalar microprocessors and, more particularly, to classifying instructions prior to decode of the instructions in order to more optimally utilize microprocessor resources.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by simultaneously executing multiple instructions in a clock cycle and by specifying the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time during which the pipeline stages of a microprocessor perform their intended functions. Memory elements (such as registers and arrays within the microprocessor) capture data values according to a clock signal which defines the clock cycle. For example, memory elements may capture their data values based upon a rising or falling edge of the clock signal.
Superscalar microprocessor manufacturers often design microprocessors according to the x86 microprocessor architecture. Due to the widespread acceptance in the computer industry of the x86 microprocessor architecture, superscalar microprocessors designed to execute x86 instructions may be suitable for use in many computer system configurations. The x86 instruction set is an example of a complex instruction set computer (CISC) instruction set. Certain CISC instructions are defined to perform complex operations which may require multiple clock cycles to complete. For example, a CISC instruction may utilize a memory operand (i.e., an operand value stored in a memory location as opposed to a register). Fetching the operand from memory may require several clock cycles prior to execution of the instruction upon the operand value. Additionally, a CISC instruction may specify several results to be stored in several different storage locations. Since execution units within a superscalar microprocessor are capable of conveying a finite number of results during a clock cycle, these several results add complexity. The number of results an instruction specifies may affect the number of clock cycles required to execute the instruction.
The x86 instruction set additionally defines instructions which are variable length. A variable length instruction set is an instruction set in which the various instructions comprise differing numbers of bytes. One instruction within the instruction set may be specified by a single byte, while other instructions may be specified by more than one byte. An x86 instruction, for example, may include zero to five prefix bytes, one to two opcode bytes, an optional addressing mode byte, an optional scale-index-base byte (SIB byte), zero to four bytes of displacement, and zero to four bytes of immediate data. Prefix bytes allow modification of the instruction defined by the opcode bytes. The optional addressing mode byte may specify several addressing modes for the instruction. The SIB byte may define further modifications to the addressing mode. Displacement and immediate data may be encoded within the instruction for use with certain addressing modes. Thus, an instruction may be defined by as few as one opcode byte, or numerous bytes may define the instruction. More information regarding the x86 microprocessor architecture may be found within the publication entitled: "PC Programmer's Technical Reference: The Processor and Coprocessor" by Hummel, Ziff-Davis Press, Emeryville, Calif., 1992. This publication is incorporated herein by reference in its entirety. It is noted that the x86 microprocessor architecture is only one example of a variable length instruction set. Other variable length instruction sets may be defined.
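By way of illustration only, the field ranges recited above may be modeled as a simple record whose total length is the sum of its field lengths. The following is a minimal sketch: the class name InstructionLayout and its field names are hypothetical, and only the byte-count ranges quoted above are modeled. The check that a SIB byte appears only together with an addressing mode byte is an added assumption of the sketch, and no other architectural constraint is represented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstructionLayout:
    """Hypothetical record of per-field byte counts (ranges taken from the text)."""
    prefix_bytes: int = 0          # zero to five prefix bytes
    opcode_bytes: int = 1          # one to two opcode bytes
    has_modrm: bool = False        # optional addressing mode byte
    has_sib: bool = False          # optional scale-index-base byte
    displacement_bytes: int = 0    # zero to four bytes of displacement
    immediate_bytes: int = 0       # zero to four bytes of immediate data

    def __post_init__(self):
        # Reject field sizes outside the ranges quoted in the description.
        if not 0 <= self.prefix_bytes <= 5:
            raise ValueError("prefix_bytes must be 0..5")
        if not 1 <= self.opcode_bytes <= 2:
            raise ValueError("opcode_bytes must be 1..2")
        if not 0 <= self.displacement_bytes <= 4:
            raise ValueError("displacement_bytes must be 0..4")
        if not 0 <= self.immediate_bytes <= 4:
            raise ValueError("immediate_bytes must be 0..4")
        # Assumption of this sketch: SIB only accompanies an addressing mode byte.
        if self.has_sib and not self.has_modrm:
            raise ValueError("a SIB byte requires an addressing mode byte")

    def length(self) -> int:
        """Total instruction length in bytes: the sum of all field lengths."""
        return (self.prefix_bytes + self.opcode_bytes
                + int(self.has_modrm) + int(self.has_sib)
                + self.displacement_bytes + self.immediate_bytes)


shortest = InstructionLayout()  # a single opcode byte
indexed = InstructionLayout(has_modrm=True, has_sib=True, displacement_bytes=1)
print(shortest.length(), indexed.length())
```

In this sketch, an instruction consisting of one opcode byte has a length of one byte, while an instruction having one opcode byte, an addressing mode byte, a SIB byte, and one displacement byte has a length of four bytes.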
Variable length CISC instruction sets present a large problem for superscalar microprocessors. Because the instructions are variable length, it is difficult to determine instruction boundaries quickly. Superscalar microprocessors attempt to execute multiple instructions per clock cycle, so quickly determining instruction boundaries is important to the overall performance of the microprocessor. Additionally, superscalar microprocessors may generate several simpler instructions from certain more complex CISC instructions in order to simplify the microprocessor's execution units. The simpler instructions, when considered together, perform a function equivalent to that of the more complex instruction. Such instructions therefore utilize several execution units. Allocating more than one execution unit to an instruction after instructions have been decoded is a complex process which may be difficult to implement correctly. It may also be useful to determine other information regarding instructions early in the instruction processing pipeline, as the instructions are fetched and transferred to a decode unit or units within the microprocessor.
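The difficulty of locating instruction boundaries noted above may be illustrated with a deliberately simplified model. The sketch below assumes a toy encoding, which is not the x86 encoding, in which the first byte of each instruction holds that instruction's total length in bytes. The function name find_boundaries is hypothetical. Even in this simplified form, the start of each instruction is known only after the length of the preceding instruction has been determined, so the scan is inherently serial.

```python
def find_boundaries(code: bytes) -> list[int]:
    """Return the starting offset of each instruction in a toy byte stream.

    Toy encoding (an assumption of this sketch, not x86): the first byte of
    every instruction gives the total length of that instruction in bytes.
    """
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        length = code[offset]
        if length == 0:
            raise ValueError(f"zero-length instruction at offset {offset}")
        if offset + length > len(code):
            raise ValueError(f"truncated instruction at offset {offset}")
        # The next boundary depends on the length just determined.
        offset += length
    return starts


# Three instructions of lengths 1, 3, and 2 bytes.
stream = bytes([1, 3, 0xAA, 0xBB, 2, 0xCC])
print(find_boundaries(stream))  # [0, 1, 4]
```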
Generally speaking, a superscalar microprocessor may implement an instruction processing pipeline. By employing a pipeline, the processing of instructions may be overlapped such that the number of instructions executed during a period of time is larger than the number of instructions that could be individually processed during that time period. For example, an instruction may be executed while a second instruction is decoded and a third instruction is fetched from memory. Superscalar microprocessors may operate upon multiple instructions in each stage of the instruction processing pipeline or, alternatively, may employ multiple instruction processing pipelines.
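The overlap described above may be sketched as a schedule that records which instruction occupies each stage during each clock cycle. The sketch is idealized: it assumes that each stage takes exactly one clock cycle, that a single instruction enters the pipeline per clock cycle, and that no stalls occur. The stage names and the function name schedule are illustrative only.

```python
STAGES = ("fetch", "decode", "execute")  # illustrative stage names


def schedule(num_instructions: int, stages=STAGES) -> list[dict]:
    """Return, for each clock cycle, a mapping of stage name to instruction index.

    Idealized model: one clock cycle per stage, one instruction entering per
    clock cycle, and no stalls.
    """
    if num_instructions == 0:
        return []
    total_cycles = len(stages) + num_instructions - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for depth, stage in enumerate(stages):
            instr = cycle - depth  # instruction occupying this stage, if any
            if 0 <= instr < num_instructions:
                row[stage] = instr
        table.append(row)
    return table


table = schedule(3)
print(table[2])    # {'fetch': 2, 'decode': 1, 'execute': 0}
print(len(table))  # 5
```

Under these assumptions, three instructions complete in five clock cycles, whereas processing them one at a time through the same three stages would take nine clock cycles. In the third clock cycle, the first instruction is executed while the second is decoded and the third is fetched.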
Instruction processing pipelines may include stages which fetch an instruction, decode the instruction and fetch the operands, execute the instruction, and write the result to the destination. Each stage may be performed in multiple clock cycles (and thus each stage may comprise multiple stages). Therefore the instruction processing pipeline may comprise a fetch stage or stages, a decode stage or stages, an execute stage or stages, and a writeback stage or stages. Instructions pass from the fetch stage through the decode and execute stages to the writeback stage. As an example of multiple stages within these generalized stages, the execute stage may include a memory operand fetch stage for fetching operand values stored in a memory location and a stage in which the instruction is executed. Instructions not including a memory operand may bypass the memory operand fetch stage. It would be desirable to have a microprocessor which may detect important instruction information earlier in the instruction processing pipeline than the decode stage or stages. It would be particularly desirable to have a microprocessor configured to detect complex instructions early in the instruction processing pipeline.