1. Field of the Invention
This invention relates to the field of microprocessors and, more particularly, to the detection and dispatch of microcode instructions within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Generally speaking, a pipeline comprises a number of stages at which portions of a particular task are performed. Different stages may simultaneously operate upon different items, thereby increasing overall throughput. Although the instruction processing pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
Microprocessor designers often design their products in accordance with the x86 microprocessor architecture in order to take advantage of its widespread acceptance in the computer industry. Because the x86 microprocessor architecture is pervasive, many computer programs are written in accordance with the architecture. X86 compatible microprocessors may execute these computer programs, thereby becoming more attractive to computer system designers who desire x86-capable computer systems. Such computer systems are often well received within the industry due to the wide range of available computer programs.
The x86 microprocessor specifies a variable length instruction set (i.e. an instruction set in which various instructions employ differing numbers of bytes to specify that instruction). For example, the 80386 and later versions of x86 microprocessors employ between 1 and 15 bytes to specify a particular instruction. Instructions have an opcode, which may be 1-2 bytes, and additional bytes may be added to specify addressing modes, operands, and additional details regarding the instruction to be executed.
Unfortunately, having variable byte length instructions creates numerous problems for dispatching multiple instructions per clock cycle. Because the instructions have differing numbers of bytes, an instruction may begin at any memory address. Conversely, fixed length instructions typically begin at a known location. For example, a 4 byte fixed length instruction set has instructions which begin at 4 byte boundaries within memory (i.e. the two least significant bits are zeros for the memory addresses at which instructions begin).
In order to locate multiple variable byte length instructions during a clock cycle, instruction bytes fetched by the microprocessor may be serially scanned to determine instruction boundaries and thereby locate instructions which may be concurrently dispatched. Serial scanning involves a large amount of logic, and typically a large number of cascaded logic levels. For high frequency (i.e. short clock cycle time) microprocessors, large numbers of cascaded logic levels may be deleterious to the performance of the microprocessor. Some microprocessor designs employ predecoding to identify the beginning and end of instructions as the instructions are stored into an instruction cache within the microprocessor. Even with predecoding, locating and dispatching multiple instructions per clock cycle is a complex and often clock-cycle-limiting operation. Multiple levels of multiplexing are employed, and significant bussing between the multiplexors and the instruction bytes being dispatched is needed to allow arbitrary selection of bytes from the instruction bytes being examined for dispatch. The first instruction to be dispatched may be located anywhere within the instruction bytes. The second instruction to be dispatched is then located at the end of the first instruction, etc.
Additionally, some microprocessor designs employ two types of instructions. A first type, called fast-path instructions, may be directly decoded. Another type called microcode or microcode instructions, are parsed into a plurality of simple instructions prior to execution. The microprocessor in addition to locating and dispatching instructions within a clock cycle must also determine whether an instruction is a fast-path or microcode instruction and convey the instruction to the appropriate portions of the microprocessor for decoding and execution. Still further, the opcode of a microcode instruction must be detected prior to decoding. The process of locating an instruction, identifying the type of instruction, dispatching the instruction and locating the opcode of the dispatched instruction is a complex operation that can create a bottleneck in processor throughput.