1. Field of the Invention
This invention relates to the field of superscalar microprocessors and, more particularly, to aligning variable byte-length instructions to issue positions within superscalar microprocessors.
2. Description of the Relevant Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
A popular microprocessor architecture is the x86 microprocessor architecture. Due to the widespread acceptance of the x86 microprocessor architecture in the computer industry, superscalar microprocessors designed in accordance with this architecture are becoming increasingly common. The x86 microprocessor architecture specifies a variable byte-length instruction set in which different instructions may occupy differing numbers of bytes. For example, the 80386 and 80486 processors allow a particular instruction to occupy a number of bytes between 1 and 15. The number of bytes occupied depends upon the particular instruction as well as various addressing mode options for the instruction.
Because instructions are variable-length, locating instruction boundaries is complicated. The length of a first instruction must be determined prior to locating a second instruction subsequent to the first instruction within an instruction stream. However, the ability to locate multiple instructions within an instruction stream during a particular clock cycle is crucial to superscalar microprocessor operation. As operating frequencies increase (i.e. as clock cycles shorten), it becomes increasingly difficult to locate multiple instructions simultaneously.
Various predecode schemes have been proposed in which a predecoder appends information regarding each instruction byte to the instruction byte as the instruction is stored into the cache. As used herein, the term "predecoding" is used to refer to generating instruction decode information prior to storing the corresponding instruction bytes into an instruction cache of a microprocessor. The generated information may be stored with the instruction bytes in the instruction cache. For example, an instruction byte may be indicated to be the beginning or end of an instruction. By scanning the predecode information when the corresponding instruction bytes are fetched, instructions may be located without actually attempting to decode the instruction bytes. The predecode information may be used to decrease the amount of logic needed to locate multiple variable-length instructions simultaneously. Unfortunately, these schemes become insufficient at high clock frequencies as well. A method for locating multiple instructions during a clock cycle at high frequencies is needed.