1. Field of the Invention
The present invention relates to an instruction fetch and rotate mechanism in a microprocessor that executes variable-length instructions.
2. Description of Related Art
Computers process information by executing a sequence of instructions, which may be supplied from a computer program written in a particular format and sequence designed to direct the computer to perform a particular sequence of operations. Most computer programs are written in high-level languages such as FORTRAN or "C" which are not directly executable by the computer processor. These high-level instructions are translated into instructions, termed "macroinstructions" herein, having a format that can be decoded and executed within the processor.
Macroinstructions are conventionally stored in data blocks having a predefined length in a computer memory element, such as main memory or an instruction cache. Macroinstructions are fetched from the memory elements and then supplied to a decoder, in which each macroinstruction is decoded into one or more microinstructions having a form that is executable by an execution unit in the processor.
Pipelined processors define multiple stages for processing a macroinstruction. These stages are defined so that a typical instruction can complete processing in one cycle and then move on to the next stage in the next cycle. In order to obtain maximum efficiency from a pipelined processing path, the decoder and subsequent execution units must process multiple instructions every cycle. Accordingly, it is advantageous for the fetching circuits to supply multiple new macroinstructions every cycle. In order to supply multiple instructions per clock, a block of instruction code at the most likely subsequent execution location is fetched and buffered so that it can be supplied to an instruction decoder when requested.
If sufficient time and buffer space are available, additional instruction blocks can be buffered, dependent upon the ability to predict the subsequent execution locations. In the event of a mispredicted branch or other change of instruction flow, the instruction code in the buffers will no longer be useful. Additional clock cycles are then required to redirect fetching operations and retrieve the instruction code from memory. Accordingly, it would be advantageous to have a mechanism that can quickly supply the newly fetched instruction code from memory, thereby reducing (or eliminating) the additional clock cycles needed to switch between instruction streams.
Operations to fetch, buffer, and rotate multiple macroinstructions every cycle can be complicated by the format of the macroinstructions, particularly if those macroinstructions have a variable length. One example of a popular instruction set that allows variable-length instructions is the INTEL instruction set, in which instruction lengths can vary from one to fifteen bytes. With variable-length instructions, the location of instruction boundaries (i.e., the location between adjoining macroinstructions in the instruction code) in a block of instruction code is difficult to determine, because the start of each instruction depends on the length of every instruction that precedes it.
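The boundary problem can be illustrated with a minimal software sketch. It assumes a toy encoding, invented here for illustration only, in which the low four bits of an instruction's first byte give its length in bytes (1 to 15); this is not the INTEL encoding, and the function name `find_boundaries` is hypothetical. The sketch shows that each boundary is known only after the preceding instruction has been examined, and that starting from a wrong offset yields a different set of boundaries.

```python
def find_boundaries(block, start=0):
    """Walk a block of instruction bytes and return (boundaries, spill).

    Toy encoding (an assumption, not a real instruction set): the low
    nibble of an instruction's first byte is its total length in bytes.
    `boundaries` lists the offsets at which instructions begin; `spill`
    is how many bytes the last instruction extends past the block end.
    """
    boundaries = []
    offset = start
    while offset < len(block):
        length = block[offset] & 0x0F
        if length == 0:
            raise ValueError("toy encoding does not allow zero length")
        boundaries.append(offset)
        offset += length  # the next boundary depends on this length
    return boundaries, offset - len(block)


# An 8-byte block holding instructions of length 2, 3, 1, and 4.
block = bytes([0x02, 0xAA, 0x03, 0xBB, 0xCC, 0x01, 0x04, 0x11])

print(find_boundaries(block))     # ([0, 2, 5, 6], 2)
print(find_boundaries(block, 1))  # ([1], 3): a wrong start misreads the block
```

The walk is inherently serial, which suggests why locating several boundaries within a single clock cycle is difficult for variable-length code.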
It would be advantageous to provide a fetch and rotate mechanism that can supply a block of instruction code aligned with an instruction boundary in every cycle. Such a mechanism would be useful to supply multiple variable-length instructions to an instruction buffer, from which they can be steered to a multiple-instruction decoder. Particularly, such a fetch and rotate mechanism would be useful for a multiple-instruction decoder that can issue multiple micro-operations to a high performance execution unit that executes more than one micro-operation per cycle.
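As a rough illustration of what aligning a block with an instruction boundary means, the following sketch rotates a block of bytes so that the byte at a known boundary offset moves to position zero. It is a software analogy under stated assumptions, not a description of any particular hardware mechanism; the function name `rotate_to_boundary` and the 8-byte block size are hypothetical.

```python
def rotate_to_boundary(block, boundary):
    """Left-rotate `block` so the byte at offset `boundary` comes first.

    Assumes `boundary` is already known to be the start of an
    instruction; bytes before it wrap around to the end of the block.
    """
    if not 0 <= boundary < len(block):
        raise ValueError("boundary must lie inside the block")
    return block[boundary:] + block[:boundary]


# Hypothetical 8-byte block whose next instruction starts at offset 3.
block = bytes(range(8))
aligned = rotate_to_boundary(block, 3)
print(list(aligned))  # [3, 4, 5, 6, 7, 0, 1, 2]
```

In this analogy the wrapped-around bytes would be replaced by newly fetched instruction code in a real design; the sketch keeps them only so that the rotation itself is easy to see.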