1. Field of the Invention
This invention is related to the field of microprocessors and, more particularly, to prefetching and predecoding of instruction bytes within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term "clock cycle" refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term "instruction processing pipeline" is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
In order to facilitate the location and dispatch of multiple instructions concurrently, many superscalar microprocessors employ predecoding. Predecoding refers to analyzing instruction bytes as they are fetched from main memory and stored into the instruction cache. The predecode data generated as a result of the predecoding is stored in the instruction cache as well. When the instruction bytes are fetched for dispatch into the instruction processing pipeline of the microprocessor, the corresponding predecode data is used by the instruction dispatch mechanism to identify the instructions and to aid in the routing of the instructions to particular functional units.
Generally speaking, predecode data comprises one or more bits of information generated by decoding the corresponding instruction bytes prior to storing the bytes into an instruction cache of a microprocessor. The predecode data varies depending upon many factors including: the nature of the instruction set defined by the microprocessor architecture employed by the microprocessor, the hardware execution units included in the microprocessor, etc. As a general rule, predecode data is generated to allow quick determination of a relevant characteristic of the instruction bytes being fetched where determining that characteristic from examining the bytes themselves may require a substantially longer period of time. The amount of time required to determine the characteristic may be so long that a certain clock cycle time or frequency for the microprocessor may be unachievable without adding pipeline stages to the instruction processing pipeline of the microprocessor.
For example, a microprocessor outfitted with various execution units which execute different subsets of the instruction set may need to quickly determine which subset a particular instruction belongs to in order to route the instruction to an appropriate execution unit. The predecode data for such an example may include an indication of the subset including the instruction, allowing the instruction dispatcher to identify an appropriate execution unit. In another example, a variable length instruction set (in which different instructions and/or different operand options for the same instruction occupy different numbers of bytes) may be employed by the microprocessor. The x86 instruction set is an exemplary variable length instruction set in which instructions may be between 1 and 15 bytes in length. In such a microprocessor, it is difficult to concurrently locate multiple instructions since the length of each instruction varies and is not determined until the instruction is at least partially decoded. Predecode data in this case may include indications of instruction boundaries (e.g. a byte at which an instruction begins or ends), such that a set of bytes forming an instruction may be quickly located and routed to an execution unit.
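The boundary indications described above may be illustrated with a minimal sketch. The sketch is not taken from any particular microprocessor: it assumes a hypothetical toy encoding in which the low two bits of an instruction's first byte give its length minus one, and the names `toy_length`, `predecode`, and `locate` are introduced here for illustration only. It shows that generating the start and end bits requires a serial scan, whereas locating instructions from bits already stored requires no length decoding.

```python
def toy_length(first_byte):
    """Hypothetical encoding (an assumption, not x86): the low two bits
    of the first byte hold the instruction length minus one (1..4 bytes)."""
    return (first_byte & 0x03) + 1


def predecode(line):
    """Serial scan of a cache line: the length of each instruction must be
    determined before the start of the next instruction is known."""
    start = [0] * len(line)
    end = [0] * len(line)
    pos = 0
    while pos < len(line):
        length = toy_length(line[pos])
        if pos + length > len(line):
            break  # instruction spills past the line; leave it unmarked
        start[pos] = 1
        end[pos + length - 1] = 1
        pos += length
    return start, end


def locate(line, start, end):
    """Dispatch-time path: instruction boundaries are read directly from
    the stored bits, so no instruction byte is examined for its length."""
    starts = [i for i, bit in enumerate(start) if bit]
    ends = [i for i, bit in enumerate(end) if bit]
    return [bytes(line[s:e + 1]) for s, e in zip(starts, ends)]


# Four toy instructions of lengths 2, 1, 3, and 1.
line = bytes([0x01, 0xAA, 0x00, 0x02, 0xBB, 0xCC, 0x00])
start_bits, end_bits = predecode(line)
instructions = locate(line, start_bits, end_bits)
```

In hardware, the per-byte start and end bits would permit several instructions to be selected in parallel; the list comprehension in `locate` merely stands in for that parallel selection.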
Unfortunately, for many of the same reasons that make predecoding desirable, the process of performing predecoding may be quite slow. During the predecoding, events may occur which cause the predecoding to be abandoned. For example, a branch misprediction may occur, causing instruction fetch to begin at a new address. If predecoding of the instructions at the new address is required, the predecoder may abandon predecoding of the instructions it is currently processing. Alternatively, a branch instruction may be detected within the instructions being predecoded. If the branch prediction mechanism employed by the microprocessor predicts the branch taken, then the predecoder may predecode instructions at the predicted target address of the branch instruction instead of completing predecode of the current instruction cache line. In either case, the incomplete set of predecode data may be stored into the instruction cache. If the instruction bytes which were not predecoded are later fetched, then the predecoder must predecode the instructions at that time. In many cases, the instructions subsequent to the predicted branch instruction will eventually be executed. For example, if the predicted branch instruction is a subroutine call instruction, then the subsequent instructions will generally be executed upon return from the subroutine.
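The interruption scenario described above may be sketched as follows. This is an illustrative model under stated assumptions, not a description of any actual predecoder: it assumes the same hypothetical toy encoding (low two bits of the first byte give the length minus one), a hypothetical `BRANCH_FLAG` bit marking branch instructions, and an invented `valid_upto` field recording how many bytes of the line have predecode data. The names `LineEntry`, `predecode`, and `fetch` are likewise introduced here for illustration.

```python
BRANCH_FLAG = 0x80  # hypothetical: high bit of the first byte marks a branch


def toy_length(first_byte):
    """Hypothetical encoding: low two bits hold the length minus one."""
    return (first_byte & 0x03) + 1


class LineEntry:
    """One instruction cache line together with its (possibly partial)
    predecode data."""

    def __init__(self, data):
        self.data = data
        self.start_bits = [0] * len(data)
        self.valid_upto = 0  # bytes [0, valid_upto) have predecode data


def predecode(entry, predict_taken):
    """Predecode from entry.valid_upto onward, abandoning the line after a
    branch that predict_taken(offset) reports as predicted taken.
    Returns True if the whole line now has predecode data."""
    pos = entry.valid_upto
    while pos < len(entry.data):
        first = entry.data[pos]
        length = toy_length(first)
        if pos + length > len(entry.data):
            break  # instruction spills past the line
        entry.start_bits[pos] = 1
        branch_at = pos
        pos += length
        entry.valid_upto = pos
        if first & BRANCH_FLAG and predict_taken(branch_at):
            break  # predecoder moves on to the predicted target
    return entry.valid_upto == len(entry.data)


def fetch(entry, offset):
    """Return True if the fetch had to wait for predecoding (a stall)."""
    if offset < entry.valid_upto:
        return False
    predecode(entry, predict_taken=lambda _offset: False)
    return True


# A call-like branch at offset 2 is predicted taken, so bytes 4..7 are
# left without predecode data until the return address is fetched.
entry = LineEntry(bytes([0x01, 0xAA, 0x81, 0xBB, 0x00, 0x02, 0x11, 0x22]))
complete = predecode(entry, predict_taken=lambda _offset: True)
stalled_on_return = fetch(entry, 4)
```

In this model the fetch at offset 4 corresponds to the return from a subroutine: the bytes are present in the cache, but the instruction processing pipeline must wait while the remaining predecode data is generated.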
In either of the examples of predecode interruption given above, as well as others, predecoding of the remaining instructions occurs at a later time, when the instructions are needed in the instruction processing pipeline. Performance of the microprocessor may suffer as the instruction processing pipeline awaits the results of the predecoding. Furthermore, the time required to fetch the instruction bytes from external memory is quite large in many cases. Performance of the microprocessor may suffer as the instruction processing pipeline endures these delays as well.