1. Field of the Invention
This invention is related to the field of microprocessors and, more particularly, to prefetching and predecoding of instruction bytes within microprocessors.
2. Description of the Related Art
Superscalar microprocessors achieve high performance by executing multiple instructions per clock cycle and by choosing the shortest possible clock cycle consistent with the design. As used herein, the term “clock cycle” refers to an interval of time accorded to various stages of an instruction processing pipeline within the microprocessor. Storage devices (e.g. registers and arrays) capture their values according to the clock cycle. For example, a storage device may capture a value according to a rising or falling edge of a clock signal defining the clock cycle. The storage device then stores the value until the subsequent rising or falling edge of the clock signal, respectively. The term “instruction processing pipeline” is used herein to refer to the logic circuits employed to process instructions in a pipelined fashion. Although the pipeline may be divided into any number of stages at which portions of instruction processing are performed, instruction processing generally comprises fetching the instruction, decoding the instruction, executing the instruction, and storing the execution results in the destination identified by the instruction.
In order to facilitate the location and dispatch of multiple instructions concurrently, many superscalar microprocessors employ predecoding. Predecoding refers to analyzing instruction bytes as they are fetched from main memory and stored into the instruction cache. The predecode data generated as a result of the predecoding is stored in the instruction cache as well. When the instruction bytes are fetched for dispatch into the instruction processing pipeline of the microprocessor, the corresponding predecode data is used by the instruction dispatch mechanism to identify the instructions and to aid in the routing of the instructions to particular functional units.
Generally speaking, predecode data comprises one or more bits of information generated by decoding the corresponding instruction bytes prior to storing the bytes into an instruction cache of a microprocessor. The predecode data varies depending upon many factors including: the nature of the instruction set defined by the microprocessor architecture employed by the microprocessor, the hardware execution units included in the microprocessor, etc. As a general rule, predecode data is generated to allow quick determination of a relevant characteristic of the instruction bytes being fetched where determining that characteristic from examining the bytes themselves may require a substantially longer period of time. The amount of time required to determine the characteristic may be so long that a certain clock cycle time or frequency for the microprocessor may be unachievable without adding pipeline stages to the instruction processing pipeline of the microprocessor.
For example, a microprocessor outfitted with various execution units which execute different subsets of the instruction set may need to quickly determine which subset a particular instruction belongs to in order to route the instruction to an appropriate execution unit. The predecode data for such an example may include an indication of the subset including the instruction, allowing the instruction dispatcher to identify an appropriate execution unit. In another example, a variable length instruction set (in which different instructions and/or different operand options for the same instruction occupy different numbers of bytes) may be employed by the microprocessor. The x86 instruction set is an exemplary variable length instruction set in which instructions may be between 1 and 15 bytes. In such a microprocessor, it is difficult to concurrently locate multiple instructions since the length of each instruction varies and is not determined until the instruction is at least partially decoded. Predecode data in this case may include indications of instruction boundaries (e.g. a byte at which an instruction begins or ends), such that a set of bytes forming an instruction may be quickly located and routed to an execution unit.
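The boundary-marking idea described above can be sketched in a few lines of Python. This is an illustrative model only: the names `predecode_boundaries`, `locate_instructions`, and `toy_length` are hypothetical, and the toy encoding (low two bits of the first byte give the length minus one) is an assumption standing in for a real variable length instruction set such as x86.

```python
from typing import Callable, List, Tuple


def toy_length(code: bytes, pos: int) -> int:
    """Hypothetical encoding (not x86): low two bits of the first byte = length - 1."""
    return (code[pos] & 0x03) + 1


def predecode_boundaries(
    code: bytes, length_of: Callable[[bytes, int], int]
) -> Tuple[List[int], List[int]]:
    """Walk the bytes sequentially once, setting a start bit and an end bit per instruction."""
    start_bits = [0] * len(code)
    end_bits = [0] * len(code)
    pos = 0
    while pos < len(code):
        n = length_of(code, pos)
        if pos + n > len(code):
            break  # instruction runs past the available bytes; leave it unmarked
        start_bits[pos] = 1
        end_bits[pos + n - 1] = 1
        pos += n
    return start_bits, end_bits


def locate_instructions(start_bits: List[int], end_bits: List[int]) -> List[Tuple[int, int]]:
    """Recover (first_byte, last_byte) ranges from the predecode bits alone."""
    starts = [i for i, bit in enumerate(start_bits) if bit]
    ends = [i for i, bit in enumerate(end_bits) if bit]
    return list(zip(starts, ends))


line = bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x00])
start_bits, end_bits = predecode_boundaries(line, toy_length)
instructions = locate_instructions(start_bits, end_bits)
```

The slow, sequential work is confined to `predecode_boundaries`; `locate_instructions` never examines the instruction bytes, which mirrors why stored boundary bits allow several instructions to be located concurrently at fetch time.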
Unfortunately, for many of the same reasons that make predecoding desirable, the process of performing predecoding may be quite slow. During predecoding, events may occur which cause the predecoding to be abandoned. For example, a branch misprediction may occur, causing instruction fetch to begin at a new address. If predecoding of the instructions at the new address is required, the predecoder may abandon predecoding of the instructions it was previously processing. Alternatively, a branch instruction may be detected within the instructions being predecoded. If the branch prediction mechanism employed by the microprocessor predicts the branch taken, then the predecoder may predecode instructions at the predicted target address of the branch instruction instead of completing predecode of the current instruction cache line. In such a case, the incomplete set of predecode data may be stored into the instruction cache. If the instruction bytes which were not predecoded are later fetched, then the predecoder must predecode the instructions at that time. In many cases, the instructions subsequent to the predicted branch instruction will be executed. For example, if the predicted branch instruction is a subroutine call instruction, then the subsequent instructions will generally be executed upon return from the subroutine.
In either of the examples of predecode interruption given above, as well as others, predecoding of the remaining instructions occurs at a later time, when the instructions are needed in the instruction processing pipeline. Performance of the microprocessor may suffer as the instruction processing pipeline awaits the results of the predecoding. Furthermore, the time required to fetch the bytes from external memory is quite long in many cases. Performance of the microprocessor may suffer as the instruction processing pipeline endures these delays as well.
The problems outlined above are in large part solved by a prefetch/predecode unit in accordance with the present invention. The prefetch/predecode unit includes one or more prefetch buffers which are configured to store prefetched sets of instruction bytes and corresponding predecode data. Additionally, each prefetch buffer is configured to store a predecode byte pointer. The predecode byte pointer indicates the byte within the corresponding prefetched set of instruction bytes at which predecoding is to be initiated. Advantageously, predecoding may be resumed within a given prefetch buffer if predecoding thereof is interrupted to predecode a different set of instruction bytes (e.g. a set of instruction bytes fetched from the instruction cache). Predecoding of the sets of instruction bytes within the prefetch buffer may thereby be reinitiated at a time when the predecoder would otherwise be idle. Performance of the microprocessor may be increased by the generation of additional predecode data. Should the instructions corresponding to the additional predecode data be fetched at a later time, the predecode data is available for execution.
Broadly speaking, the present invention contemplates a prefetch/predecode unit comprising a prefetch buffer and a control unit. The prefetch buffer is configured to store a first plurality of instruction bytes and corresponding predecode data. Furthermore, the prefetch buffer is configured to store a pointer indicating at which one of the first plurality of instruction bytes predecoding is to be initiated. Coupled to the prefetch buffer, the control unit is configured to predecode the first plurality of instruction bytes to generate the corresponding predecode data. Additionally, the control unit is configured to initiate predecoding at the one of the first plurality of instruction bytes indicated by the pointer.
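A prefetch buffer that carries its own predecode byte pointer can be modeled as follows. This is a minimal sketch under stated assumptions, not the claimed hardware: `PrefetchBuffer`, `predecode_step`, and the toy length encoding are hypothetical names introduced for illustration, and the predecode data is reduced to one start bit per byte.

```python
from dataclasses import dataclass, field
from typing import List


def toy_length(code: bytes, pos: int) -> int:
    """Hypothetical encoding (not x86): low two bits of the first byte = length - 1."""
    return (code[pos] & 0x03) + 1


@dataclass
class PrefetchBuffer:
    instruction_bytes: bytes
    start_bits: List[int] = field(default_factory=list)  # predecode data (assumed form)
    predecode_ptr: int = 0  # byte at which predecoding is to be initiated

    def __post_init__(self) -> None:
        if not self.start_bits:
            self.start_bits = [0] * len(self.instruction_bytes)

    @property
    def done(self) -> bool:
        return self.predecode_ptr >= len(self.instruction_bytes)


def predecode_step(buf: PrefetchBuffer, max_instructions: int) -> int:
    """Control-unit model: predecode up to max_instructions, starting at the pointer."""
    count = 0
    while count < max_instructions and not buf.done:
        pos = buf.predecode_ptr
        buf.start_bits[pos] = 1
        buf.predecode_ptr = pos + toy_length(buf.instruction_bytes, pos)
        count += 1
    return count


buf = PrefetchBuffer(bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x00]))
first = predecode_step(buf, 2)   # predecoding is cut short after two instructions
ptr_after_interrupt = buf.predecode_ptr
second = predecode_step(buf, 10) # later call picks up where the pointer left off
```

Because the pointer lives in the buffer rather than in the predecoder, the second call needs no knowledge of the first; it simply begins at the byte the pointer indicates.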
The present invention further contemplates a microprocessor comprising an instruction cache and a prefetch/predecode unit. The instruction cache is configured to store a plurality of cache lines of instruction bytes and a corresponding plurality of predecode data. In addition, the instruction cache is configured to fetch one of the plurality of cache lines of instruction bytes and one of the corresponding plurality of predecode data in response to a fetch address. Still further, the instruction cache is configured to scan the one of the corresponding plurality of predecode data to determine if instructions being fetched are identified therein and to generate a predecode request if instructions being fetched are not identified therein. Coupled to the instruction cache, the prefetch/predecode unit is configured to store a prefetched cache line of instruction bytes. The prefetch/predecode unit is configured to predecode instructions within the prefetched cache line of instruction bytes. In addition, the prefetch/predecode unit is configured to interrupt predecoding of instructions within the prefetched cache line of instruction bytes upon receiving the predecode request.
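The cache-side check described above might look like the sketch below. It assumes, purely for illustration, that predecode data is one start bit per byte and that an instruction is "identified" when the start bit at the fetch offset is set; the text does not specify the scan at this level of detail. `InstructionCache`, `LINE_SIZE`, and the method names are hypothetical.

```python
from typing import Dict, List, Tuple

LINE_SIZE = 8  # hypothetical cache line size in bytes


class InstructionCache:
    def __init__(self) -> None:
        # line address -> (instruction bytes, start bits)
        self.lines: Dict[int, Tuple[bytes, List[int]]] = {}

    def store(self, line_addr: int, data: bytes, start_bits: List[int]) -> None:
        self.lines[line_addr] = (data, start_bits)

    def fetch(self, fetch_addr: int) -> Tuple[bytes, List[int], bool]:
        """Return the line, its predecode data, and whether a predecode request is raised."""
        offset = fetch_addr % LINE_SIZE
        data, start_bits = self.lines[fetch_addr - offset]
        # Assumed scan rule: no start bit at the fetch offset means the
        # instruction being fetched has not been predecoded yet.
        predecode_request = start_bits[offset] == 0
        return data, start_bits, predecode_request


cache = InstructionCache()
# A line stored with incomplete predecode data: only the first two
# instructions (at offsets 0 and 1) were predecoded before an interruption.
cache.store(0x40, bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x00, 0x00]),
            [1, 1, 0, 0, 0, 0, 0, 0])

_, _, request_at_41 = cache.fetch(0x41)  # predecoded instruction
_, _, request_at_44 = cache.fetch(0x44)  # not yet predecoded
```

In this model the returned flag stands in for the predecode request signal that causes the prefetch/predecode unit to set aside the prefetched line.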
Moreover, the present invention contemplates a method for predecoding a prefetched cache line. Predecode of the prefetched cache line is initiated. The predecoding of the prefetched cache line is interrupted to predecode a cache line stored in the instruction cache. Predecoding of the prefetched cache line is resumed upon completing the predecode of the cache line from the instruction cache.
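The initiate, interrupt, and resume sequence of the method can be traced with a small cycle-by-cycle simulation. The sketch assumes one instruction is predecoded per cycle and reuses a toy length encoding; `run_predecoder`, `request_cycle`, and the log labels are hypothetical and chosen only to make the ordering visible.

```python
from typing import List, Optional, Tuple


def toy_length(code: bytes, pos: int) -> int:
    """Hypothetical encoding (not x86): low two bits of the first byte = length - 1."""
    return (code[pos] & 0x03) + 1


def predecode_one(code: bytes, ptr: int, start_bits: List[int]) -> int:
    """Mark the instruction starting at ptr and return the pointer to the next one."""
    start_bits[ptr] = 1
    return ptr + toy_length(code, ptr)


def run_predecoder(
    prefetched: bytes, cache_line: bytes, request_cycle: int
) -> Tuple[List[str], List[int], List[int]]:
    pf_bits = [0] * len(prefetched)
    cl_bits = [0] * len(cache_line)
    pf_ptr = 0                      # predecode byte pointer for the prefetched line
    cl_ptr: Optional[int] = None    # None while no cache-line request is active
    served = False
    log: List[str] = []
    cycle = 0
    while pf_ptr < len(prefetched) or not served:
        if cycle == request_cycle:
            cl_ptr = 0              # predecode request arrives from the instruction cache
        if cl_ptr is not None:      # the cache line takes priority (interrupt)
            cl_ptr = predecode_one(cache_line, cl_ptr, cl_bits)
            log.append("cache")
            if cl_ptr >= len(cache_line):
                cl_ptr, served = None, True
        elif pf_ptr < len(prefetched):  # otherwise continue from the saved pointer
            pf_ptr = predecode_one(prefetched, pf_ptr, pf_bits)
            log.append("prefetch")
        else:
            log.append("idle")
        cycle += 1
    return log, pf_bits, cl_bits


log, pf_bits, cl_bits = run_predecoder(
    prefetched=bytes([0x00, 0x02, 0xAA, 0xBB, 0x01, 0xCC, 0x00]),
    cache_line=bytes([0x01, 0x11, 0x00]),
    request_cycle=2,
)
```

The log shows the prefetched line being worked on for two cycles, set aside for the cache line, and then completed, with no predecode work on the prefetched line repeated after the interruption.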