1. Field of the Invention
This invention is related to the field of processors and, more particularly, to predecoding techniques within processors.
2. Description of the Related Art
Superscalar processors attempt to achieve high performance by dispatching and executing multiple instructions per clock cycle, and by operating at the shortest possible clock cycle time consistent with the design. To the extent that a given processor is successful at dispatching and/or executing multiple instructions per clock cycle, high performance may be realized. In order to increase the average number of instructions dispatched per clock cycle, processor designers have been designing superscalar processors which employ wider issue rates. A “wide issue” superscalar processor is capable of dispatching (or issuing) a larger maximum number of instructions per clock cycle than a “narrow issue” superscalar processor is capable of dispatching. During clock cycles in which a number of dispatchable instructions is greater than the narrow issue processor can handle, the wide issue processor may dispatch more instructions, thereby achieving a greater average number of instructions dispatched per clock cycle.
Many processors are designed to execute the x86 instruction set due to its widespread acceptance in the computer industry. For example, the K5 and K6 processors from Advanced Micro Devices, Inc., of Sunnyvale, Calif. implement the x86 instruction set. The x86 instruction set is a variable length instruction set in which various instructions occupy differing numbers of bytes in memory. The type of instruction, as well as the addressing modes selected for a particular instruction encoding, may affect the number of bytes occupied by that particular instruction encoding. Variable length instruction sets, such as the x86 instruction set, minimize the amount of memory needed to store a particular program by only occupying the number of bytes needed for each instruction. In contrast, many RISC architectures employ fixed length instruction sets in which each instruction occupies a fixed, predetermined number of bytes.
Unfortunately, variable length instruction sets complicate the design of wide issue processors. For a wide issue processor to be effective, the processor must be able to identify large numbers of instructions concurrently and rapidly within a code sequence in order to provide sufficient instructions to the instruction dispatch hardware. Because the location of each variable length instruction within a code sequence is dependent upon the preceding instructions, rapid identification of instructions is difficult. If a sufficient number of instructions cannot be identified, the wide issue structure may not result in significant performance gains. Therefore, a processor which provides rapid and concurrent identification of instructions for dispatch is needed.
Another feature which is important to the performance achievable by wide issue superscalar processors is the accuracy and effectiveness of its branch prediction mechanism. As used herein, the branch prediction mechanism refers to the hardware which detects control transfer instructions within the instructions being identified for dispatch and which predicts the next fetch address resulting from the execution of the identified control transfer instructions. Generally, a “control transfer” instruction is an instruction which, when executed, specifies the address from which the next instruction to be executed is fetched. Jump instructions are an example of control transfer instructions. A jump instruction specifies a target address different than the address of the byte immediately following the jump instruction (the “sequential address”). Unconditional jump instructions always cause the next instruction to be fetched to be the instruction at the target address, while conditional jump instructions cause the next instruction to be fetched to be either the instruction at the target address or the instruction at the sequential address responsive to an execution result of a previous instruction (for example, by specifying a condition flag set via instruction execution). Other types of instructions besides jump instructions may also be control transfer instructions. For example, subroutine call and return instructions may cause stack manipulations in addition to specifying the next fetch address. Many of these additional types of control transfer instructions include a jump operation (either conditional or unconditional) as well as additional instruction operations.
Control transfer instructions may specify the target address in a variety of ways. “Relative” control transfer instructions include a value (either directly or indirectly) which is to be added to an address corresponding to the relative control transfer instruction in order to generate the target address. The address to which the value is added depends upon the instruction set definition. For x86 control transfer instructions, the address of the byte immediately following the control transfer instruction is the address to which the value is added. Other instruction sets may specify adding the value to the address of the control transfer instruction itself. For relative control transfer instructions which directly specify the value to be added, an instruction field is included for storing the value and the value is referred to as a “displacement”.
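The target address computation for an x86 relative control transfer instruction can be illustrated with a minimal sketch. The function names `sign_extend` and `relative_target` and the 32-bit address width are assumptions chosen for illustration; they do not appear in the description above.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def relative_target(instr_addr, instr_len, disp, disp_bits, addr_bits=32):
    """Target of an x86-style relative control transfer instruction.

    The displacement is added to the address of the byte immediately
    following the instruction (the sequential address). The 32-bit
    address width is an assumption made for this sketch.
    """
    sequential_addr = instr_addr + instr_len
    mask = (1 << addr_bits) - 1
    return (sequential_addr + sign_extend(disp, disp_bits)) & mask


# A two-byte jump at 0x1000 with an 8-bit displacement of 0xFE (-2)
# targets its own first byte.
print(hex(relative_target(0x1000, 2, 0xFE, 8)))  # 0x1000
```

For an instruction set that adds the displacement to the address of the control transfer instruction itself, the same sketch applies with `instr_len` set to zero.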
On the other hand, “absolute” control transfer instructions specify the target address itself (again, either directly or indirectly). Absolute control transfer instructions therefore do not require an address corresponding to the control transfer instruction to determine the target address. Control transfer instructions which specify the target address indirectly (e.g. via one or more register or memory operands) are referred to as “indirect” control transfer instructions.
Because of the variety of available control transfer instructions, the branch prediction mechanism may be quite complex. However, because control transfer instructions occur frequently in many program sequences, wide issue processors have a need for a highly effective (e.g. both accurate and rapid) branch prediction mechanism. If the branch prediction mechanism is not highly accurate, the wide issue processor may issue a large number of instructions per clock cycle but may ultimately cancel many of the issued instructions due to branch mispredictions. On the other hand, the number of clock cycles used by the branch prediction mechanism to generate a target address needs to be minimized to allow for the instructions at the target address to be fetched.
The term “branch instruction” is used herein to be synonymous with “control transfer instruction”.
The problems outlined above are in large part solved by a processor in accordance with the present invention. The processor is configured to predecode instruction bytes prior to their storage within an instruction cache. During the predecoding, relative branch instructions are detected. The displacement included within the relative branch instruction is added to the address corresponding to the relative branch instruction, thereby generating the target address. The processor replaces the displacement field of the relative branch instruction with an encoding of the target address, and stores the modified relative branch instruction in the instruction cache. Advantageously, the branch prediction mechanism employed by the processor may more rapidly generate the target address corresponding to relative branch instructions. The branch prediction mechanism may simply select the target address from the displacement field of the relative branch instruction instead of performing an addition to generate the target address. The rapidly generated target address may be provided to the instruction cache for fetching instructions more quickly than might otherwise be achieved. The amount of time elapsing between fetching a branch instruction and generating the corresponding target address may advantageously be reduced. Accordingly, the branch prediction mechanism may operate more efficiently, and hence processor performance may be increased through more rapid fetching of instructions stored at the target address. Superscalar processors may thereby support wider issue rates by fetching a larger number of instructions in a given period of time.
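The displacement replacement described above can be sketched in software as follows. This is an illustrative model only, not the hardware implementation: it handles a single instruction form (the x86 near jump with opcode 0xE9 and a 32-bit displacement), and the names `predecode_jmp_rel32` and `predicted_target` as well as the 32-bit address width are assumptions.

```python
import struct

JMP_REL32 = 0xE9      # x86 near jump: opcode byte plus 32-bit displacement
JMP_REL32_LEN = 5     # total instruction length in bytes


def predecode_jmp_rel32(code, base_addr, pos):
    """Return a copy of `code` in which the displacement of the jump at
    offset `pos` is replaced by the absolute target address.

    `base_addr` is the address of code[0]. The addition is performed once,
    before the bytes are stored in the (modeled) instruction cache.
    """
    if code[pos] != JMP_REL32:
        raise ValueError("not a JMP rel32 at this position")
    (disp,) = struct.unpack_from("<i", code, pos + 1)    # signed displacement
    sequential_addr = base_addr + pos + JMP_REL32_LEN
    target = (sequential_addr + disp) & 0xFFFFFFFF       # assumed 32-bit addresses
    cached = bytearray(code)
    struct.pack_into("<I", cached, pos + 1, target)      # overwrite displacement field
    return bytes(cached)


def predicted_target(cached, pos):
    """Branch prediction side: select the target from the displacement field.
    No addition is needed at fetch time."""
    (target,) = struct.unpack_from("<I", cached, pos + 1)
    return target


raw = bytes([0x90, JMP_REL32]) + struct.pack("<i", -16) + b"\x90"
cached = predecode_jmp_rel32(raw, base_addr=0x1000, pos=1)
print(hex(predicted_target(cached, 1)))  # 0xff6
```

In this model the cost of the addition is paid when the bytes are predecoded, while every later fetch of the cached branch only reads a field.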
In one embodiment, relative branch instructions having eight bit and 32-bit displacement fields are included in the instruction set executed by the processor. Additionally, the processor employs predecode information (stored in the instruction cache with the corresponding instruction bytes) including a start bit and a control transfer bit corresponding to each instruction byte. The combination of the start bit indicating that the byte is the start of an instruction and the corresponding control transfer bit identifies the instruction as either a branch instruction or a non-branch instruction. For relative branch instructions including an eight bit displacement, the control transfer bit corresponding to the displacement field is used in conjunction with the displacement field to store the encoded target address. The encoded target address includes a cache line offset portion and a relative cache line portion identifying the target cache line as a function of the cache line storing the relative branch instruction. Thirty-two bit displacement fields store the entirety of the target address, and hence the encoded target address comprises the target address. Other embodiments than the one described above are contemplated.
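One possible layout for the encoded target address of an eight bit displacement branch is sketched below. The description above states only that the eight displacement bits and one control transfer bit together hold a cache line offset portion and a relative cache line portion. The 64-byte cache line, the 6-bit/3-bit split, the two's complement coding of the relative line, and the choice of the line holding the displacement byte as the reference line are all assumptions made for illustration.

```python
LINE_BYTES = 64      # assumed cache line size
OFFSET_BITS = 6      # log2(LINE_BYTES)
REL_LINE_BITS = 3    # assumed width of the relative cache line portion


def encode_rel8_target(disp_addr, disp8):
    """Encode the target of a branch with an 8-bit displacement into 9 bits.

    `disp_addr` is the address of the displacement byte (the last byte of
    the instruction); `disp8` is the raw byte value, 0..255.
    """
    disp = disp8 - 256 if disp8 & 0x80 else disp8        # sign-extend
    target = disp_addr + 1 + disp                         # sequential address + disp
    rel_line = target // LINE_BYTES - disp_addr // LINE_BYTES   # always in -2..2
    offset = target % LINE_BYTES
    return ((rel_line & ((1 << REL_LINE_BITS) - 1)) << OFFSET_BITS) | offset


def decode_rel8_target(disp_addr, encoded):
    """Recover the full target from the 9-bit encoding and the branch's line."""
    rel_line = encoded >> OFFSET_BITS
    if rel_line & (1 << (REL_LINE_BITS - 1)):             # undo two's complement
        rel_line -= 1 << REL_LINE_BITS
    offset = encoded & (LINE_BYTES - 1)
    return (disp_addr // LINE_BYTES + rel_line) * LINE_BYTES + offset
```

Under these assumptions an 8-bit displacement can reach at most two cache lines before or after the line holding the displacement byte, so five relative line values suffice and the encoding fits in the nine available bits. Address wraparound is not modeled in this sketch.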
Broadly speaking, the present invention contemplates a processor comprising a predecode unit and an instruction cache. The predecode unit is configured to predecode a plurality of instruction bytes received by the processor. Upon predecoding a relative control transfer instruction comprising a displacement, the predecode unit adds an address to the displacement to generate a target address corresponding to the relative control transfer instruction. Additionally, the predecode unit is configured to replace the displacement within the relative control transfer instruction with at least a portion of the target address. Coupled to the predecode unit, the instruction cache is configured to store the plurality of instruction bytes including the relative control transfer instruction with the displacement replaced by the portion of the target address.
The present invention further contemplates a method for generating a target address for a relative control transfer instruction. A plurality of instruction bytes including the relative control transfer instruction are predecoded to detect the presence of the relative control transfer instruction. An address is added to a displacement included in the relative control transfer instruction, thereby generating the target address. The displacement is replaced within the relative control transfer instruction with an encoding indicative of the target address. The plurality of instruction bytes including the relative control transfer instruction is stored in an instruction cache, with the displacement replaced by the encoding.
Moreover, the present invention contemplates a predecode unit comprising a decoder and a target generator. The decoder is configured to decode a plurality of instruction bytes and to identify a relative control transfer instruction therein. The target generator is configured to add a displacement selected from the relative control transfer instruction to an address, thereby generating a target address corresponding to the relative control transfer instruction, and is further configured to generate an encoding of the target address with which the predecode unit replaces the displacement within the relative control transfer instruction.
The present invention still further contemplates a computer system comprising a processor, a memory, and an input/output (I/O) device. The processor is configured to predecode a plurality of instruction bytes received by the processor. Upon predecoding a relative control transfer instruction comprising a displacement, the processor is configured to add an address to the displacement to generate a target address corresponding to the relative control transfer instruction. Additionally, the processor is configured to replace the displacement within the relative control transfer instruction with at least a portion of the target address. Coupled to the processor, the memory is configured to store the plurality of instruction bytes and to provide the instruction bytes to the processor. The I/O device is configured to transfer data between the computer system and another computer system coupled to the I/O device.