The present invention relates to the field of instruction execution in computers, and more particularly to re-encoding illegal op codes into a single illegal op code thereby freeing up the vacated illegal op codes to be used to accommodate the extra bits associated with other pre-decoded defined instructions.
Typically, instructions within an instruction set of a microprocessor may be encoded into specific, unique combinations of bits. These encoded instructions may be stored in memory and fetched into an instruction cache when needed by the executing program. As these instructions are read out of the instruction cache, the encoded bits are decoded into a larger number of bits ("control fields"), which may then be used to control the precise operation of the given instruction as it travels down the execution pipeline of the processor.
For example, the PowerPC™ processor architecture may encode all instructions into unique 32-bit values. Of these 32 bits, the first six bits may be considered to be the "primary op code" field. Certain instruction encodings may be expanded into various "secondary op code" encodings, which utilize other bits of the 32-bit instruction encoding. In the PowerPC™ processor architecture, there may be over 200 instruction encodings, which may be encoded into various combinations of the 64 possible primary op codes. Some of these instruction encodings may be expanded into many more secondary op codes.
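As a minimal sketch of the field extraction described above, the following assumes that "the first six bits" means the six most significant bits of the 32-bit word. The function name, the constants, and the sample value are hypothetical and chosen only for illustration.

```python
OPCODE_SHIFT = 26   # assumption: primary op code occupies the top six bits
OPCODE_MASK = 0x3F  # six bits, hence 64 possible primary op codes


def primary_opcode(instruction: int) -> int:
    """Return the six-bit primary op code of a 32-bit instruction word."""
    return (instruction >> OPCODE_SHIFT) & OPCODE_MASK


# Arbitrary sample word whose top six bits are 100000.
sample = 0x80000004
print(format(primary_opcode(sample), "06b"))  # prints 100000
```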
When designing high frequency microprocessors, one of the difficult logic paths may be the logic path from the instruction cache to the execution pipeline. This logic path may involve the decoding of the instruction op code from the instruction cache. In order to alleviate the timing problems associated with this difficult logic path, one technique that has been used may commonly be referred to as "instruction pre-decode." With this technique, the instruction op codes may be typically decoded (or partially decoded) as they are fetched from memory. The instructions may then be stored in the instruction cache with the op codes being decoded or partially decoded. This may be beneficial because there may be less function involved in the logic paths between the memory and the instruction cache than there is between the instruction cache and the execution pipeline. For example, the logic path between memory and the instruction cache may simply involve steering the encoded instruction to the instruction cache input buffer, whereas the logic path between the instruction cache and the execution pipeline may involve decoding the instruction, determining that an instruction is a branch instruction, calculating the target address of the branch instruction, and re-directing the instruction fetching mechanism to a different instruction address. Thus, there may be more cycle time available for the decoding function to be performed in the former path as opposed to the latter. Consequently, when the pre-decoded instructions are later read out of the instruction cache, they may be passed to the subsequent pipeline stages without having to first perform the decoding function.
Oftentimes, certain instruction types may have a severe timing constraint on the instruction decode. For example, recognition and decode of a branch instruction may be particularly important since a branch instruction may redirect the instruction execution from one address to another. By recognizing and pre-decoding branch instructions, and storing this pre-decode information in the instruction cache, the latency associated with the subsequent fetch and execution of such branches may be minimized. Accordingly, a pre-decoding mechanism may for example create an explicit bit in the decoded version of the instruction to directly indicate the predicted direction of the branch, i.e., whether the branch is predicted to be taken or not.
The problem with this technique of instruction pre-decode is that it may increase the number of bits required to represent each instruction in the instruction cache and thus increase the physical size of the cache required to hold any given number of instructions. This increased size may also lead to an increase in the power consumed by the instruction cache, as well as an increase in the latency associated with accessing the cache.
It would therefore be desirable to develop a technique of utilizing bits in an illegal op code in order to not increase the number of bits required to represent each instruction in the instruction cache and thus prevent the increase in the physical size of the cache required to hold any given number of instructions.
The problems outlined above may at least in part be solved in some embodiments by encoding illegal op codes in instructions into a single illegal op code. Extra bits associated with pre-decoded defined instructions may then be stored in the vacated illegal op codes. For example, as described in U.S. application Ser. No. 10/082,144 filed on Feb. 25, 2002, entitled "Efficiently Calculating a Branch Target Address," Attorney Docket No. RPS920010176US1, branch instructions may be pre-decoded to convert an n-bit "displacement" field into a combination of an n-bit "target" field and a "carry-out" field, requiring one extra bit in the instruction re-encoding. This extra bit of information may be encoded into the vacated op code space associated with the illegal instructions which have been re-encoded to use a single, different illegal op code, without requiring that the instruction cache contain an additional storage bit for the pre-decoded instruction.
In one embodiment of the present invention, a method for utilizing bits in an illegal op code in order to not increase the number of bits required to represent each pre-decoded instruction may comprise the step of re-encoding by a re-encoding logic unit a plurality of illegal op codes to use a single illegal op code, as described in greater detail below. An instruction may be fetched from a memory by an instruction cache coupled to the memory. Extra bits associated with pre-decoded defined instructions may then be encoded into the vacated illegal op codes as illustrated below.
A fetch unit coupled to the instruction cache may search for a copy of the address of the next instruction to be executed in the instruction cache. In the case of a cache miss, the instruction may be fetched from memory by the fetch unit.
A determination may then be made by the re-encoding logic unit coupled to the instruction cache as to whether or not the fetched instruction has an op code which is a member of a collection of illegal op codes. If the instruction op code is a member of this collection, the instruction may then be re-encoded to use a different, common illegal op code that is not a member of the collection. In one embodiment, there may be a collection of two illegal op codes which occupy the instruction encodings, e.g., binary values of 111000 and 111100. The re-encoding logic unit of such an embodiment may then re-encode all instances of these two instruction op codes into a different common illegal op code, e.g., binary value of 000001. Consequently, the two formerly illegal op codes become available for re-use by a pre-decoding logic unit in order to encode additional information associated with a pre-decoded instruction.
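The re-encoding step in this embodiment can be sketched as follows, using the example op code values given above. This is an illustrative sketch only: it assumes the op code sits in the top six bits of a 32-bit word and that the remaining 26 bits are carried through unchanged, neither of which is stated here. The name `reencode_illegal` is hypothetical.

```python
OPCODE_SHIFT = 26                           # assumption: op code in the top six bits
OPCODE_MASK = 0x3F
ILLEGAL_COLLECTION = {0b111000, 0b111100}   # example collection from the text
COMMON_ILLEGAL = 0b000001                   # example common illegal op code


def reencode_illegal(instruction: int) -> int:
    """Map any op code in the illegal collection onto the common illegal op code."""
    opcode = (instruction >> OPCODE_SHIFT) & OPCODE_MASK
    if opcode not in ILLEGAL_COLLECTION:
        return instruction  # defined instructions pass through untouched
    # Assumption: the low 26 bits are preserved as-is.
    low_bits = instruction & ((1 << OPCODE_SHIFT) - 1)
    return (COMMON_ILLEGAL << OPCODE_SHIFT) | low_bits


print(hex(reencode_illegal(0xE0000123)))  # op code 111000 becomes 000001
```

After this step no cached instruction carries op code 111000 or 111100, which is what leaves those two encodings free for reuse.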
Accordingly, a determination may also be made by the pre-decoding logic unit as to whether or not the fetched instruction has an op code which is to be pre-decoded and stored in the instruction cache in its pre-decoded form. The op code may be pre-decoded and stored in the instruction cache in its pre-decoded form in order to provide additional information to a decode/selecting logic unit coupled to the instruction cache. In one embodiment, the pre-decoding logic unit may detect a relative branch instruction, which comprises an op code, e.g., binary value of 100000, a sign-bit, and a 25-bit displacement field. The pre-decoding logic unit may pre-decode this relative branch instruction by replacing the 25-bit displacement field with a 25-bit partial sum field and a 1-bit carry-out field. The 25-bit partial sum field may be formed by adding the 25-bit displacement field to the low-order 25 bits of the address of the branch instruction itself. The 1-bit carry-out field may be the carry-out of this 25-bit addition. In order to avoid the need for an extra storage bit in the instruction cache for this pre-decoded carry-out field, the pre-decoding logic unit may convert the op code field for the relative branch instruction, e.g., convert the op code field from binary value of 100000 to binary value of 111C00, where "C" is the carry-out field, thereby effectively utilizing the vacated op code space of the two re-encoded illegal op codes in order to encode the additional pre-decoded instruction information.
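The pre-decoding of the relative branch can be illustrated with a short sketch. It assumes a field layout of six op code bits, then the sign bit, then the 25-bit displacement (most significant first), and it assumes the displacement and the address are expressed in the same units; the text fixes the field widths but not these details. The name `predecode_branch` is hypothetical.

```python
OPCODE_SHIFT = 26
OPCODE_MASK = 0x3F
SIGN_BIT = 1 << 25               # assumption: sign bit directly below the op code
DISP_BITS = 25
DISP_MASK = (1 << DISP_BITS) - 1
REL_BRANCH = 0b100000            # example relative-branch op code from the text
PREDECODED_BASE = 0b111000       # 111C00 with C = 0
CARRY_POS = 2                    # position of C within the six-bit op code


def predecode_branch(instruction: int, address: int) -> int:
    """Replace the displacement with a partial sum and fold the carry into the op code."""
    opcode = (instruction >> OPCODE_SHIFT) & OPCODE_MASK
    if opcode != REL_BRANCH:
        return instruction
    displacement = instruction & DISP_MASK
    total = displacement + (address & DISP_MASK)   # 25-bit addition
    partial_sum = total & DISP_MASK
    carry_out = total >> DISP_BITS                 # always 0 or 1
    new_opcode = PREDECODED_BASE | (carry_out << CARRY_POS)
    return (new_opcode << OPCODE_SHIFT) | (instruction & SIGN_BIT) | partial_sum


# Displacement 0x1FFFFFF plus address bits 0x1 overflows 25 bits, so C = 1.
print(hex(predecode_branch(0x81FFFFFF, 0x1)))  # prints 0xf0000000
```

The result still occupies 32 bits: the carry-out is held in the op code field rather than in an added storage bit.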
In this fashion, the combination of re-encoding a collection of illegal op codes into a single, common illegal op code, together with the use of this vacated op code space to contain additional information associated with pre-decoded instructions, provides the benefits of instruction pre-decoding outlined in U.S. application Ser. No. 10/082,144 filed on Feb. 25, 2002, entitled "Efficiently Calculating a Branch Target Address," without the costs associated with additional storage bits in the instruction cache to contain this additional pre-decoded information.
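To show why the combination is unambiguous on read-out, the sketch below classifies a six-bit op code as it might appear in the instruction cache, using the example values from the embodiment above. The function name and the returned labels are hypothetical; the actual decode/selecting logic is not specified here.

```python
def classify_cached_opcode(opcode: int):
    """Classify a six-bit op code read from the cache (illustrative only)."""
    # 111C00: ignoring bit C, the pattern is 111000, so this is a
    # pre-decoded relative branch and C is the stored carry-out.
    if opcode & 0b111011 == 0b111000:
        return ("predecoded_branch", (opcode >> 2) & 1)
    # Every formerly distinct illegal op code was folded into 000001.
    if opcode == 0b000001:
        return ("illegal", None)
    return ("other", None)


print(classify_cached_opcode(0b111100))  # prints ('predecoded_branch', 1)
```

Because the illegal op codes 111000 and 111100 were re-encoded before storage, either value read from the cache can only denote a pre-decoded branch.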
The foregoing has outlined rather broadly the features and technical advantages of one or more embodiments of the present invention in order that the detailed description of the invention that follows may be better understood. Additional features and advantages of the invention will be described hereinafter which form the subject of the claims of the invention.