1. Technical Field
The present invention relates to a method and apparatus for data processing in general, and in particular to a method and apparatus for loading an instruction buffer. Still more particularly, the present invention relates to a method and apparatus for loading an instruction buffer of a superscalar processor capable of out-of-order instruction issue.
2. Description of the Prior Art
Most, if not all, superscalar processors are capable of performing out-of-order instruction issue. Although there are many implementation schemes for out-of-order instruction issue, the key element for all these schemes is an issue queue (or issue logic) that determines the actual order of execution based on the resolution of data dependencies and the availability of execution resources, instead of the order in which instructions appear within the program.
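The role of the issue queue can be illustrated with a minimal sketch. It is not taken from any particular processor: the function name `issue_order`, the tuple layout, and the unit names are hypothetical. The sketch assumes single-cycle latency and assumes that register renaming has already removed write-after-read and write-after-write hazards, so only true data dependencies and execution-unit availability are modeled.

```python
def issue_order(program, units):
    """Return instruction names in the order in which they issue.

    program: list of (name, dest, sources, unit) tuples in program order.
    units:   dict mapping a unit name to how many instructions it accepts per cycle.
    """
    waiting = list(program)
    order = []
    while waiting:
        free = dict(units)  # execution resources available in this cycle
        issued = []
        for idx, (name, dest, sources, unit) in enumerate(waiting):
            # A source is unresolved if an older instruction that was still
            # waiting at the start of this cycle writes it.
            older_dests = {d for _, d, _, _ in waiting[:idx]}
            if older_dests.isdisjoint(sources) and free.get(unit, 0) > 0:
                free[unit] -= 1
                issued.append(waiting[idx])
        if not issued:
            raise ValueError("no execution unit can accept the oldest instruction")
        for instruction in issued:
            waiting.remove(instruction)
            order.append(instruction[0])
    return order


# Hypothetical three-instruction program: "add" depends on "load" through r1,
# while "mul" is independent and may issue ahead of "add".
program = [
    ("load", "r1", ("r0",), "mem"),
    ("add", "r2", ("r1",), "alu"),
    ("mul", "r3", ("r4",), "alu"),
]
print(issue_order(program, {"mem": 1, "alu": 1}))
```

In this sketch the independent multiply issues before the dependent add even though it appears later in the program, which is the behavior the issue queue is meant to provide.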
Nevertheless, instructions are typically stored in program order in a cache line within an instruction cache (I-cache) of a processor. Furthermore, each access to the I-cache generally retrieves more than one instruction. For example, for a processor architecture that has a four-byte instruction length, each I-cache access may be 32 bytes wide, which equals a total of eight instructions per I-cache access. Even with the simplest I-cache design, these instructions must be multiplexed into an instruction buffer having eight or fewer slots before being sent to the issue queue.
Continuing the above example, eight instructions are initially read from the I-cache. The fetch address of the first instruction is then utilized to control an 8-to-1 multiplexor to gate the first four instructions into an instruction buffer with, for example, four slots. The fetch address is also utilized to select a target instruction, along with the next three instructions from the eight instructions, to gate into the instruction buffer. All four instructions are gated into the instruction buffer in execution order instead of program order. With this arrangement, when the fetch address is the result of a (predicted or actual) branch instruction, the first instruction to be gated into the instruction buffer may be any one of the eight instructions. Thus, if the target address of the branch instruction points to the last instruction, the next to last instruction, or even the third to last instruction of the I-cache access, then fewer than four instructions remain in the I-cache access and not all four slots within the instruction buffer will be filled, resulting in a loss of dispatch bandwidth. Consequently, it would be desirable to provide an improved method and apparatus for loading an instruction buffer without sacrificing dispatch bandwidth or cache efficiency.
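The loss of dispatch bandwidth described above can be made concrete with a short sketch. The sizes come from the example (four-byte instructions, a 32-byte I-cache access, a four-slot instruction buffer); the function names `load_buffer` and `empty_slots` are hypothetical, and the sketch assumes that each I-cache access is aligned to a 32-byte boundary.

```python
INSTR_BYTES = 4     # instruction length from the example
ACCESS_BYTES = 32   # width of one I-cache access from the example
BUFFER_SLOTS = 4    # instruction buffer size from the example

INSTRS_PER_ACCESS = ACCESS_BYTES // INSTR_BYTES  # eight instructions


def load_buffer(fetch_address, access):
    """Return the instructions gated into the buffer for one I-cache access."""
    if len(access) != INSTRS_PER_ACCESS:
        raise ValueError("access must hold exactly one I-cache access")
    # Position of the target instruction within the (assumed aligned) access.
    start = (fetch_address % ACCESS_BYTES) // INSTR_BYTES
    # Only instructions from the target to the end of the access are available.
    return access[start:start + BUFFER_SLOTS]


def empty_slots(fetch_address):
    """Return how many buffer slots stay empty for the given fetch address."""
    access = [f"i{n}" for n in range(INSTRS_PER_ACCESS)]
    return BUFFER_SLOTS - len(load_buffer(fetch_address, access))


# Branch targets at the start, the middle, and near the end of the access.
for address in (0x00, 0x10, 0x14, 0x18, 0x1C):
    print(hex(address), empty_slots(address))
```

Under these assumptions, a target in any of the first five positions of the access fills the buffer, while a target in one of the last three positions leaves one, two, or three slots empty.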