1. Technical Field
The present invention generally relates to the formation of groups of processor instructions. More specifically, the present invention relates to the formation of processor instruction groups that can each include multiple branch processor instructions.
2. Description of the Related Art
Throughput of a superscalar processor is affected by the number of processor instructions (e.g., a group of processor instructions) that is accepted in a clock cycle. FIG. 1 illustrates a prior art instruction decode unit that receives multiple processor instructions and forms multiple groups of processor instructions. As shown, prior art instruction decode unit 100 includes an instruction buffer 110, a group formation unit 130, slots 140S0-140S4 (e.g., latches or registers), and decoders 140D0-140D4. Instruction buffer 110 includes buffer entries 120B0-120B31, each of which can store a processor instruction, and instruction buffer 110 stores sequential instructions from an instruction cache in buffer entries 120B0-120B31.
Group formation unit 130 forms groups of processor instructions stored in instruction buffer 110 and routes the groups of processor instructions to slots 140S0-140S4. A group of processor instructions is a set of processor instructions that are decoded and dispatched to one or more issue queues, where the processor instructions of the set are executed independently, possibly out of order, and completed together. The group of processor instructions is completed together such that a data flow of a sequence of processor instructions that includes the group of processor instructions is unchanged.
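The notion of a group whose instructions execute independently but complete together can be illustrated with a minimal Python sketch. The class `InstructionGroup` and its methods `finish` and `can_complete` are hypothetical names introduced here for illustration only; they are not part of instruction decode unit 100 and do not describe any actual hardware implementation.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class InstructionGroup:
    """Hypothetical model of a group of processor instructions."""
    instructions: List[str]
    finished: Set[int] = field(default_factory=set)

    def finish(self, index: int) -> None:
        # Instructions of the group may finish execution in any order.
        if not 0 <= index < len(self.instructions):
            raise IndexError("no such instruction in this group")
        self.finished.add(index)

    def can_complete(self) -> bool:
        # The group completes only when every instruction has finished.
        return len(self.finished) == len(self.instructions)


group = InstructionGroup(["ld", "add", "mul"])
group.finish(2)  # "mul" finishes first (out of order)
group.finish(0)  # then "ld"
early = group.can_complete()  # "add" is still outstanding
group.finish(1)
done = group.can_complete()
```

In this sketch, `early` is false and `done` is true: the order in which the individual instructions finish does not matter, but the group as a whole is not eligible for completion until all of its instructions have finished.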
In prior art instruction decode unit 100, group formation unit 130 routes non-branch processor instructions to non-branch instruction slots 140S0-140S3 and a branch processor instruction of the processor instruction group to branch slot 140S4. As such, only one branch processor instruction is possible per processor instruction group. Moreover, when decode unit 100 encounters a predicted taken branch instruction, a new cache line of the instruction cache is accessed and placed in a new instruction buffer row (e.g., a row including entries 120B4-120B7). However, decode unit 100 operating in this fashion introduces “holes” or “gaps” in instruction buffer 110 between the predicted taken branch and the target of the branch. These “holes” or “gaps” must be ignored when determining a next instruction to include in a group. Thus, both the limit of one branch processor instruction per group and the “holes” or “gaps” in instruction buffer 110 limit the throughput of a superscalar processor in the prior art.
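The prior art behavior described above can be approximated by the following Python sketch. It rests on several assumptions that are not stated in the description: a group is closed as soon as a branch is placed in the single branch slot or when the four non-branch slots are full, a “hole” is modeled as `None`, and any mnemonic beginning with `br` is treated as a branch. The names `form_groups`, `is_branch`, and `NON_BRANCH_SLOTS` are hypothetical.

```python
from typing import List, Optional

NON_BRANCH_SLOTS = 4  # models non-branch slots 140S0-140S3


def is_branch(instruction: str) -> bool:
    # Assumption for this sketch: branch mnemonics start with "br".
    return instruction.startswith("br")


def form_groups(buffer: List[Optional[str]]) -> List[List[str]]:
    """Form groups with at most one branch per group, skipping holes."""
    groups: List[List[str]] = []
    current: List[str] = []
    for entry in buffer:
        if entry is None:
            continue  # a hole must be ignored when picking the next instruction
        if is_branch(entry):
            # The branch takes the single branch slot and closes the group.
            current.append(entry)
            groups.append(current)
            current = []
            continue
        if len(current) == NON_BRANCH_SLOTS:
            # All non-branch slots are full, so start a new group.
            groups.append(current)
            current = []
        current.append(entry)
    if current:
        groups.append(current)
    return groups


# A predicted taken branch ("br1") leaves two holes before its target ("ld").
buffer = ["add", "br1", None, None, "ld", "br2",
          "sub", "mul", "xor", "or", "and"]
groups = form_groups(buffer)
```

With this input, the two closely spaced branches each close a group after only two instructions, so three of the five slots go unused in each of those groups. This illustrates, under the stated assumptions, how a limit of one branch per group can reduce the number of instructions accepted per cycle.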