Instruction fusion is a process that combines two instructions into a single instruction, which results in a single operation (or micro-operation, “uop”) sequence within a processor. Instructions stored in a processor instruction queue (IQ) may be “fused” either after being read out of the IQ and before being sent to the instruction decoders, or after being decoded by the instruction decoders. Typically, instruction fusion occurring before the instructions are decoded is referred to as “macro-fusion”, whereas instruction fusion occurring after the instructions are decoded (into uops, for example) is referred to as “micro-fusion”. An example of macro-fusion is the combining of a compare (“CMP”) instruction or a test (“TEST”) instruction (collectively, “CMP/TEST”) with a conditional jump (“JCC”) instruction. CMP/TEST and JCC instruction pairs may occur regularly in programs, for example at the end of loops, where a comparison is made and, based on the outcome of the comparison, a branch is taken or not taken. Since macro-fusion may effectively increase instruction throughput, it may be desirable to find as many opportunities to fuse instructions as possible.
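The pairing rule described above can be illustrated with a minimal sketch. It is a simplified model rather than a description of any particular processor: it assumes that each instruction is represented only by its mnemonic, that every unfused instruction yields exactly one uop, and that the literal string "JCC" stands for any conditional jump. The names `FUSIBLE_FIRST` and `macro_fuse` are hypothetical and chosen only for illustration.

```python
# Hypothetical set of instructions that may start a fused pair.
FUSIBLE_FIRST = {"CMP", "TEST"}


def macro_fuse(instructions):
    """Return the operation sequence after fusing each CMP/TEST + JCC pair.

    Simplifying assumption: one uop per unfused instruction.
    """
    uops = []
    i = 0
    while i < len(instructions):
        cur = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if cur in FUSIBLE_FIRST and nxt == "JCC":
            # Two adjacent instructions become a single operation.
            uops.append(cur + "+JCC")
            i += 2
        else:
            uops.append(cur)
            i += 1
    return uops


# A loop body that ends with a compare followed by a conditional jump.
loop_body = ["ADD", "MOV", "CMP", "JCC"]
fused = macro_fuse(loop_body)
```

In this model the four-instruction loop body produces three operations, because the trailing CMP and JCC are emitted as one. A CMP that is not immediately followed by a JCC is left unfused.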
For instruction fusion opportunities to be found in some prior art processor microarchitectures, both the CMP/TEST and JCC instructions may need to reside in the IQ concurrently so that they can be fused when the instructions are read from the IQ. However, if there is a fusible CMP/TEST instruction in the IQ and no further instructions have been written to the IQ (i.e., the CMP/TEST instruction is the last instruction in the IQ), the CMP/TEST instruction may be read from the IQ and sent to the decoder without being fused, even if the next instruction in program order is a JCC instruction. A missed fusion opportunity may occur, for example, if the CMP/TEST and the JCC fall on opposite sides of a storage boundary (e.g., a 16-byte boundary), causing the CMP/TEST to be written to the IQ in one cycle and the JCC to be written in the following cycle. In this case, if there are no stalling conditions, the JCC will be written to the IQ at the same time as, or after, the CMP/TEST is read from the IQ, so the fusion opportunity will be missed, resulting in multiple unnecessary reads of the IQ, reduced instruction throughput, and excessive power consumption.
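The timing problem described above can be reproduced in a toy cycle-level model. The sketch below is illustrative only and rests on several assumptions that are not taken from any real microarchitecture: one group of instructions is written to the IQ per cycle, at most one instruction (or one fused pair) is read per cycle, and an entry written in a given cycle becomes visible to the reader only in the next cycle. The names `simulate_iq` and `fetch_chunks` are hypothetical.

```python
from collections import deque

# Hypothetical set of instructions that may start a fused pair.
FUSIBLE_FIRST = {"CMP", "TEST"}


def simulate_iq(fetch_chunks):
    """Simulate IQ writes and reads, one of each per cycle.

    fetch_chunks[c] lists the instructions written to the IQ in cycle c.
    Returns one tuple per IQ read, holding the instruction(s) sent to
    the decoder by that read.
    """
    iq = deque()
    reads = []
    cycle = 0
    while cycle < len(fetch_chunks) or iq:
        # Read first: the reader sees only entries written in earlier cycles.
        if iq:
            head = iq.popleft()
            if head in FUSIBLE_FIRST and iq and iq[0] == "JCC":
                reads.append((head, iq.popleft()))  # fused pair, one read
            else:
                reads.append((head,))  # unfused, one read per instruction
        # Then write this cycle's group of instructions to the IQ.
        if cycle < len(fetch_chunks):
            iq.extend(fetch_chunks[cycle])
        cycle += 1
    return reads


# CMP and JCC written in the same cycle: one fused read.
same_cycle = simulate_iq([["CMP", "JCC"]])

# CMP and JCC separated by a storage boundary: written in different cycles.
split = simulate_iq([["CMP"], ["JCC"]])

# An earlier instruction delays the CMP long enough for the JCC to arrive.
delayed = simulate_iq([["ADD", "CMP"], ["JCC"]])
```

In this model, `same_cycle` needs a single IQ read, whereas `split` needs two because the CMP is read while it is still the last entry in the IQ. The `delayed` case shows that the pair can still fuse when the CMP is held in the IQ until the JCC has been written.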