The disclosure is generally directed to the processing of instructions by a processor and, in particular, to techniques for optimizing the execution of instructions. More particularly, the present disclosure is directed to techniques for facilitating cracking and fusion within a same instruction group.
Traditionally, processors employed in conventional computer systems (data processing systems) executed program instructions one at a time in sequential order. The process of executing a single instruction has usually included several sequential steps. A first step generally involved fetching the instruction from a storage device. A second step generally involved decoding the instruction and assembling any operands. A third step generally involved executing the instruction and storing the results. Some processors have been designed to perform each step in a single processor clock cycle. Other processors have been designed so that the number of processor clock cycles per step depends on the instruction.
Modern data processing systems commonly use an instruction cache memory (cache) to temporarily store blocks of instructions. As is known, caches are buffers that store information retrieved from main memory to facilitate accessing the information with lower latency. If a processor locates a desired instruction (or data) in a cache, a ‘cache hit’ occurs, and instruction execution speed is generally increased as cache tends to be faster than main memory. However, if a cache does not currently store a desired instruction (or data), a ‘cache miss’ occurs, and a block that includes the desired instruction (or data) must be brought into the cache (i.e., retrieved from main memory).
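The cache hit and cache miss behavior described above can be illustrated with a minimal sketch. The direct-mapped organization, the class name `DirectMappedCache`, the four-line capacity, and the sixteen-byte block size are all assumptions chosen for illustration; the disclosure does not specify any particular cache organization.

```python
class DirectMappedCache:
    """Toy direct-mapped cache that counts hits and misses (illustrative only)."""

    def __init__(self, num_lines=4, block_size=16):
        self.num_lines = num_lines      # number of cache lines (assumed value)
        self.block_size = block_size    # bytes per block (assumed value)
        self.tags = [None] * num_lines  # tag held by each line; None means empty
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a cache hit; on a miss, load the block and return False."""
        block = address // self.block_size  # memory block containing the address
        index = block % self.num_lines      # cache line the block maps to
        tag = block // self.num_lines       # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        # Miss: the whole block is brought in from main memory,
        # replacing whatever block previously occupied this line.
        self.tags[index] = tag
        self.misses += 1
        return False


cache = DirectMappedCache()
# Fetch eight sequential four-byte instructions starting at address 0.
results = [cache.access(addr) for addr in range(0, 32, 4)]
print(results, cache.hits, cache.misses)
```

In this sketch the first fetch from each sixteen-byte block misses and the remaining three fetches from that block hit, which is why fetching a block on a miss tends to benefit sequential instruction streams.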
Fetching instructions from cache (or main memory) is normally controlled by a program counter. Contents of a program counter typically indicate a starting memory address from which a next instruction or instructions is to be fetched. Depending on processor design, each instruction may have a fixed length or a variable length. For example, a processor may be designed such that all instructions have a fixed length of thirty-two bits (i.e., four bytes). Fixed length instruction formats tend to simplify the instruction decode process.
Modern data processing systems commonly use a technique known as pipelining to improve performance. Pipelining involves the overlapping of sequential steps of an execution process. For example, while a processor is performing an execution step for one instruction, the processor may simultaneously perform a decode step for a second instruction and a fetch of a third instruction. As such, pipelining can decrease execution time for an instruction sequence. Superpipelined processors attempt to further improve performance by overlapping the sub-steps of the three sequential steps discussed above.
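The benefit of overlapping the fetch, decode, and execute steps can be sketched as a simple cycle count. The sketch below assumes an idealized three-stage pipeline in which every stage takes exactly one clock cycle and no stalls occur; the function names are hypothetical and are not taken from the disclosure.

```python
STAGES = ["fetch", "decode", "execute"]


def sequential_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when each instruction completes every stage before the next begins."""
    return num_instructions * num_stages


def pipelined_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles needed when a new instruction enters the pipeline each cycle (no stalls)."""
    if num_instructions == 0:
        return 0
    # The first instruction needs num_stages cycles; each later one finishes a cycle after.
    return num_stages + (num_instructions - 1)


def pipeline_schedule(num_instructions):
    """Return one dict per cycle mapping each busy stage to the instruction index in it."""
    schedule = []
    for cycle in range(pipelined_cycles(num_instructions)):
        row = {}
        for stage_index, stage in enumerate(STAGES):
            instr = cycle - stage_index  # instruction i occupies stage s at cycle i + s
            if 0 <= instr < num_instructions:
                row[stage] = instr
        schedule.append(row)
    return schedule


for cycle, row in enumerate(pipeline_schedule(3)):
    print(cycle, row)
print(sequential_cycles(3), pipelined_cycles(3))
```

Under these assumptions, the third cycle of the schedule shows the first instruction executing while the second is decoded and the third is fetched, matching the example given above. Real pipelines rarely reach this ideal, since stalls and multi-cycle steps add cycles.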
Another technique for improving processor performance involves executing two or more instructions in parallel. Processors that execute two or more instructions in parallel are generally referred to as superscalar processors. The ability of a superscalar processor to execute two or more instructions simultaneously depends on the particular instructions being executed. For example, two instructions that both require use of a same processor resource (e.g., a same floating point unit (FPU)) cannot be executed simultaneously, as a resource conflict would occur. Such instructions cannot usually be grouped with each other for simultaneous execution, and must usually be executed alone or grouped with other instructions. Additionally, an instruction that depends on the result produced by execution of a previous instruction cannot usually be grouped with the previous instruction. Such an instruction is said to have a data dependency on the previous instruction. Similarly, an instruction may have a procedural dependency on a previous instruction that prevents the instructions from being grouped in a same group. For example, an instruction that follows a branch instruction cannot usually be grouped with the branch instruction, since the execution of the instruction depends on whether the branch is taken.
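The three grouping restrictions described above (resource conflicts, data dependencies, and procedural dependencies) can be illustrated with a minimal in-order grouping sketch. The `Instr` record, the unit names, the instruction mnemonics, the register names, and the maximum group width of four are all hypothetical and chosen only for illustration. The sketch checks only the restrictions named above and is not the grouping logic of any particular processor.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Instr:
    name: str
    unit: str                    # execution resource required, e.g. "FPU"
    dest: Optional[str] = None   # register written, if any
    srcs: Tuple[str, ...] = ()   # registers read
    is_branch: bool = False


def can_join(group, instr):
    """Return True if instr may be placed in the same group as the instructions in group."""
    for prev in group:
        if prev.is_branch:
            return False  # procedural dependency: instr follows a branch
        if prev.unit == instr.unit:
            return False  # resource conflict: both need the same unit
        if prev.dest is not None and prev.dest in instr.srcs:
            return False  # data dependency: instr reads a result of prev
    return True


def form_groups(instrs, max_width=4):
    """Greedily pack instructions, in program order, into groups that may issue together."""
    groups, current = [], []
    for instr in instrs:
        if current and (len(current) >= max_width or not can_join(current, instr)):
            groups.append(current)
            current = []
        current.append(instr)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("fadd", unit="FPU", dest="f1", srcs=("f2", "f3")),
    Instr("fmul", unit="FPU", dest="f4", srcs=("f5", "f6")),  # same unit as fadd
    Instr("add", unit="ALU", dest="r1", srcs=("r2", "r3")),
    Instr("load", unit="LSU", dest="r4", srcs=("r1",)),       # reads result of add
    Instr("beq", unit="BRU", srcs=("r8", "r0"), is_branch=True),
    Instr("sub", unit="ALU", dest="r5", srcs=("r6", "r7")),   # follows the branch
]
groups = [[i.name for i in g] for g in form_groups(program)]
print(groups)
```

In this sketch a branch may close a group but nothing may follow it within the same group, which is one simple way to model the procedural dependency described above.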