1. Field of the Invention
This invention relates to processors and to methods for executing store instructions and handling instruction antidependencies in a superscalar processor.
2. Description of Related Art
Superscalar processors execute instructions from a linear program thread but include multiple execution units capable of executing instructions in parallel. Parallel execution improves processor performance by increasing the rate at which instructions are completed, but it completes some instructions out of the program order (i.e., before or simultaneously with instructions earlier in the program order). A superscalar processor must avoid out-of-order execution where completing a later instruction before an earlier instruction would fail to implement the program logic. For example, a later instruction that depends on the result of an earlier instruction must be completed after that result is available. A restriction on the order of execution or completion of two instructions is often referred to as an instruction dependency or antidependency.
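As a rough illustration of the ordering restrictions described above, the following Python sketch classifies the relationship between an earlier and a later instruction from the registers each one reads and writes. The `Instr` type, the register names, and the `classify` function are hypothetical names introduced only for this example. The sketch takes a dependency to mean that the later instruction reads a register the earlier one writes, and an antidependency to mean that the later instruction writes a register the earlier one reads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """Hypothetical instruction model: the registers it writes and reads."""
    dest: frozenset  # registers written by the instruction
    srcs: frozenset  # registers read by the instruction


def classify(earlier, later):
    """Return the ordering restrictions between two instructions."""
    kinds = set()
    # Dependency: the later instruction needs a result of the earlier one.
    if earlier.dest & later.srcs:
        kinds.add("dependency")
    # Antidependency: the later instruction overwrites a register that the
    # earlier one still has to read.
    if earlier.srcs & later.dest:
        kinds.add("antidependency")
    return kinds


# Assumed example instructions (register names are illustrative only).
add = Instr(dest=frozenset({"r1"}), srcs=frozenset({"r2", "r3"}))
store = Instr(dest=frozenset(), srcs=frozenset({"r1"}))
move = Instr(dest=frozenset({"r2"}), srcs=frozenset({"r4"}))

print(classify(add, store))  # the store reads r1, which the add writes
print(classify(add, move))   # the move writes r2, which the add reads
```

An empty result means that, under this simplified model, the two instructions may be executed or completed in either order.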
Superscalar architectures that properly handle instruction dependencies and antidependencies have been developed. One such architecture uses register renaming, which allows out-of-order execution but delays committing results to a register file until earlier dependent or antidependent instructions are completed. Architectures with register renaming tend to be complex and therefore expensive to implement.
Another superscalar architecture uses an interlock that stalls decoding or issuing of later instructions having dependencies or antidependencies with earlier pending instructions. However, delaying instruction decoding or issuing can degrade processor performance. For example, a store instruction that requires data from a register in a register file cannot read that data until one or more earlier instructions write the data to the register. In a typical superscalar architecture, a read stage that reads from a register file is early in an execution pipeline, and a write stage that writes to the register file is the last stage in the execution pipeline. Accordingly, a pipeline interlock can create a bubble of processor inactivity between decoding of an instruction that writes to a register and a following store instruction that reads from the register. A superscalar architecture is sought that permits a simple pipeline interlock but reduces the bubbles of processor inactivity that degrade processor performance.
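The size of such a bubble can be estimated with a short Python sketch. The function name `bubble_cycles`, the zero-based stage numbering, and the five-stage pipeline in the usage lines are assumptions made for illustration and are not taken from any particular processor. The sketch further assumes that each stage takes one cycle and that a register can be read no earlier than the cycle after it is written.

```python
def bubble_cycles(read_stage, write_stage, issue_gap=1):
    """Estimate stall cycles imposed by a simple pipeline interlock.

    read_stage  -- zero-based index of the stage that reads the register file
    write_stage -- zero-based index of the stage that writes the register file
    issue_gap   -- cycles between issuing the writing instruction and the
                   dependent instruction when no stall is applied
    """
    # The writing instruction, issued in cycle 0, writes in cycle write_stage.
    write_cycle = write_stage
    # Without a stall, the dependent instruction would read in this cycle.
    unstalled_read_cycle = issue_gap + read_stage
    # Assumption: the read must occur at least one cycle after the write.
    return max(0, write_cycle + 1 - unstalled_read_cycle)


# Hypothetical five-stage pipeline: fetch=0, decode=1, read=2, execute=3, write=4.
print(bubble_cycles(read_stage=2, write_stage=4))               # 2
print(bubble_cycles(read_stage=2, write_stage=4, issue_gap=3))  # 0
```

Under these assumptions, a store issued immediately after the instruction that produces its data is held for two cycles, while a store issued three cycles later is not held at all. The bubble grows as the write stage moves further from the read stage.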