1. Field of the Invention
The present invention relates to computer system processor architectures that support out-of-order execution. More specifically, the present invention relates to an instruction dependency scoreboard unit including a smaller faster portion and a larger slower portion.
2. Related Art
Modern processors typically contain multiple functional units that perform computations concurrently to increase the execution speed of a program. In order to make effective use of these multiple functional units, some processors allow program instructions to be executed out-of-order. Out-of-order execution eliminates the need to wait for all preceding instructions to complete before executing a given instruction. This leads to better utilization of the multiple functional units, and hence increases computational performance.
One of the challenges in supporting out-of-order execution is to ensure that a given instruction executes only after all preceding instructions upon which the given instruction depends complete. For example, an instruction that adds two registers R1 and R2 must wait for preceding instructions to write values to registers R1 and R2 before adding the registers.
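By way of illustration only, the register dependency in the example above can be sketched as follows. The `Instr` tuple, the `find_dependencies` helper, and the source registers R5 and R6 are hypothetical names introduced for this sketch, which tracks only the most recent writer of each source register.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written by the instruction
    srcs: Tuple[str, ...]   # registers read by the instruction


def find_dependencies(program):
    """For each instruction, list the indices of the most recent earlier
    instructions that write one of its source registers."""
    last_writer = {}  # register name -> index of its latest writer
    deps = []
    for idx, ins in enumerate(program):
        deps.append(sorted({last_writer[r] for r in ins.srcs if r in last_writer}))
        last_writer[ins.dest] = idx
    return deps


program = [
    Instr("R1", ("R5",)),       # 0: writes R1
    Instr("R2", ("R6",)),       # 1: writes R2
    Instr("R3", ("R1", "R2")),  # 2: adds R1 and R2, so it waits on 0 and 1
]
deps = find_dependencies(program)
```

Here the add instruction at index 2 depends on the instructions at indices 0 and 1, and therefore must not execute until both have completed.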
Processors that support out-of-order execution often use an "instruction scoreboard" to keep track of information regarding dependencies between instructions. These processors use this dependency information to determine the order in which instructions issue. In general, a larger scoreboard can keep track of more dependencies, which typically increases the number of instructions that are ready to issue in a given cycle. This leads to better utilization of the multiple functional units and thereby improves computer system performance.
Unfortunately, as an instruction scoreboard increases in size, the access time into the structure implementing the scoreboard also increases. This can reduce system clock speed and can thereby offset the advantages of using a larger scoreboard.
Fortunately, dependencies for faster operations, such as integer and logical instructions, tend to exhibit a high degree of locality, which means that an instruction scoreboard only needs to keep track of a smaller number of recent preceding instructions in order to efficiently schedule these faster operations. Conversely, dependencies for slower operations, such as floating point operations, tend to exhibit less locality, which means that an instruction scoreboard must keep track of a larger number of preceding instructions in order to efficiently schedule these slower operations.
What is needed is an instruction scoreboard that supports high-speed access to dependencies within a smaller number of recent preceding instructions, and supports slower-speed access to dependencies within a larger number of less recent preceding instructions.
One embodiment of the present invention provides a system that selects instructions to be executed in a computer system that supports out-of-order execution of program instructions. The system receives dependency information for a first instruction. This dependency information identifies preceding instructions in the execution stream of a program that need to complete before the first instruction can be executed. The system divides this dependency information into a recent set and a less recent set. The recent set includes dependency information for a block of instructions immediately preceding the first instruction that need to complete before the first instruction can be executed. The less recent set includes dependency information for instructions not in the block of instructions immediately preceding the first instruction that need to complete before the first instruction can be executed.
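The division into a recent set and a less recent set can be sketched as follows. This is a minimal illustration, not the claimed implementation: the function name `split_dependencies`, the use of integer instruction indices, and the block size of four instructions are all assumptions made for the example.

```python
def split_dependencies(instr_index, producer_indices, window=4):
    """Split the producers of one instruction into a recent set (within
    `window` instructions immediately preceding it) and a less recent set."""
    recent, less_recent = set(), set()
    for producer in producer_indices:
        distance = instr_index - producer
        if distance < 1:
            raise ValueError("a dependency must refer to a preceding instruction")
        if distance <= window:
            recent.add(producer)       # falls inside the immediately preceding block
        else:
            less_recent.add(producer)  # older than the block
    return recent, less_recent


# Instruction 10 depends on instructions 9, 7 and 2 (hypothetical indices).
recent, less_recent = split_dependencies(10, {9, 7, 2}, window=4)
```

With a block of four instructions, the dependencies on instructions 9 and 7 fall in the recent set and the dependency on instruction 2 falls in the less recent set.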
The system stores the recent set of dependency information in a first store, and stores the less recent set of dependency information in a second store. The first store is smaller and faster than the second store so that an update to dependency information takes less time to propagate through the first store than the second store.
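The difference in propagation time between the two stores can be modeled, purely for illustration, as a fixed update latency per store. The `DelayedStore` class and the latencies of one and three cycles are assumptions chosen for this sketch; the text above does not specify cycle counts.

```python
class DelayedStore:
    """Dependency store whose updates become visible after a fixed latency."""

    def __init__(self, latency):
        self.latency = latency
        self.waiters = {}    # producer -> set of consumers waiting on it
        self.in_flight = []  # (cycle at which the update is visible, producer)

    def add(self, consumer, producer):
        self.waiters.setdefault(producer, set()).add(consumer)

    def complete(self, producer, now):
        """Record that `producer` completed at cycle `now`."""
        self.in_flight.append((now + self.latency, producer))

    def tick(self, now):
        """Apply every update whose latency has elapsed; return woken consumers."""
        woken, remaining = set(), []
        for visible_at, producer in self.in_flight:
            if visible_at <= now:
                woken |= self.waiters.pop(producer, set())
            else:
                remaining.append((visible_at, producer))
        self.in_flight = remaining
        return woken


first_store = DelayedStore(latency=1)   # smaller, faster (assumed 1 cycle)
second_store = DelayedStore(latency=3)  # larger, slower (assumed 3 cycles)
first_store.add(consumer=5, producer=4)    # recent dependency
second_store.add(consumer=20, producer=4)  # less recent dependency
first_store.complete(4, now=0)
second_store.complete(4, now=0)
woken_fast_at_1 = first_store.tick(1)
woken_slow_at_1 = second_store.tick(1)
woken_slow_at_3 = second_store.tick(3)
```

In this model, the nearby consumer is released one cycle after the producer completes, while the distant consumer is released only after the slower store has propagated the update.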
In one embodiment of the present invention, the system receives the dependency information for the first instruction from the first store and the second store, and determines from the dependency information if the first instruction is available to be executed by determining whether all preceding dependencies related to the first instruction have been satisfied.
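The availability check described above can be sketched as a test that no unsatisfied dependency remains in either store. Representing each store as a dictionary that maps an instruction index to its set of outstanding producers is an assumption of this example, as is the helper name `is_ready`.

```python
def is_ready(instr, first_store, second_store):
    """True when no unsatisfied dependency remains for `instr` in either store."""
    return not first_store.get(instr) and not second_store.get(instr)


# Hypothetical contents: instruction -> producers that have not yet completed.
first_store = {7: {6}, 8: set(), 9: set()}
second_store = {7: set(), 8: {1}, 9: set()}

available = [i for i in (7, 8, 9) if is_ready(i, first_store, second_store)]
```

Instruction 7 is blocked by a recent dependency and instruction 8 by a less recent one, so only instruction 9 is available to be executed.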
In one embodiment of the present invention, the system selects a second instruction from instructions that are available to be executed, and executes the second instruction. In a variation on this embodiment, after the second instruction has been executed, the system updates dependency information for all dependencies related to the second instruction to indicate that the second instruction has been executed. At a later point in time, the system removes dependency information for the second instruction from the first store and the second store.
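The sequence of selecting, executing, updating, and later removing can be sketched as follows. The `Scoreboard` class and its method names are hypothetical, and the oldest-first selection policy is an assumption, since the text above does not state how an instruction is chosen from among those available.

```python
class Scoreboard:
    def __init__(self):
        self.first_store = {}   # instruction -> outstanding recent producers
        self.second_store = {}  # instruction -> outstanding less recent producers
        self.executed = set()

    def add(self, instr, recent, less_recent):
        self.first_store[instr] = set(recent) - self.executed
        self.second_store[instr] = set(less_recent) - self.executed

    def select(self):
        """Pick the oldest unexecuted instruction with no outstanding dependencies."""
        ready = [i for i in self.first_store
                 if i not in self.executed
                 and not self.first_store[i] and not self.second_store[i]]
        return min(ready) if ready else None

    def mark_executed(self, instr):
        """Clear `instr` from the outstanding dependencies of every consumer."""
        self.executed.add(instr)
        for store in (self.first_store, self.second_store):
            for deps in store.values():
                deps.discard(instr)

    def retire(self, instr):
        """Later removal of an executed instruction's entries from both stores."""
        if instr in self.executed:
            self.first_store.pop(instr, None)
            self.second_store.pop(instr, None)


sb = Scoreboard()
sb.add(0, recent=[], less_recent=[])
sb.add(1, recent=[0], less_recent=[])
sb.add(6, recent=[], less_recent=[0, 1])

order = []
chosen = sb.select()
while chosen is not None:
    order.append(chosen)
    sb.mark_executed(chosen)
    chosen = sb.select()

for instr in order:
    sb.retire(instr)
```

Each executed instruction releases its consumers through the update step, and the entries themselves are removed only in the separate, later `retire` step.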
In one embodiment of the present invention, the system receives the dependency information from an instruction renaming unit that renames registers for instructions in order to facilitate out-of-order execution. In a variation on this embodiment, the instruction renaming unit receives the first instruction from an instruction fetch unit.
In one embodiment of the present invention, the system divides the dependency information using multiplexers to select the recent set of dependency information.
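One way to picture the multiplexer selection is as a windowed read of a dependency bit vector. The sketch below assumes, purely for illustration, that dependencies are held as one bit per instruction slot in a circular buffer; the function `mux_select_recent`, the buffer size of eight, and the window of three are not taken from the text above.

```python
def mux_select_recent(dep_bits, position, window):
    """Select the `window` bits for the slots immediately preceding `position`,
    wrapping around the circular buffer. Element k of the result corresponds
    to the instruction k + 1 slots before `position`."""
    n = len(dep_bits)
    return [dep_bits[(position - k) % n] for k in range(1, window + 1)]


# One bit per slot; a 1 means the instruction at `position` depends on that slot.
dep_bits = [0, 1, 0, 0, 1, 0, 0, 1]
recent_bits = mux_select_recent(dep_bits, position=1, window=3)
```

For the instruction in slot 1, the selection reads slots 0, 7, and 6, so the recent set records a dependency only on the instruction in slot 7.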