Computing systems use a variety of techniques to improve performance and throughput. One technique is known in the art as out-of-order (OOO) execution. In OOO execution, instructions are scheduled for execution in parallel and as soon as corresponding source data dependencies can be resolved, thereby increasing execution speed of the overall system.
One method of resolving source data dependencies is shown in FIG. 1. In a dependency matrix 106, dependency graph information is stored corresponding to an instruction to be executed. Each row of the matrix may be used to store dependency graph information for a corresponding instruction. The number of columns in the dependency matrix corresponds to the number of concurrently tracked instructions. Instructions stored in a buffer 109 include one or more micro-operations (UOP entry 103) and corresponding valid indicators (valid bit 101). For any particular instruction with dependency graph information stored in a row of dependency matrix 106, an entry in a column may be set to indicate a prior sequential instruction that produces source data required for its execution.
In order to identify relevant prior sequential instructions, the required sources may be compared against the destinations of all concurrently active instructions. If a dependency is found to exist, then a corresponding column may be used to record the source dependency. When an instruction (or micro-operation) is dispatched from the buffer, the column in dependency matrix 106 corresponding to the dispatched instruction may be cleared. When all source dependencies are satisfied for an instruction, it may be dispatched for execution. Zero detect 105 detects when all bits in a dependency vector stored in a row of dependency matrix 106 have been cleared and produces signal 108 to identify the corresponding instruction (or micro-operation) as ready for scheduling. One limitation of this method is the number of comparators required for matching sources with destinations.
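The row and column behavior described above may be illustrated with a minimal software sketch. It is a toy model, not the circuit of FIG. 1: the class name DependencyMatrix, its methods, the dest field, and the register names are hypothetical, each row is held as an integer bitmask, and the model assumes at most one active producer per destination register.

```python
class DependencyMatrix:
    """Toy model of a dependency matrix: one bitmask row per buffer entry."""

    def __init__(self, size):
        self.size = size
        self.rows = [0] * size        # bit j of rows[i] set: entry i waits on entry j
        self.valid = [False] * size   # valid bit per buffer entry
        self.dest = [None] * size     # destination register per entry (hypothetical)

    def allocate(self, entry, sources, dest):
        """Compare sources against the destinations of all active entries."""
        row = 0
        for j in range(self.size):
            if self.valid[j] and self.dest[j] in sources:
                row |= 1 << j         # record the dependency in column j
        self.rows[entry] = row
        self.valid[entry] = True
        self.dest[entry] = dest

    def ready(self, entry):
        """Zero detect: a valid entry is ready when its row has no bits set."""
        return self.valid[entry] and self.rows[entry] == 0

    def dispatch(self, entry):
        """Clear the dispatched entry's column in every row and free the entry."""
        mask = ~(1 << entry)
        self.rows = [r & mask for r in self.rows]
        self.valid[entry] = False
        self.dest[entry] = None


m = DependencyMatrix(4)
m.allocate(0, sources=set(), dest="r1")
m.allocate(1, sources={"r1"}, dest="r2")
m.allocate(2, sources={"r1", "r2"}, dest="r3")
m.dispatch(0)   # clears column 0: entry 1 becomes ready, entry 2 still waits on entry 1
```

The loop in allocate stands in for the comparators: every source is checked against every active destination, which is the cost identified above as a limitation of this method.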
An alternative approach calls for tracking dependency information when sources and destinations are renamed. This alternative reduces the number of comparators required if instructions being dispatched are not dependent upon data produced by instructions concurrently being scheduled for execution. If such dependencies can exist, additional comparators may be required to determine when source data needs to be bypassed from one instruction to a subsequent instruction during execution, without waiting for the data to be written into its intended destination. Permitting such data to be bypassed potentially benefits execution speed, but identification of the relevant dependency may delay the dispatch of instructions.
For machines that employ wide superscalar execution of instructions, the number of column entries in a dependency matrix may be very large, and therefore corresponding circuitry to check for dependencies in a row of the dependency matrix needs to be increased accordingly. For the 16-entry buffer 109 depicted in FIG. 1, zero detect 105 may comprise a 16-input NOR gate for each row of dependency matrix 106. As the size of buffer 109 is increased, both dependency matrix 106 and zero detect 105 increase with the square of the size of buffer 109. For example, an 80-entry buffer 109 may require an 80×80 storage array for dependency matrix 106 and a zero detect 105 having eighty 80-input NOR gates. Each 80-input NOR gate, comprising five to six levels of logic, may become too slow to efficiently signal for the dispatch of new instructions (or micro-operations). Therefore, a new method of matching dependencies and dispatching instructions is called for.
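The quadratic growth noted above may be checked with a short calculation. The function name matrix_cost is hypothetical, and the sketch assumes one storage bit per matrix cell and one NOR gate per row, with one gate input per column.

```python
def matrix_cost(entries):
    """Return (storage bits, NOR gates, inputs per NOR gate) for an
    entries x entries dependency matrix, assuming one bit per cell."""
    return entries * entries, entries, entries


for n in (16, 80):
    bits, gates, fan_in = matrix_cost(n)
    print(f"{n}-entry buffer: {bits} bits, {gates} NOR gates of {fan_in} inputs each")
```

Under these assumptions, growing the buffer from 16 to 80 entries (a factor of 5) grows the storage array from 256 to 6400 bits (a factor of 25), and the total number of NOR gate inputs grows by the same factor.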