This invention relates to an improved architecture for a central processing unit in a general purpose computer, and, specifically, it relates to a method and apparatus for extracting low-level concurrency from sequential instruction streams.
A timeless problem in computer science and engineering is how to increase processor performance while keeping costs within reasonable bounds. There are three fundamental techniques known in the art for improving processor performance. First, the algorithms may be re-formulated; this approach is limited because faster algorithms may not be apparent or achievable. Second, the basic signal propagation delay of the logic gates may be reduced, thereby reducing cycle time and consequent execution time. This approach is subject not only to physical limits (e.g., the speed of light), but also to developmental limits, in that a significant improvement in propagation delay can take years to realize. Third, the architecture and/or the implementation of a computer can be reorganized to more efficiently utilize the hardware, such as by exploiting the opportunities for concurrent execution of program instructions at one or more levels.
High-level concurrency is exploited by systems using two or more processors operating in parallel and executing relatively large subsections of the overall program. Low-level (or semantic) concurrency extraction exploits the parallelism between two or more individual instructions by simultaneously executing independent instructions, i.e., those instructions whose execution will not interfere with each other. Low-level concurrency extraction uses a single central processor, with multiple functional units or processing elements operating in parallel; it can also be applied to the individual processors in a multiprocessor architecture.
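As an illustration only, the simultaneous execution of independent instructions can be sketched in Python as a greedy grouping of a sequential instruction stream into cycles. The sketch rests on several assumptions. It handles straight-line code only (no branches). It assumes unlimited functional units and single-cycle execution. It conservatively orders any pair of instructions that share a variable written by either one. The names Instr, depends and schedule are hypothetical and are not taken from the invention.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    """Hypothetical three-address instruction: sink = f(sources)."""
    text: str
    sink: str
    sources: Tuple[str, ...]

def depends(earlier: Instr, later: Instr) -> bool:
    """True if `later` must be ordered after `earlier` (any data dependency)."""
    return (
        earlier.sink in later.sources      # later reads what earlier writes
        or later.sink in earlier.sources   # later overwrites what earlier reads
        or later.sink == earlier.sink      # both write the same variable
    )

def schedule(program: List[Instr]) -> List[List[str]]:
    """Place each instruction one cycle after the latest instruction it depends
    on. Assumes unlimited functional units and single-cycle execution."""
    cycle_of: List[int] = []
    for i, instr in enumerate(program):
        earliest = 0
        for j in range(i):
            if depends(program[j], instr):
                earliest = max(earliest, cycle_of[j] + 1)
        cycle_of.append(earliest)
    groups: List[List[str]] = [[] for _ in range(max(cycle_of, default=-1) + 1)]
    for instr, c in zip(program, cycle_of):
        groups[c].append(instr.text)
    return groups

program = [
    Instr("A = B + 1", "A", ("B",)),
    Instr("D = E * 3", "D", ("E",)),
    Instr("C = A * 2", "C", ("A",)),
    Instr("F = D + E", "F", ("D", "E")),
]
for n, group in enumerate(schedule(program)):
    print(f"cycle {n}: {group}")
# cycle 0: ['A = B + 1', 'D = E * 3']
# cycle 1: ['C = A * 2', 'F = D + E']
```

Four sequential instructions complete in two cycles here, because each cycle contains only instructions that do not interfere with one another.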
Extraction of low-level concurrency starts with dependency detection. Two instructions are dependent if their execution must be ordered, due to either semantic dependencies or resource dependencies. A semantic dependency exists between two instructions if their execution must be serialized to ensure correct operation of the code. This type of dependency arises due to ordering relationships occurring in the code itself.
There are two forms of semantic dependencies: data and procedural. Procedural dependencies arise from branches in the input code. Data dependencies arise when instructions share sources (inputs) and sinks (results) in certain combinations. Three types of data dependencies are possible, as illustrated in Table I. In the first type, a data dependency exists between instructions 1 and 2 because instruction 1 modifies A, a source of instruction 2. Therefore instruction 2 cannot execute in a given iteration until instruction 1 has executed in that iteration. In the second type, instruction 1 uses variable A as a source, and A is also the sink of instruction 2. If instruction 2 executes before instruction 1 in a given iteration, it may modify A, and instruction 1 may then use the wrong input value when it executes. In the third type, both instructions write variable A (a common sink). If instruction 1 executes last, its result, rather than the intended result of instruction 2, is left in variable A and used by subsequent instructions.
TABLE I
______________________________________
                 Type 1      Type 2      Type 3
______________________________________
Instruction 1:   A = B + 1   C = A * 2   A = B + 1
Instruction 2:   C = A * 2   A = B + 1   A = C * 2
______________________________________
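The three types in Table I can be detected mechanically by comparing each instruction's sink with the other instruction's sources and sink. The following Python sketch is illustrative only. It represents each instruction as a hypothetical (sink, sources) pair and reproduces the three columns of Table I. In common terminology these types are also known as read-after-write (true), write-after-read (anti-) and write-after-write (output) dependencies.

```python
from typing import List, NamedTuple, Tuple

class Instr(NamedTuple):
    """Hypothetical instruction model: one sink and its source variables."""
    sink: str
    sources: Tuple[str, ...]

def dependency_types(first: Instr, second: Instr) -> List[int]:
    """Return the Table I types (1, 2, 3) relating `first` to a later `second`."""
    found = []
    if first.sink in second.sources:   # Type 1: second reads first's result
        found.append(1)
    if second.sink in first.sources:   # Type 2: second overwrites first's source
        found.append(2)
    if first.sink == second.sink:      # Type 3: common sink
        found.append(3)
    return found

# The three columns of Table I.
table_i = {
    "Type 1": (Instr("A", ("B",)), Instr("C", ("A",))),   # A = B + 1; C = A * 2
    "Type 2": (Instr("C", ("A",)), Instr("A", ("B",))),   # C = A * 2; A = B + 1
    "Type 3": (Instr("A", ("B",)), Instr("A", ("C",))),   # A = B + 1; A = C * 2
}
for name, (i1, i2) in table_i.items():
    print(name, dependency_types(i1, i2))
# Type 1 [1]
# Type 2 [2]
# Type 3 [3]
```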
In the prior art, all three types of data dependencies have generally been enforced. Although the effects of the first type of data dependency can never be avoided, the effects of the second and third types can be reduced if multiple copies of a variable exist. However, prior art efforts to reduce or eliminate the effects of type 2 and type 3 data dependencies suffer from undesirable implementation features. Their algorithms for instruction execution are essentially sequential and require many steps per cycle, thereby negating any performance gain from concurrency extraction. The prior techniques also allow only one iteration of an instruction to execute per cycle and are potentially very costly.
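To show why multiple copies of a variable help, the following Python sketch applies a generic renaming scheme, similar in spirit to register renaming or single assignment. Each write receives a fresh copy of its sink (A becomes A.1, A.2, and so on), and each later read is directed to the newest copy. This is an assumed, simplified technique for illustration, not the method of the invention. The names rename and dependency_types are hypothetical. After renaming, type 2 and type 3 conflicts disappear, while type 1 dependencies remain, as the text above states.

```python
from typing import Dict, List, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # hypothetical model: (sink, sources)

def dependency_types(first: Instr, second: Instr) -> List[int]:
    """Table I types relating `first` to a later instruction `second`."""
    (sink1, src1), (sink2, src2) = first, second
    types = []
    if sink1 in src2:      # Type 1: second reads first's result
        types.append(1)
    if sink2 in src1:      # Type 2: second overwrites first's source
        types.append(2)
    if sink1 == sink2:     # Type 3: common sink
        types.append(3)
    return types

def rename(program: List[Instr]) -> List[Instr]:
    """Give every write its own copy of the variable (A -> A.1, A.2, ...)."""
    version: Dict[str, int] = {}
    current: Dict[str, str] = {}   # variable -> name of its newest copy
    out: List[Instr] = []
    for sink, sources in program:
        # Reads use the newest copy written so far, or the original variable.
        new_sources = tuple(current.get(s, s) for s in sources)
        version[sink] = version.get(sink, 0) + 1
        new_sink = f"{sink}.{version[sink]}"
        current[sink] = new_sink
        out.append((new_sink, new_sources))
    return out

for label, prog in {
    "Type 1": [("A", ("B",)), ("C", ("A",))],
    "Type 2": [("C", ("A",)), ("A", ("B",))],
    "Type 3": [("A", ("B",)), ("A", ("C",))],
}.items():
    renamed = rename(prog)
    print(label, renamed, dependency_types(*renamed))
# Type 1 [('A.1', ('B',)), ('C.1', ('A.1',))] [1]
# Type 2 [('C.1', ('A',)), ('A.1', ('B',))] []
# Type 3 [('A.1', ('B',)), ('A.2', ('C',))] []
```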
Further, in the prior art, branch prediction techniques have been used to reduce the effects of procedural dependencies by conditionally executing code beyond branches before the conditions of the branch have been evaluated. Since such execution is conditional, some code-backtracking or state restoration has heretofore been necessary if the branch prediction turns out to be wrong. This complicates the hardware of machines using such techniques, and can reduce performance in branch-intensive situations. Also, such techniques have usually been limited to conditionally executing one branch at a time.
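The state restoration required by this prior-art approach can be sketched in Python for a single outstanding branch. The code executes the predicted path before the branch condition is known, keeps a checkpoint of the pre-branch state, and restores that checkpoint and executes the correct path if the prediction was wrong. The sketch makes simplifying assumptions. The condition is modeled as a function of the pre-branch state and is evaluated after the speculative work rather than concurrently with it. The names assign and run_speculatively are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, int]
Op = Callable[[State], None]

def assign(var: str, expr: Callable[[State], int]) -> Op:
    """Build a hypothetical operation that performs var = expr(state)."""
    def op(state: State) -> None:
        state[var] = expr(state)
    return op

def run_speculatively(state: State, condition: Callable[[State], bool],
                      predict_taken: bool, taken: List[Op],
                      not_taken: List[Op]) -> Tuple[State, int]:
    """Execute the predicted path before the branch resolves; on a
    misprediction, restore the saved state and execute the correct path.
    Returns the final state and the number of operations discarded."""
    checkpoint = dict(state)              # shallow copy suffices: values are ints
    predicted = taken if predict_taken else not_taken
    for op in predicted:                  # conditional execution beyond the branch
        op(state)
    actual = condition(checkpoint)        # condition uses pre-branch values
    if actual == predict_taken:
        return state, 0                   # prediction correct: keep the results
    state.clear()                         # misprediction: restore the checkpoint
    state.update(checkpoint)
    for op in (taken if actual else not_taken):
        op(state)
    return state, len(predicted)

# Branch: if X > 10 then Y = 1 else Y = 2, predicted (wrongly) as taken.
final, discarded = run_speculatively(
    {"X": 5, "Y": 0}, lambda s: s["X"] > 10, True,
    [assign("Y", lambda s: 1)], [assign("Y", lambda s: 2)])
print(final, discarded)  # {'X': 5, 'Y': 2} 1
```

The discarded count makes the cost visible: every mispredicted branch throws away the speculative work and repeats execution along the correct path. This is the overhead that grows in branch-intensive code.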