Traditionally, computers have used a control flow model of program execution. This model is an imperative model, that is, the user tells the computer which instructions to execute and when. Instructions may be conditionally executed or repeatedly executed with the use of branches at the machine level. A branch causes the computer to (conditionally) change the order in which instructions are to be executed. In the traditional model instructions are executed one at a time strictly in the specified order.
In recent years computer designers have sought to improve performance by executing more than one instruction at a time, possibly out of order. This exploits Instruction Level Parallelism (ILP) and is popularly known as the “superscalar” approach. ILP is possible because not every instruction takes its inputs from the instructions immediately preceding it.
Ignoring control flow for the moment, the only constraint necessary to ensure correct program execution is that instruction results be generated before other instructions use them. For example, suppose an instruction x=y+z is waiting to execute. As soon as both of its inputs y and z have been generated, the instruction may execute or “fire”: the inputs are sent to an adder, the adder performs the operation, and the result is saved in variable or register x. Instructions waiting for the new value of x, that is, those having x as an input, may then fire in turn, provided their other inputs are also available. Such a waiting instruction is said to be data dependent on the instruction that produces x. This type of execution model is often referred to as the data flow model.
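The firing rule described above can be illustrated with a minimal Python sketch. It is offered only as an illustration under stated assumptions: each variable is written by at most one instruction, an unlimited number of instructions may fire in the same step, and the names Instr and run_dataflow are hypothetical and do not correspond to any particular machine.

```python
import operator
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str        # variable or register that receives the result
    srcs: tuple      # names of the input operands
    op: callable     # operation applied to the input values


def run_dataflow(instrs, ready):
    """Fire each instruction as soon as all of its inputs have been generated.

    `ready` maps operand names to values that exist before execution starts.
    Assumes each destination is written by at most one instruction.
    Returns the final values and, per step, the indices that fired.
    """
    values = dict(ready)
    waiting = list(range(len(instrs)))
    fired = []
    while waiting:
        # Instructions whose inputs are all available may fire in this step.
        can_fire = [i for i in waiting
                    if all(s in values for s in instrs[i].srcs)]
        if not can_fire:
            raise ValueError("an instruction waits on an input never produced")
        # Compute all results from the values available at the start of the step.
        results = {i: instrs[i].op(*(values[s] for s in instrs[i].srcs))
                   for i in can_fire}
        for i in can_fire:
            values[instrs[i].dest] = results[i]
        fired.append(can_fire)
        waiting = [i for i in waiting if i not in can_fire]
    return values, fired


program = [
    Instr("y", ("a", "b"), operator.add),   # 0: y = a + b
    Instr("z", ("a", "b"), operator.mul),   # 1: z = a * b
    Instr("x", ("y", "z"), operator.add),   # 2: x = y + z
    Instr("w", ("x", "a"), operator.sub),   # 3: w = x - a
    Instr("v", ("b", "a"), operator.sub),   # 4: v = b - a
]
values, fired = run_dataflow(program, {"a": 2, "b": 3})
print(fired)        # [[0, 1, 4], [2], [3]]
print(values["x"])  # 11
```

In this example instruction 4 is last in program order yet fires in the first step, because its inputs are available from the start, whereas x=y+z must wait one step for y and z.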
Modern processors present the appearance of the traditional control flow model to the user, but employ a data flow model “under the hood”. Thus, the relative conceptual simplicity of the control flow model is maintained with the improved performance of the data flow model.
In the data flow model branches must still be used and are problematic. The typical approach today is to predict the outcome of conditional branches and then speculatively execute the corresponding code. Once the value of the branch condition is known, the branch is said to have been resolved. If the prediction was correct, nothing special needs to be done. However, if there was a misprediction, the computer must effectively reset its state to what it was just before the branch was first encountered. Even though branch prediction accuracies for real code are generally at or above 90%, mispredictions are still an impediment to obtaining higher performance.
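The predict, speculate, resolve and recover sequence described above may be sketched as follows. This is a simplified software illustration only: the machine state is assumed to be a small dictionary that can be copied as a checkpoint, and the names execute_branch, then_path and else_path are hypothetical.

```python
import copy


def execute_branch(state, predict_taken, condition, taken_path, not_taken_path):
    """Speculate past one conditional branch and roll back on a misprediction.

    state          : dict of variable/register values, updated in place
    predict_taken  : the predicted outcome of the branch
    condition      : function(state before the branch) -> actual outcome
    taken_path, not_taken_path : functions that update a state dict
    Returns True if the prediction was correct.
    """
    checkpoint = copy.deepcopy(state)   # state just before the branch
    # Speculatively execute the code on the predicted path.
    (taken_path if predict_taken else not_taken_path)(state)
    # The branch resolves, using the values that existed before the branch.
    actual_taken = condition(checkpoint)
    if actual_taken == predict_taken:
        return True                     # correct prediction: nothing to undo
    # Misprediction: reset the state, then execute the correct path.
    state.clear()
    state.update(checkpoint)
    (taken_path if actual_taken else not_taken_path)(state)
    return False


def then_path(s):
    s["tmp"] = 1            # speculative side effect
    s["r"] = s["a"] * 2


def else_path(s):
    s["r"] = s["a"] + 1


state = {"a": 5, "r": 0}
correct = execute_branch(state, True, lambda s: s["a"] > 10, then_path, else_path)
print(correct)  # False
print(state)    # {'a': 5, 'r': 6}
```

Here the branch is predicted taken but resolves not taken, so the speculative write to tmp is discarded and the work on the predicted path is wasted. That wasted work is the misprediction penalty referred to above.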
In prior work we demonstrated a variation of branch speculation called Disjoint Eager Execution (DEE) which may vastly improve computer performance. See, for example, the paper by A. K. Uht and V. Sindagi, entitled “Disjoint Eager Execution: An Optimal Form of Speculative Execution”, Proceedings of the 28th International Symposium on Microarchitecture (MICRO-28), pp. 313-325, IEEE and ACM, November and December 1995, incorporated herein by reference. DEE is a form of multipath execution; code is executed down both paths from a branch. The code execution is unbalanced; code on the predicted or Main-Line (ML) path is given priority for execution resources over code on the not-predicted path. When the branch resolves, results for either branch direction are available, and hence the performance penalty due to a misprediction is greatly reduced. ILP on the order of tens of instructions executing at once was shown to be possible, as compared with an ILP of 2-3 instructions in existing processors.
Our previously proposed machine realization of DEE with a data flow equivalent required many large and cumbersome data dependency and control dependency bit matrices. Data and control issues were treated separately. Approaches to reducing the size of the matrices were partially devised but never proven.
Other approaches, including current microprocessors, also need a great deal of hardware to realize data flow, even with simple branch prediction. In particular, data dependencies must still be computed and other complex operations performed for code to be executed correctly. Hence none of these other ILP approaches is scalable, in that their hardware cost typically grows as the square of the number of execution units in the machine.
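One source of this quadratic growth can be pictured with a short sketch that checks every instruction in a window against every earlier instruction. The sketch assumes, for illustration only, that the number of instructions examined together is proportional to the number of execution units. It records every earlier writer of a source operand without determining whether a later write supersedes it. The function name find_dependencies is hypothetical.

```python
def find_dependencies(window):
    """Compare each instruction with every earlier one in the window.

    window: list of (dest, srcs) pairs in program order.
    Returns the (earlier, later) pairs where the earlier instruction writes
    an input of the later one, and the number of comparisons performed.
    """
    deps = []
    comparisons = 0
    for later in range(len(window)):
        for earlier in range(later):
            comparisons += 1
            if window[earlier][0] in window[later][1]:
                deps.append((earlier, later))
    return deps, comparisons


window = [
    ("y", ("a", "b")),   # 0: y = a + b
    ("z", ("a", "b")),   # 1: z = a * b
    ("x", ("y", "z")),   # 2: x = y + z
    ("w", ("x", "a")),   # 3: w = x - a
]
deps, comparisons = find_dependencies(window)
print(deps)         # [(0, 2), (1, 2), (2, 3)]
print(comparisons)  # 6

# Doubling the window size roughly quadruples the comparison count.
for n in (4, 8, 16):
    dummy = [("r%d" % i, ()) for i in range(n)]
    print(n, find_dependencies(dummy)[1])   # 4 6, then 8 28, then 16 120
```

The comparison count is n(n-1)/2 for a window of n instructions, which grows as the square of n.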
Other researchers have demonstrated the value of data speculation. See, for example, the papers by M. H. Lipasti, C. B. Wilkerson and J. P. Shen, “Value Locality and Load Value Prediction”, in Proceedings of the Seventh Annual International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS-VII), pp. 138-147, ACM, October 1996, and Y. Sazeides, S. Vassiliadis and J. E. Smith, “The Performance Potential of Data Dependence Speculation & Collapsing”, in Proceedings of the 29th International Symposium on Microarchitecture (MICRO-29), pp. 238-247, IEEE and ACM, December 1996. Both papers are hereby incorporated by reference. In this scenario, input values for some instructions are predicted and the instructions are allowed to execute speculatively. As with control speculation, there is a penalty for data value misprediction. No one has yet, to our knowledge, combined data speculation with DEE.
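Data value prediction may be illustrated with a minimal sketch. A last-value scheme is assumed here purely for simplicity: the value an instruction produced on its previous execution is predicted for its next execution. The text above does not specify a predictor, and the names LastValuePredictor and count_predictions are hypothetical.

```python
class LastValuePredictor:
    """Predicts that an instruction produces the same value as last time."""

    def __init__(self):
        self.table = {}                 # instruction address -> last value

    def predict(self, pc):
        return self.table.get(pc)       # None means no prediction is made

    def update(self, pc, actual):
        self.table[pc] = actual


def count_predictions(trace):
    """trace: iterable of (instruction address, actual value) pairs.

    Returns (correct, wrong, unpredicted) counts.
    """
    predictor = LastValuePredictor()
    correct = wrong = unpredicted = 0
    for pc, actual in trace:
        guess = predictor.predict(pc)
        if guess is None:
            unpredicted += 1
        elif guess == actual:
            correct += 1                # speculative results may be kept
        else:
            wrong += 1                  # dependent work must be redone
        predictor.update(pc, actual)
    return correct, wrong, unpredicted


# The instruction at 0x40 always produces 7; the one at 0x44 counts upward.
trace = [(0x40, 7), (0x44, 1), (0x40, 7), (0x44, 2), (0x40, 7), (0x44, 3)]
print(count_predictions(trace))   # (2, 2, 2)
```

Each wrong prediction in this sketch corresponds to the data value misprediction penalty noted above, since any instruction that executed speculatively with the predicted input must execute again with the correct one.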