1. Technical Field
The present invention relates generally to methods and apparatuses for reducing program execution time by means of data-dependent optimization of program execution. More specifically, the present invention relates to designing program execution paths for several potential data statistics on the basis of source-coding principles, and to dynamically optimizing program execution time at run-time on the basis of the designed execution paths and the statistics of the incoming data stream to be processed.
2. Background Art
Data-dependent optimization of program execution refers to the re-ordering of program modules (each of which may be one or more instructions) on the basis of the data to be processed. Data-dependent optimization can considerably improve the average computational complexity of programs, especially where different possible data inputs differ significantly in which program modules are needed for processing and in the complexity of each such program module. A prime example of a domain in which this is the case is multimedia processing. In media compression or filtering, for instance, the number of times each program module is invoked to process an independently coded data block may differ significantly from the number of times it is invoked to process a predictively coded data block. The complexity of each program module may also differ significantly between the two cases.
FIG. 1 depicts an exemplary program, which contains multiple branch instructions invoked at various levels. The input data signal 100 is input to the first branch instruction B1 101. The result of the branch instruction is binary, and it is used to select one of two alternative subsequent program execution paths. If the outcome of branch instruction 101 is a 0, program execution continues with the execution of processing module 102, which requires c0 cycles to execute. The result of the processing module 102 is program output o0 103. If the outcome of branch instruction 101 is a 1, program execution continues with the execution of processing module 110, which requires c1 cycles to execute. The result of the processing module 110 is input to a second branch instruction B2 111. If the outcome of branch instruction 111 is a 0, program execution continues with the execution of processing module 112, requiring c2 cycles, whose result is program output o1 113. Alternatively, if the outcome of branch instruction 111 is a 1, processing module 120 (requiring c3 cycles) is executed, and the result is input to branch instruction B3 121. Branch instruction 121 selects between processing module 124 (with output o3 125) and processing module 122 (with output o2 123).
The execution path order of the program can be described as a tree, where each vertex of the tree represents a branch instruction. Thus the vertices of the tree in FIG. 1 are the branch instructions 101, 111 and 121. Each edge of the tree represents a program module, composed of code instructions that are executed when the edge lies on the execution path of the program. For each edge ei of the tree, let ci denote the computational complexity of the set of instructions represented by that edge. In FIG. 1 the edges of the tree represent the program modules 102, 110, 112, 120, 122 and 124. The total complexity associated with each possible output oi is the sum of the complexities of the edges lying on the path from the root of the tree to that output. Let Pi denote the probability of occurrence of each output, and let Ci denote the total complexity associated with each output. Thus, in FIG. 1, the total complexity associated with output o2 is C2=c1+c3+c4, where c4 denotes the complexity of processing module 122, and the probability of occurrence of output o2 is P2. To minimize the expected complexity of execution of the program (or, equivalently, to maximize its expected execution speed), it is necessary that the sum ΣPiCi, taken over all outputs, be minimized.
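As a minimal sketch of the quantity to be minimized, the following Python fragment encodes the tree of FIG. 1 as root-to-output edge lists and evaluates ΣPiCi. The cycle counts and probabilities are hypothetical values chosen only for illustration, and the symbols c4 and c5 are assumed here to denote the complexities of processing modules 122 and 124, respectively.

```python
# Hypothetical per-module cycle counts; the text gives only symbols, not values.
# Assumption: c4 and c5 are the complexities of modules 122 and 124.
cycles = {"c0": 10, "c1": 4, "c2": 6, "c3": 3, "c4": 8, "c5": 2}

# Edges lying on the path from the root (branch 101) to each output of FIG. 1.
paths = {
    "o0": ["c0"],              # module 102
    "o1": ["c1", "c2"],        # modules 110, 112
    "o2": ["c1", "c3", "c4"],  # modules 110, 120, 122
    "o3": ["c1", "c3", "c5"],  # modules 110, 120, 124
}

# Hypothetical output probabilities P_i (they sum to 1).
probs = {"o0": 0.5, "o1": 0.25, "o2": 0.125, "o3": 0.125}


def total_complexity(output):
    """C_i: sum of the edge complexities on the root-to-output path."""
    return sum(cycles[edge] for edge in paths[output])


def expected_complexity(probabilities):
    """Expected complexity: sum over all outputs of P_i * C_i."""
    return sum(p * total_complexity(o) for o, p in probabilities.items())


if __name__ == "__main__":
    print(total_complexity("o2"))      # C2 = c1 + c3 + c4
    print(expected_complexity(probs))
```

With these illustrative numbers, C2 evaluates to 15 cycles and the expected complexity to 10.5 cycles. Re-ordering the tree so that the more probable outputs lie on cheaper paths would lower this sum.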
FIGS. 2(a) and 2(b) illustrate two conventional methods for data-dependent program optimization. The first method, illustrated in FIG. 2(a), employs static optimization during compilation, wherein sample data sets, termed training data sets, are used to tune compiler output. Specifically, during compilation, statistics collected from sample data sets are used to determine a fixed program execution path, which indicates the order in which the branch instructions and program modules are to be executed. During execution the input data signal 200 is processed using the determined static execution path 201, resulting in the output signal 202. Various embodiments of this method are described by M. Haneda, P. M. W. Knijnenburg and H. A. G. Wijshoff, "On the Impact of Data Input Sets on Statistical Compiler Tuning," Proc. Workshop on Performance Optimization of High-Level Languages and Libraries (POHLL), 2006, and by R. P. J. Pinkers, P. M. W. Knijnenburg, M. Haneda, and H. A. G. Wijshoff, "Statistical Selection of Compiler Options," IEEE MASCOTS 2004. The main limitation of this method is its underlying assumption that the statistics of the training data used to determine the execution path during compilation are representative of the input data observed during execution. This assumption may not hold in practice. Further, the use of a fixed program execution order makes this approach non-adaptive, and unsuitable for the case where the data statistics vary with time. A further shortcoming of these approaches is that the compiler tuning is often ad hoc, and is not guaranteed to minimize the expected complexity of execution of the program even when the training data is statistically typical.
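The limitation of a fixed execution order can be illustrated with a small Python sketch. It is not the tuning procedure of the cited references; the two candidate orders, their cycle counts, and the statistics are all hypothetical, chosen only to show how an order selected from training statistics can become inefficient when the run-time statistics differ.

```python
# Hypothetical total complexities (in cycles) of two candidate execution
# orders, indexed by the kind of data block being processed.
candidate_costs = {
    "order_X": {"independent": 2, "predictive": 10},
    "order_Y": {"independent": 8, "predictive": 4},
}


def expected_cost(costs, stats):
    """Expected complexity of one execution order under the given statistics."""
    return sum(stats[kind] * costs[kind] for kind in stats)


def pick_static_order(training_stats):
    """Compile-time choice: the order with the lowest expected cost on training data."""
    return min(
        candidate_costs,
        key=lambda name: expected_cost(candidate_costs[name], training_stats),
    )


# Hypothetical statistics: the run-time data differs from the training data.
training_stats = {"independent": 0.75, "predictive": 0.25}
runtime_stats = {"independent": 0.25, "predictive": 0.75}

fixed_order = pick_static_order(training_stats)
cost_at_runtime = expected_cost(candidate_costs[fixed_order], runtime_stats)
best_at_runtime = min(
    expected_cost(costs, runtime_stats) for costs in candidate_costs.values()
)

if __name__ == "__main__":
    print(fixed_order, cost_at_runtime, best_at_runtime)
```

In this example the order fixed at compile time costs 8.0 cycles on average at run-time, whereas the other candidate would cost 5.0, because the fixed order cannot follow the change in statistics.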
The second conventional method, shown in FIG. 2(b), employs data-value dependent execution to reduce program execution time. More specifically, in this method, multiple alternative program execution orders are employed, each of which is efficient for a specific input data value or for a specific set of input data values. For example, in FIG. 2(b), execution path 212 is efficient when the input data signal 210 has the value 0, execution path 214 is efficient when signal 210 has the value 1, and execution path 213 is efficient when signal 210 has a value which is neither 0 nor 1. During execution, the input data signal 210 is input to the selector 211, which selects the appropriate execution path based on the value of the data signal. The output signal 216 is derived from the selected execution path. An embodiment of this method is described by J. Gonzalez and A. Gonzalez, "The potential of data value speculation to boost ILP," Proc. 12th ACM International Conference on Supercomputing, 1998. The main shortcoming of this method is that, in practice, it provides optimized performance only when the incoming data takes values in a small subset of the most frequently occurring data values. For example, for media compression programs, an optimized execution path may only be provided for the case where the input signal is 0, and a non-optimized execution path may process all other signal values. A further shortcoming of this method is that it is also non-adaptive; if the most frequently occurring data values change over time, the employed program execution orders become computationally inefficient.
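The selector arrangement of FIG. 2(b) can be sketched in Python as follows. The operation being specialized (scaling an input by a gain) and all function names are hypothetical stand-ins; they are not taken from the cited reference and serve only to show the dispatch structure.

```python
def zero_path(x, gain):
    """Stands in for execution path 212: input is 0, so the result is known."""
    return 0


def one_path(x, gain):
    """Stands in for execution path 214: input is 1, so the multiply is skipped."""
    return gain


def general_path(x, gain):
    """Stands in for execution path 213: handles any input value."""
    return x * gain


def selector(x):
    """Stands in for selector 211: picks a path from the value of the input."""
    if x == 0:
        return zero_path
    if x == 1:
        return one_path
    return general_path


def process(x, gain):
    """Route the input through the selected path to produce the output."""
    return selector(x)(x, gain)


if __name__ == "__main__":
    for value in (0, 1, 3):
        print(value, process(value, 5))
```

The specialized paths are fixed when the program is written. If the most frequent input values drift away from 0 and 1, nearly all inputs fall through to the general path and the specialization yields no benefit.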
Therefore, a need exists for an improved method for reducing program execution time by means of data-dependent optimization of program execution, which can adapt during run-time to the changing statistics of the incoming data, and which minimizes the program execution time for a large class of signals.