The present application relates generally to an improved data processing apparatus and method and more specifically to mechanisms for building approximate data dependence information with a moving window.
A compiler is a computer program, or set of programs, that transforms source code written in a computer language into another computer language that is executable by a computing device, e.g., object code having a binary form. Many types of compilers today also perform optimizations on the source code as part of the transformation from source code to object code or executable code. These optimizations involve tuning the output of the compiler so as to minimize or maximize some attributes of the executable program. For example, such optimizations may be directed to minimizing the time taken to execute a program, minimizing the amount of memory used by the program, minimizing the power consumed by the computer in running the program by minimizing the number of resources utilized, etc.
In performing such optimizations, it is often necessary for the compiler to determine runtime dependences between portions or segments of code. Runtime dependences describe how code segments depend on each other. A dependence is essentially two or more statements addressing the same memory location. For example, one type of optimization that a compiler may perform is loop optimization, in which loop code is parallelized by transforming it into code that may be executed in a parallel manner by multiple threads. In order to perform such loop optimization, it is important to know how memory accesses by the loop code may be dependent upon each other.
It is well known that compiler optimizers make conservative assumptions to ensure that code is generated correctly. Therefore, many runtime dependences are assumed to exist by the compiler but are actually non-existent. Even if a dependence occurs only once at runtime, the compiler has to assume its existence, which prohibits many optimizations. Moreover, a major difficulty for loop parallelization is the uncertainty of memory accesses across iterations, which are often impossible to determine at compilation time. Several obstacles may prevent the compiler from properly deriving these dependences, such as pointer accesses that cannot be determined statically, uncertain control flow that may bypass some memory accesses, array elements indexed by complicated computations, or array elements indexed by other arrays (indirect array accesses). If a possible cross-iteration access is guarded by a conditional branch, there is often little the compiler can do to eliminate the uncertainty of this possible cross-iteration access.
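As an illustration of the indirect array access case, the following minimal sketch models a loop of the form a[w[i]] = a[r[i]] + 1, where w and r are index arrays whose contents are known only at runtime. The function name, the loop shape, and the index arrays are hypothetical and chosen only for illustration; the brute-force pairwise comparison is not the mechanism of the present application, merely a way to show that the same loop may or may not carry cross-iteration dependences depending on runtime data.

```python
def cross_iteration_dependences(w, r):
    """Return sorted (i, j) pairs with i < j such that iterations i and j of

        for i in range(n): a[w[i]] = a[r[i]] + 1

    touch the same element of a, with at least one of the two accesses
    being a write. Hypothetical helper for illustration only.
    """
    deps = set()
    n = len(w)
    for i in range(n):
        for j in range(i + 1, n):
            same_write = w[i] == w[j]        # both iterations write one element
            write_then_read = w[i] == r[j]   # j reads what i wrote
            read_then_write = r[i] == w[j]   # j overwrites what i read
            if same_write or write_then_read or read_then_write:
                deps.add((i, j))
    return sorted(deps)


# Same loop body, two different runtime index arrays (assumed values).
independent = cross_iteration_dependences([0, 1, 2, 3], [4, 5, 6, 7])
one_conflict = cross_iteration_dependences([0, 1, 2, 3], [4, 5, 1, 7])
```

With the first pair of index arrays no two iterations touch the same element, so every iteration could run in parallel; with the second, iteration 2 reads the element that iteration 1 writes. A compiler that cannot see the contents of w and r must assume the second situation for every pair of iterations.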
The programmer may have a great deal of knowledge about the likely behavior of the code, but this knowledge is rarely made available to the compiler, due to a lack of time or a lack of expressiveness in the programming language. As a result, in order to capture all possible dependences when optimizing the code, the compiler often constrains the runtime behavior of a program much more than is strictly necessary. The dependences that occur in real executions may be fewer and simpler than the pessimistic scenario the compiler is forced to assume.
This lack of knowledge by the compiler regarding runtime dependences prevents code from being parallelized in an optimal manner. Parallelization opportunities may be lost for code that is completely parallelizable or partially parallelizable because the compiler must take a more pessimistic approach to optimizations, as noted above. Most compilers cannot handle parallelism that exists only in part of the code. For example, a loop is treated as either completely parallelizable or executed sequentially. In other words, a slight chance of dependence renders the loop completely non-parallelizable. However, some of the iterations can often be executed in parallel with other iterations in a limited scope.
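The notion of parallelism in a limited scope can be sketched as follows, assuming the cross-iteration dependences are already known as a set of (earlier, later) iteration pairs. The function name and the greedy grouping strategy are assumptions made for illustration and are not taken from the present application. The sketch splits the iterations into consecutive groups such that no dependence falls inside a group; the groups would run one after another, while the iterations inside each group could run in parallel.

```python
def parallel_groups(n, deps):
    """Split iterations 0..n-1 into consecutive groups so that no pair
    (i, j) in deps has both i and j in the same group.

    deps is a set of (earlier, later) iteration pairs. Hypothetical
    helper for illustration only.
    """
    groups = []
    current = []
    for j in range(n):
        # Close the current group if j depends on any iteration in it.
        if any((i, j) in deps for i in current):
            groups.append(current)
            current = []
        current.append(j)
    if current:
        groups.append(current)
    return groups


# A single dependence between iterations 1 and 4 (assumed values).
groups = parallel_groups(6, {(1, 4)})
```

In this example a single dependence does not force the whole loop to run sequentially: iterations 0 through 3 form one parallel group and iterations 4 and 5 form a second, which runs after the first has completed.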
Under many circumstances, runtime dependences must be captured with low overhead. For parallelization that requires this dependence information to be fed back quickly, the amount of time spent obtaining the runtime dependences is critical. Low overhead is also important for other just-in-time optimizations.