A typical distributed computer system includes multiple interconnected nodes. Each node in the distributed computer system may include a separate processor. Accordingly, applications which execute in parallel on the distributed computer system are able to exploit the combined processing power of the interconnected processors. For example, a given computation may be executed much faster by splitting the computation into multiple sections and executing the sections in parallel on the interconnected nodes, rather than executing the computation serially on a single node.
Executing an application across several nodes typically involves determining which portions of the application should be performed serially and which portions of the application may be performed in parallel (i.e., which portions are safe to perform in parallel). A portion of the application is deemed parallelizable if the portion may be divided into discrete sections such that each section may be executed simultaneously by an individual thread. In contrast, portions of the application that, when parallelized, would result in dependency violations (i.e., data dependencies between threads), such as multiple reads and writes to the same memory space by different threads, typically are not parallelized.
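As a minimal sketch of the distinction drawn above, the following Python example contrasts a loop whose iterations are independent with a loop that carries a dependence from one iteration to the next. The function names (`scale_chunk`, `parallel_scale`, `prefix_sum`) and the use of `concurrent.futures.ThreadPoolExecutor` are assumptions made purely for illustration; the threads are meant to show how the work is divided into discrete sections, not to demonstrate a measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def scale_chunk(data, start, stop, factor):
    """Independent iterations: element i depends only on data[i]."""
    return [data[i] * factor for i in range(start, stop)]


def parallel_scale(data, factor, workers=4):
    """Split the index range into discrete sections, one task per section."""
    n = len(data)
    step = max(1, (n + workers - 1) // workers)
    bounds = [(s, min(s + step, n)) for s in range(0, n, step)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scale_chunk, data, s, e, factor) for s, e in bounds]
        result = []
        for future in futures:  # collect sections in submission order
            result.extend(future.result())
    return result


def prefix_sum(data):
    """Loop-carried dependence: iteration i reads what iteration i - 1 wrote.

    Running these iterations simultaneously without further restructuring
    could read out[i - 1] before it has been updated.
    """
    out = list(data)
    for i in range(1, len(out)):
        out[i] = out[i - 1] + out[i]
    return out
```

In this sketch, `parallel_scale` may safely hand each section to a different thread because no section reads or writes data owned by another section, whereas `prefix_sum` is left serial because each iteration depends on the result of the previous one.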
One method of parallelizing an application is for a programmer to analyze the application and determine how to parallelize it. For example, the programmer may analyze a loop in the application to determine whether potential data dependencies exist between iterations of the loop. Once the programmer has determined how to parallelize the loop, the programmer may add specific instructions, such as Message Passing Interface (MPI) calls, to the application for parallelizing the loop.
Another method of parallelizing the application is for a compiler to add instructions for parallelizing the application statically at compile time. To add these instructions, the compiler must analyze the application for possible data dependencies and determine how to break the application into discrete portions. Ensuring that all data dependencies are known is challenging, if not impossible, in general, because many commonly occurring loops have memory accesses that preclude automatic parallelization. Specifically, an application may have memory references that are only determined at execution time, such as subscripted subscripts (e.g., A[C[i]]=D[i]) and pointer variables (e.g., *ptr=0.50; ptr++).
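To illustrate why a subscripted subscript resists compile-time analysis, the following Python sketch treats the loop A[C[i]]=D[i] as a scatter operation and checks whether two iterations would write the same element of A. The helper names `has_write_conflict` and `scatter` are hypothetical and chosen only for this example. The point is that the answer depends entirely on the contents of C, which may not be known until execution time.

```python
def has_write_conflict(index_array):
    """Return True if two iterations of A[C[i]] = D[i] write the same element."""
    seen = set()
    for target in index_array:
        if target in seen:
            return True
        seen.add(target)
    return False


def scatter(a, c, d):
    """Serial reference version of the loop A[C[i]] = D[i]."""
    for i in range(len(c)):
        a[c[i]] = d[i]
    return a


# The same loop text is safe for one index array and unsafe for another.
permutation = [3, 1, 0, 2]   # every iteration writes a distinct element
repeated = [3, 1, 1, 2]      # iterations 1 and 2 both write A[1]
```

With `permutation`, the iterations touch disjoint elements and could be run in any order; with `repeated`, the final value of A[1] depends on which iteration runs last, so the serial order must be preserved.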
Another possible solution is to execute the loop under the assumption that the loop is parallelizable and to perform the dependence analysis after execution. If a dependence violation is discovered after execution, then the loop may be deemed not parallelizable.
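The post-execution analysis described above can be sketched as follows, under several stated assumptions: each iteration records the locations it reads and writes, and a separate pass afterwards looks for any location written by one iteration and read or written by another. The names `run_and_log`, `find_violation`, and `make_scatter_body` are hypothetical, and the iterations are executed one after another here for simplicity, so only the logging and the after-the-fact check are illustrated, not the parallel execution itself.

```python
def run_and_log(body, n_iterations):
    """Execute each iteration and record which locations it reads and writes."""
    logs = []
    for i in range(n_iterations):
        reads, writes = set(), set()
        body(i, reads, writes)
        logs.append((reads, writes))
    return logs


def find_violation(logs):
    """Return the first pair (i, j) of conflicting iterations, or None."""
    for i in range(len(logs)):
        reads_i, writes_i = logs[i]
        for j in range(i + 1, len(logs)):
            reads_j, writes_j = logs[j]
            # Conflict: one iteration writes what the other reads or writes.
            if writes_i & (reads_j | writes_j) or reads_i & writes_j:
                return (i, j)
    return None


def make_scatter_body(a, c, d):
    """Build a loop body for A[C[i]] = D[i] that logs its memory accesses."""
    def body(i, reads, writes):
        a[c[i]] = d[i]
        reads.add(("D", i))
        writes.add(("A", c[i]))
    return body
```

For example, `find_violation(run_and_log(make_scatter_body([0] * 4, [0, 1, 1, 3], [5, 6, 7, 8]), 4))` reports the pair `(1, 2)`, because both of those iterations write A[1]; with the index array `[0, 1, 2, 3]` no pair is reported and the loop would be treated as parallelizable in this sketch.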