A typical distributed computer system includes multiple interconnected nodes. Each node in the distributed computer system may include a separate processor. Accordingly, applications that execute in parallel on the distributed computer system are able to exploit the combined processing power of the interconnected processors. For example, a given computation may be executed much faster by splitting the computation into multiple sections and executing the sections in parallel on the multiple interconnected nodes, rather than executing the computation serially on a single node.
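The splitting described above may be sketched as follows. This is a minimal illustration only: worker threads in a single process stand in for the interconnected nodes, and the names `split_into_sections`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical. In standard CPython builds, threads do not speed up CPU-bound work, so the sketch shows the structure of the approach rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_sections(data, num_sections):
    """Divide data into at most num_sections contiguous sections."""
    size = max(1, -(-len(data) // num_sections))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares(section):
    """Work performed on one section (hypothetical computation)."""
    return sum(x * x for x in section)


def parallel_sum_of_squares(data, num_workers=4):
    """Process each section on its own worker, then combine the partial results."""
    sections = split_into_sections(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(sum_of_squares, sections))
    return sum(partials)
```

The combined result matches what a serial pass over the same data would produce, because each section is independent of the others.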
Executing an application across several nodes typically involves determining which portions of the application should be performed serially and which portions may safely be performed in parallel. A portion of the application is deemed parallelizable if the portion may be divided into discrete sections such that each section may be executed by an individual thread simultaneously with the other sections. In contrast, portions of the application that would result in dependency violations when parallelized (i.e., violations of data dependencies between threads), such as multiple reads and writes to the same memory location by different threads, are not parallelized.
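One way to picture the parallelizability determination is as a check over the memory locations each loop iteration reads and writes. The sketch below is an assumption made for illustration, not a description of any particular compiler: the helper `has_cross_iteration_dependency` and the representation of each iteration as a pair of read and write sets are hypothetical.

```python
def has_cross_iteration_dependency(accesses):
    """Return True if any two iterations conflict on a memory location.

    accesses: list of (reads, writes) pairs, one per iteration, where each
    element is a set of locations. A conflict exists when one iteration
    writes a location that another iteration reads or writes.
    """
    for i, (reads_i, writes_i) in enumerate(accesses):
        for reads_j, writes_j in accesses[i + 1:]:
            if writes_i & (reads_j | writes_j) or reads_i & writes_j:
                return True
    return False


# Loop body a[i] = b[i] * 2: every iteration touches its own locations.
independent = [({("b", i)}, {("a", i)}) for i in range(4)]

# Loop body a[i] = a[i - 1] + 1: each iteration reads the previous result.
carried = [({("a", i - 1)}, {("a", i)}) for i in range(1, 5)]
```

Under this model, the `independent` loop would be deemed parallelizable, while the `carried` loop would not, because a later iteration reads a location written by an earlier one.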
Once it is determined that no data dependencies exist in a portion of an application, the portion is executed in parallel, and the individual threads write their results immediately to memory. Alternatively, after an application is parallelized, the results produced by the parallel execution are stored in temporary storage. The results are then committed in the order in which they would have been produced had the application been executed serially. For example, the results from executing a loop in parallel are committed in the order of the first iteration results (i.e., results created when executing the first iteration of the loop), the second iteration results, the third iteration results, etc. Thus, because the results are committed as if the iterations had been performed serially, a user is assured that the last change to a particular memory location is correct.
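The temporary-storage alternative may be sketched as follows. This is a minimal illustration under stated assumptions: the loop body `run_iteration`, the dictionary standing in for memory, and the function `execute_with_ordered_commit` are all hypothetical names chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_iteration(i):
    """Hypothetical loop body: return the writes of iteration i without applying them."""
    return [("last", i), (f"slot{i % 2}", i * 10)]


def execute_with_ordered_commit(num_iterations, memory):
    """Run iterations in parallel, buffer their writes, then commit in serial order."""
    with ThreadPoolExecutor() as pool:
        # Iterations may finish in any order; each result is held in
        # temporary storage keyed by its iteration index.
        futures = {i: pool.submit(run_iteration, i) for i in range(num_iterations)}
        buffered = {i: future.result() for i, future in futures.items()}

    # Commit first-iteration results, then second-iteration results, and so on,
    # so the final value of each location matches a serial execution.
    for i in range(num_iterations):
        for location, value in buffered[i]:
            memory[location] = value
    return memory
```

With five iterations, every iteration writes to the location `"last"`, yet the committed value is the one from the final iteration, regardless of which thread finished first.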