A typical multi-processor computer system is able to execute multiple instructions at the same time. Specifically, the multi-processor computer system includes multiple interconnected processors (e.g., multiple processing cores and/or central processing units). Accordingly, applications that execute in parallel on the multi-processor computer system are able to exploit the processing power provided by the interconnection of the processors. For example, a given computation may be executed much faster by splitting the computation into multiple segments and executing the segments in parallel on the interconnected processors, rather than executing the computation serially on a single processor.
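By way of illustration only, the splitting of a computation into segments may be sketched as follows. The function names (`split_into_segments`, `sum_of_squares`, `parallel_sum_of_squares`) and the choice of a sum of squares as the computation are hypothetical and are not drawn from any particular system. The sketch uses the Python standard library thread pool to show the structure of the approach; in the CPython interpreter, threads generally do not speed up processor-bound work, so a process pool or a language with native threads would likely be substituted where an actual speedup is required.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(values, n_segments):
    """Divide values into at most n_segments contiguous slices (hypothetical helper)."""
    size = max(1, -(-len(values) // n_segments))  # ceiling division
    return [values[i:i + size] for i in range(0, len(values), size)]


def sum_of_squares(segment):
    """The per-segment work; each call touches only its own segment."""
    return sum(v * v for v in segment)


def parallel_sum_of_squares(values, n_workers=4):
    """Run one task per segment, then combine the partial results serially."""
    segments = split_into_segments(values, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, segments))
    return sum(partials)
```

Because each segment is processed without reference to any other segment, the partial results may be computed in any order and combined in a final serial step.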
Executing an application across several processors typically involves determining which portions of the application must be performed serially and which portions of the application may be performed in parallel (i.e., the portions that are safe to perform in parallel). A portion of the application is deemed parallelizable if the portion may be divided into discrete segments such that each of the discrete segments may be executed simultaneously by an individual thread. In contrast, portions of the application that, when parallelized, would result in many thread interdependencies (i.e., data dependencies between threads), such as multiple reads and writes to the same memory space by different threads, are typically not parallelized.
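The distinction may be illustrated with a minimal sketch, in which all function names are hypothetical. A running total is used as an example of a portion that is not safe to divide naively: each output depends on the total carried over from the previous element, so treating the segments as independent produces a different result from the serial computation.

```python
def running_total(values):
    """Serial version: each step reads the total written by the previous step."""
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out


def naive_segmented_running_total(values, n_segments):
    """Incorrect on purpose: treats each segment as if it had no predecessor."""
    size = max(1, -(-len(values) // n_segments))  # ceiling division
    out = []
    for start in range(0, len(values), size):
        # Each segment restarts from zero, losing the carried dependency.
        out.extend(running_total(values[start:start + size]))
    return out


serial = running_total([1, 2, 3, 4])                     # [1, 3, 6, 10]
naive = naive_segmented_running_total([1, 2, 3, 4], 2)   # [1, 3, 3, 7]
```

The second segment in the naive version starts without the total produced by the first segment, which is the kind of data dependency between threads that typically prevents such a portion from being parallelized as-is.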
One method of parallelizing an application is for a programmer or compiler to analyze the application and determine how to parallelize the application. For example, the programmer may analyze a single loop in the application to determine whether there are potential data dependencies between iterations of the single loop. Once the programmer has determined how to parallelize the single loop, the programmer may add specific instructions to the application for parallelizing the single loop. Thus, iterations of the same loop may execute on different processors simultaneously.
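One simplified way to picture the analysis of a single loop is to record, for each iteration, the memory locations the iteration reads and writes, and then to check whether any two iterations conflict. The sketch below is an assumption-laden illustration rather than a description of how any particular programmer or compiler performs the analysis: the function `has_cross_iteration_dependency` and the representation of locations as `(array_name, index)` tuples are hypothetical, and a real analysis would generally reason symbolically rather than enumerate iterations.

```python
def has_cross_iteration_dependency(accesses):
    """accesses: one (reads, writes) pair of location sets per loop iteration.

    Returns True if any two iterations touch the same location and at least
    one of them writes it, so that reordering them could change the result.
    """
    for i, (reads_i, writes_i) in enumerate(accesses):
        for reads_j, writes_j in accesses[i + 1:]:
            if writes_i & (reads_j | writes_j) or reads_i & writes_j:
                return True
    return False


# Loop body a[i] = b[i] * 2: each iteration reads b[i] and writes a[i].
independent_loop = [({("b", i)}, {("a", i)}) for i in range(4)]

# Loop body a[i] = a[i - 1] + 1: iteration i reads what iteration i - 1 wrote.
dependent_loop = [({("a", i - 1)}, {("a", i)}) for i in range(1, 4)]
```

Under these assumptions, the first loop reports no conflict and its iterations are candidates for execution on different processors, while the second loop reports a conflict and would typically be left serial or restructured before being parallelized.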