A typical computer system includes hardware and software. The hardware includes at least one processing device that executes instructions defined by the software (i.e., an application). The processing device may be a processor, a micro-core on a multi-core processor, or other such device that can process instructions. Often, a computer system includes multiple processing devices that execute applications in parallel. For example, multiple processors and/or multiple micro-cores may execute in parallel. Parallel execution can often shorten the amount of time required to process the instructions of the application. Thus, parallel applications, or applications developed to be executed in parallel, tend to execute faster than applications that execute serially.
Parallel applications also tend to be more complicated than serial applications. Specifically, a single thread in a serial application does not compete with other threads of the same application to modify and/or read data in memory. In contrast, in a parallel application, multiple threads executing the same application may attempt to modify and read data at different, unknown times. Thus, in a parallel application, the value of data in memory may depend on the order in which each thread reads the data and writes to the data.
For example, consider the scenario in which thread X needs to add 20 to the value at data element E and thread Y needs to subtract 5 from the value at data element E, where the initial value of data element E is 40. In the first step, thread X reads the value of data element E (i.e., 40). In the second step, thread X adds 20 to the value read for data element E (i.e., 40+20=60). Concurrently with the second step, thread Y reads the value of data element E (i.e., 40). In the third step, thread X stores 60 as the value of data element E. Also in the third step, thread Y subtracts 5 from the value thread Y read (i.e., 40−5=35). In the fourth step, thread Y stores 35 as the value of data element E, overwriting the value stored by thread X. Thus, the result of this execution is that the final value of data element E is 35 rather than the correct value of 55 (i.e., 40+20−5=55).
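The four steps above can be replayed as a short Python sketch. It is an illustration only: it uses no real threads, so the interleaving is fixed and the outcome is the same on every run. The names `memory`, `x_local`, and `y_local` are hypothetical and stand for data element E and each thread's private copy of the value it read.

```python
# Deterministic replay of the interleaving described above.
# No real threads are used; each statement stands for one thread's action.
memory = {"E": 40}            # shared data element E, initial value 40

# Step 1: thread X reads E.
x_local = memory["E"]         # X sees 40

# Step 2: thread X adds 20 to its private copy while thread Y reads E.
x_local = x_local + 20        # X now holds 60
y_local = memory["E"]         # Y sees 40, because X has not stored yet

# Step 3: thread X stores its result while thread Y subtracts 5.
memory["E"] = x_local         # E is now 60
y_local = y_local - 5         # Y now holds 35, computed from a stale value

# Step 4: thread Y stores its result, overwriting X's update.
memory["E"] = y_local         # E is now 35

interleaved_result = memory["E"]

# For comparison: the same two updates applied one after the other,
# so that Y reads E only after X has stored its result.
memory = {"E": 40}
memory["E"] = memory["E"] + 20    # X completes first: 60
memory["E"] = memory["E"] - 5     # Y then reads 60 and stores 55
serial_result = memory["E"]

print(interleaved_result, serial_result)
```

The only difference between the two runs is when thread Y performs its read relative to thread X's store, which is the ordering dependency described above.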
Thus, the developer must typically be cognizant of the different dependencies and develop the parallel application accordingly. In general, the developer develops the application by creating source code defining the application. Source code is a collection of instructions written in any human-readable programming language. In the source code, the developer defines the number of threads that will execute the application. Further, the developer defines which portion of the parallel application is executed in parallel. The developer defines the disjoint portion of data processed by each thread to generate results. The developer may also define how the different threads communicate and combine the generated results.
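As a rough sketch of these developer decisions, the following Python example fixes a thread count, marks the portion that runs in parallel, assigns each thread a disjoint slice of the data, and combines the partial results. The task (summing a list) and the names `NUM_THREADS`, `partial_sum`, and `parallel_sum` are assumptions chosen for illustration and do not come from the text above. Note also that in the standard CPython interpreter, threads typically do not speed up CPU-bound work of this kind, so the example shows the structure of such a program rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_THREADS = 4  # hypothetical: number of threads chosen by the developer


def partial_sum(chunk):
    """Work done by one thread on its own disjoint portion of the data."""
    return sum(chunk)


def parallel_sum(data, num_threads=NUM_THREADS):
    # Split the data into disjoint, contiguous slices (ceiling division so
    # that every element lands in exactly one slice).
    chunk_size = max(1, (len(data) + num_threads - 1) // num_threads)
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

    # Parallel portion: each slice is handed to a worker thread.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        partials = list(pool.map(partial_sum, chunks))

    # Combine step: merge the per-thread results into the final answer.
    return sum(partials)


total = parallel_sum(list(range(1, 101)))
print(total)
```

Because no two threads touch the same slice and the partial results are combined only after all workers finish, the final value does not depend on the order in which the threads run.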
Once written, the source code may be compiled to create executable code. Executable code is a collection of instructions understandable by a computer. When the executable code of the parallel application is executed, the threads are generated and executed by the different processing devices according to the instructions defined by the developer.