One way to speed the execution of a program on a computer or data processing system is to divide its work into multiple threads or tasks and run those tasks concurrently on multiple processors. A general requirement for concurrent execution of two tasks is independence. Two tasks are independent if neither alters a machine state that the other is using. For example, if a first task reads from memory address X, then a second task may not write to memory address X. A problem therefore arises when two tasks of a program are coded or compiled to depend on the same memory address X: that dependence often degrades the performance of the program, whether the two tasks are executed serially or in parallel.
Typically, two tasks may be programmed or compiled to depend on a memory address in one of two ways. First, the two tasks may both depend on the value stored at the memory address. For example, two tasks may use a single counter to count instances of a particular event. In that case, every task that operates on the counter depends, for its correct operation, on the value stored at the memory address that holds the counter. This first form of dependence therefore requires that a previous task store its value before a subsequent task can correctly update the counter with a new event count.
Second, two tasks may both be programmed or compiled to depend on the memory address but not on the value stored there. For example, a particular task may require a scratch workspace located at memory address X for intermediate results, but those results may not depend on the previous contents of memory address X.
The two forms of dependence on a memory address can be distinguished by observing when the value stored at that address is alive and when it is dead, and which operation causes it to become dead. A value is alive if it may still be used by the program; otherwise it is dead. If a value becomes dead as part of an operation that refers to that previous value, the dependence is of the first form; if the value becomes dead through an operation that simply overwrites it without reading it, the dependence is of the second form. The following code illustrates both forms:
      REAL SCRATCH(N)              ! Create memory address array as SCRATCH
      COMMON SCRATCH
      DO 10, I = 2, N              ! Execute "10 loop"
        SCRATCH(I) = SCRATCH(I-1)/SCRATCH(I)
10    END DO
      PRINT *, SCRATCH             ! Print the contents of SCRATCH
      DO 20, I = 1, N              ! Execute "20 loop"
        SCRATCH(I) = 0.0
20    END DO
The “10 loop” task (or tasks, if the loop is divided for parallel processing) is an example of the first form of dependence. Because the “10 loop” task refers to SCRATCH by name, it depends on the address of SCRATCH; in addition, it reads values that a previous operation stored in SCRATCH, since each SCRATCH(I) is computed from SCRATCH(I−1) and SCRATCH(I). The “20 loop” task (or tasks, if divided for parallel processing) is an example of the second form of dependence. The “20 loop” task cannot proceed until the preceding PRINT or WRITE task (e.g., a write of the values at the SCRATCH addresses to an I/O device) is complete. Although the “20 loop” task depends on the availability of the address range referenced by SCRATCH, it does not depend on the values contained in that address range. As a result of this second form of dependence, the “20 loop” task cannot proceed until the WRITE task no longer depends on the address range referenced by SCRATCH.
Writing to or reading from a physical I/O device is generally slow relative to memory access. The low speed of an I/O device is often hidden from a program by copying data from its original location to a buffer in memory and allowing the program to proceed before the data are committed to physical storage. This breaks the dependence on the address range by moving the data out of SCRATCH as fast as it can be moved through memory. However, the data sets in many applications, notably large scientific applications, are often so large that the buffers are insufficient, and the speed of the program is then limited by the speed at which the buffer contents can be moved to the physical device.
Another way in which a program could free SCRATCH quickly would be to allocate a local buffer with an application-dependent size that would guarantee that the buffer is large enough to handle the entire SCRATCH array. Such code might look like this:
      REAL SCRATCH(N), S2(N)       ! Create memory address arrays SCRATCH and S2
      COMMON SCRATCH
      DO 10, I = 2, N              ! Execute "10 loop"
        SCRATCH(I) = SCRATCH(I-1)/SCRATCH(I)
10    END DO
      DO 15, I = 1, N              ! Execute "15 loop": copy SCRATCH into S2
        S2(I) = SCRATCH(I)
15    END DO
      PRINT *, S2                  ! Print the copy of SCRATCH held in S2
      DO 20, I = 1, N              ! Execute "20 loop"
        SCRATCH(I) = 0.0
20    END DO
Now there is no dependence between the PRINT or WRITE task and the “20 loop” task, so they can be done in parallel. However, this approach complicates the code and works well only in environments that have a spare processor to perform the WRITE. In single-processor systems, or in systems in which all processors are busy with other work (a common case), the “15 loop” is nothing more than extra processing that wastes time and memory. Also, if further parallelization is desired, still more complexity, such as locks or semaphores on S2, may be needed to ensure that the tasks that use S2 do not conflict with each other. Programs whose data sets are large enough to make it prohibitive to keep S2 allocated for a long time may also need to allocate S2 dynamically, in which case complex code must be written to handle insufficient memory and the other problems that arise from dynamic memory allocation.
Therefore, a need has long existed for a method and system that overcome the problems noted above and others previously experienced.