Field of the Disclosure
This disclosure relates generally to parallel programming, and more particularly to systems and methods for implementing constrained data-driven parallelism.
Description of the Related Art
The ongoing proliferation of multicore systems in the mainstream computing industry is making parallel programming more commonplace. However, experience over several decades has shown that parallel programming is difficult, and thus far it has largely remained the province of a small group of expert programmers.
A particularly thorny issue encountered in parallel programming is how to implement efficient and correct data sharing between concurrent computations. The traditional lock-based mutual exclusion approach is cumbersome and error prone. One particularly promising alternative programming abstraction is the atomic block, in which the programmer delineates a block of code as atomic, guaranteeing that the code in the block is executed atomically and in isolation from other concurrently executing computations. With this approach, the programmer specifies which code must execute atomically, and the system transparently guarantees its atomicity. This is in stark contrast with the traditional model of lock-based mutual exclusion, in which the programmer must map shared objects to locks, manually ensure that the right set of locks is acquired while accessing these objects, and manually release those locks when the object accesses are complete. The programmer must also carefully avoid any lock acquisition ordering that may lead to deadlock. Atomic blocks are very useful tools that relieve the programmer of the burden of explicitly managing synchronization in a parallel application. However, atomic blocks do not help express parallelism in applications, which is another challenge that programmers face.
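By way of illustration, the burden of lock-based mutual exclusion described above may be sketched as follows. The Account class, the transfer function, and the use of an address-based lock ordering are hypothetical examples chosen for exposition, not part of any particular system; an atomic-block abstraction would instead let the programmer mark the body of transfer as atomic and leave lock management to the system.

```python
import threading

class Account:
    """A shared object that the programmer must manually map to a lock."""
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Lock-based approach: the programmer must acquire both locks in a
    # globally consistent order (here, by object identity) so that two
    # concurrent transfers in opposite directions cannot deadlock.
    first, second = (src, dst) if id(src) < id(dst) else (dst, src)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount
    # With an atomic block, the two balance updates would simply be
    # delineated as atomic, and the system would guarantee atomicity
    # and isolation without explicit locks or ordering rules.
```

Note that the correctness of this sketch hinges entirely on every caller honoring the same lock-ordering discipline, which is precisely the kind of manual obligation that atomic blocks remove.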
Exploiting multicore systems requires parallel computation, preferably with minimal synchronization. Another promising approach is data-driven parallelism, in which a computation is broken into tasks, each of which is triggered to run when some (specified) data is modified. These tasks may modify additional data, thereby triggering additional tasks; in this way, changes to data "drive" the parallel computation, which propagates until no further changes occur. The benefits of this approach can include increased opportunities for fine-grained parallelism, avoidance of redundant work, and support for incremental computation on large data sets. Nonetheless, data-driven parallelism can be problematic. For example, the convergence times of data-driven single-source shortest paths algorithms can vary by two or more orders of magnitude depending on the order in which tasks are executed.