Distributed computing is the use of multiple autonomous computers (or processors) to solve a computational problem by dividing it into sub-problems, each of which is solved by one or more of the autonomous computers (or nodes) in a cluster. Distributed shared memory (DSM) is an aspect of distributed computing in which each node of a cluster has access to shared memory that is distributed across the cluster and that may comprise local memory, remote memory, or both with regard to each node. Certain performance-critical workloads, such as those requiring more memory or processing power than is available in a single computer, can benefit from the abstraction of a mutable (changeable) shared state that is made possible by distributed computing systems employing distributed shared memory.
To perform computations on very large datasets, distributed computing clusters (comprising, for example, tens to thousands of autonomous computers) employing distributed shared memory may be utilized. However, many computations (and their associated algorithms) may be challenging to parallelize because dependencies may exist between certain sub-computations and because it is inherently difficult to determine these dependencies in advance. A conservative approach to handling possible dependencies is to wait for prior sub-computations to complete before running subsequent sub-computations that possibly depend on them. Such an approach, however, requires a substantial amount of synchronization between parallel processes (for example, between the autonomous computers over the computer network), which in turn becomes a substantial bottleneck to efficient computational performance.
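The conservative approach described above can be sketched on a single machine, with a thread pool standing in for the cluster. This is an illustrative sketch only: the names `run_conservatively` and `make_increment` are hypothetical, and a Python dictionary stands in for the mutable shared state. Every sub-computation waits for its predecessor, whether or not it actually depends on it.

```python
from concurrent.futures import ThreadPoolExecutor


def run_conservatively(tasks, state):
    """Run tasks on a pool, forcing each one to wait for its predecessor.

    Returns the number of waits performed between consecutive tasks.
    """
    waits = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        previous = None
        for task in tasks:
            if previous is not None:
                # Synchronization point: block until the prior task finishes,
                # even if this task touches unrelated state.
                previous.result()
                waits += 1
            previous = pool.submit(task, state)
        if previous is not None:
            previous.result()  # make sure the last task has completed
    return waits


def make_increment(key):
    """Build a hypothetical sub-computation that updates one shared entry."""
    def task(state):
        state[key] = state.get(key, 0) + 1
    return task


# Only the last task depends on an earlier one (both update "a"),
# yet every task is serialized behind its predecessor.
tasks = [make_increment(k) for k in ["a", "b", "c", "a"]]
state = {}
waits = run_conservatively(tasks, state)
```

Although four workers are available, the tasks execute one at a time; on a real cluster each such wait would also incur network latency.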
For certain useful computations, however, computational dependencies may be relatively rare compared to the total number of computations over an entire large dataset, and in such instances the vast majority of individual computations (and sub-computations) will have no such dependencies. Thus most computations (and sub-computations) need not be delayed, since they do not depend on the completion of other computations; the difficulty lies in the inability to identify and separate, in advance, the computations having dependencies from those that do not.
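A small sketch can make the rarity of dependencies concrete. The workload below is made up for illustration, and `count_dependent_tasks` is a hypothetical helper. It assumes that each sub-computation updates one key of the shared state and that a sub-computation depends on an earlier one only when both touch the same key. In a real workload the keys would be derived from the input data, so they would be known only once each sub-computation runs.

```python
def count_dependent_tasks(targets):
    """Count tasks that touch a key already touched by an earlier task."""
    seen = set()
    dependent = 0
    for key in targets:
        if key in seen:
            dependent += 1
        seen.add(key)
    return dependent


# Hypothetical workload of 1000 sub-computations, each updating one key.
targets = list(range(1000))
targets[500] = targets[10]  # task 500 touches the same key as task 10
targets[900] = targets[20]  # task 900 touches the same key as task 20

dependent = count_dependent_tasks(targets)
fraction = dependent / len(targets)
```

Here only 2 of the 1000 sub-computations have a dependency, but finding them requires inspecting the key each one touches, which is exactly the information that is unavailable in advance.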