Field of the Invention
The present invention relates to parallel computing and more particularly to deterministic parallelization.
Description of the Related Art
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently. There are several different forms of parallel computing, for example bit-level, instruction-level, data, and task parallelism. Parallelism has been employed for many years, mainly in high-performance computing. More recently, parallel computing has become the dominant paradigm in computer architecture, mainly in the form of multicore processors.
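By way of illustration only, the principle of dividing a large problem into smaller ones that are solved concurrently and then combined can be sketched in Python, which is assumed here purely for exposition. The function names chunked and parallel_sum_of_squares are hypothetical. In CPython the global interpreter lock prevents these threads from executing Python bytecode truly in parallel, so the sketch shows the decomposition of the problem rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(data, n_chunks):
    """Split data into n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def parallel_sum_of_squares(data, workers=4):
    # Divide: one smaller subproblem per worker.
    parts = chunked(data, workers)
    # Solve each subproblem concurrently on a pool of threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda part: sum(x * x for x in part), parts))
    # Combine the partial results into the final answer.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1, 101))))  # 338350
```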
Parallel computers can be classified according to the level at which the computing hardware platform supports parallelism: multi-core and multi-processor computers have multiple processing elements within a single machine, while clusters and grids use multiple computers to work on the same task. Specialized parallel computer architectures are sometimes used alongside traditional processors to accelerate specific tasks. Multi-threaded operating environments specifically support parallelism by assigning different tasks of a computer program to different threads and executing those threads on different processors or processor cores.
Thus, managing the assignment and execution of different processing tasks to different threads of execution can, in and of itself, require sophisticated programmatic logic. Hence, parallel computer programs are more difficult to write than sequentially executing computer programs. Of note, parallel computer programs require constant consideration of programmatic and execution flaws arising from the inherent concurrency of parallel computing and from shared access to common resources in a parallel computing environment. In this regard, concurrency introduces several new classes of potential software flaws, of which race conditions are the most common.
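The most common such flaw, a lost update, can be illustrated deterministically by simulating two interleavings of the same read-modify-write increment performed by two threads, one serial and one overlapping. This is a hypothetical simulation rather than real threads; run_interleaving and the "read"/"write" step names are illustrative assumptions.

```python
def run_interleaving(schedule, initial=0):
    """Replay a read-modify-write increment for each thread in the given order.

    schedule is a list of (thread, step) pairs, where step is "read" or "write".
    Each thread reads the shared counter into a private register, then writes
    register + 1 back. Returns the final value of the shared counter.
    """
    shared = initial
    registers = {}
    for thread, step in schedule:
        if step == "read":
            registers[thread] = shared          # load the shared value
        elif step == "write":
            shared = registers[thread] + 1      # store the incremented private copy
        else:
            raise ValueError(f"unknown step: {step}")
    return shared


serial = [("A", "read"), ("A", "write"), ("B", "read"), ("B", "write")]
racy = [("A", "read"), ("B", "read"), ("A", "write"), ("B", "write")]

if __name__ == "__main__":
    print(run_interleaving(serial))  # 2: both increments take effect
    print(run_interleaving(racy))    # 1: B overwrites A's update
```

Both schedules execute exactly the same operations; only their ordering differs, which is why such flaws can be hard to reproduce.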
Race conditions, in which different processes attempt to access the same shared resource, generally can be managed through the intelligent use of locks. A locking mechanism manages access to a shared resource by maintaining a queue of processes (for example, different threads of execution) seeking access to the shared resource, together with an indication of the current thread accorded exclusive access to the shared resource. The current thread accorded exclusive access can release the lock when it no longer requires access to the shared resource. Thereafter, a different thread in the queue can be granted exclusive access to the shared resource. Of note, a scheduler can manage the granting of access to the shared resource by selecting a thread from the queue and granting the selected thread exclusive access to the shared resource.
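A minimal model of such a locking mechanism follows, assuming a single lock object that records the current holder, keeps a queue of waiting threads, and delegates the choice of the next holder to a pluggable selection function standing in for the scheduler. The class QueuedLock and its first-in, first-out default policy are hypothetical and chosen only for illustration.

```python
from collections import deque


class QueuedLock:
    """Model of a lock with a current holder and a queue of waiting threads.

    select is the scheduler's policy for picking the next holder from the
    queue; the first-in, first-out default is an assumption for illustration.
    """

    def __init__(self, select=lambda waiters: waiters[0]):
        self.holder = None
        self.waiters = deque()
        self.select = select

    def acquire(self, thread):
        if self.holder is None:
            self.holder = thread          # resource is free: grant exclusive access
            return True
        self.waiters.append(thread)       # otherwise join the queue
        return False

    def release(self, thread):
        if thread != self.holder:
            raise RuntimeError(f"{thread} does not hold the lock")
        if self.waiters:
            chosen = self.select(self.waiters)   # scheduler selects the next holder
            self.waiters.remove(chosen)
            self.holder = chosen
        else:
            self.holder = None
        return self.holder


if __name__ == "__main__":
    lock = QueuedLock()
    lock.acquire("T1")
    lock.acquire("T2")
    lock.acquire("T3")
    print(lock.release("T1"))  # T2
```

Keeping the selection policy separate from the queue bookkeeping makes it easy to contrast different ways a scheduler might choose among the waiters.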
The selection of a thread in the queue by the scheduler can be characterized as “opportunistic”. In this regard, the scheduler is not bound to any particular algorithm for selecting which thread in the queue of a shared resource receives exclusive access to the shared resource. To wit, given the same entrants in the queue of a shared resource across multiple different executions of a computer program, a different thread can be selected under the same conditions in each instance. While the opportunistic behavior of the scheduler can appear random, it provides a degree of simplicity in implementation. That simplicity, though, comes at a cost: debugging the performance of an application can be challenging because the same state of the application is difficult to reproduce.
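For illustration only, opportunistic selection can be approximated by a random choice among the waiting threads. A real scheduler is not literally random, so this stand-in is an assumption, as are the hypothetical names opportunistic_select and simulate_runs. Seeding a separate generator for each simulated run models re-executing the program with identical queue contents.

```python
import random


def opportunistic_select(waiters, rng=random):
    """Pick any waiting thread; no rule fixes which one wins.

    A random choice stands in for the unspecified behavior of a real
    scheduler (an assumption made only for this illustration).
    """
    return rng.choice(list(waiters))


def simulate_runs(waiters, runs):
    """Repeat the same selection `runs` times, as if re-executing the program."""
    return [opportunistic_select(waiters, random.Random(seed)) for seed in range(runs)]


if __name__ == "__main__":
    # Same queue contents on every run, yet the selected thread can differ.
    print(simulate_runs(["T1", "T2", "T3"], runs=5))
```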
Deterministic management of the locking of a shared resource, unlike opportunistic locking, provides a predictable way of selecting which thread in a queue for a shared resource receives exclusive access to the shared resource. For example, it is known to utilize a deterministic time in selecting a thread in the queue to receive exclusive access to a shared resource. The deterministic time can be computed as the amount of time consumed by a corresponding thread at the time of selection and can be measured, by way of example, as the number of instructions executed by the thread to date.
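A deterministic selection rule of this kind can be sketched as choosing the queued thread with the smallest deterministic time. Here the deterministic time is assumed to be an instruction count supplied by hypothetical bookkeeping (det_time), and ties are broken by thread identifier; the tie-break is an added assumption that makes the choice a pure function of its inputs.

```python
def deterministic_select(waiters, det_time):
    """Pick the waiting thread with the smallest deterministic time.

    det_time maps each thread id to the number of instructions it has
    executed so far (hypothetical bookkeeping). Ties are broken by thread
    id, so the same inputs always yield the same choice, regardless of
    the order in which threads entered the queue.
    """
    return min(waiters, key=lambda t: (det_time[t], t))


if __name__ == "__main__":
    counts = {"T1": 1200, "T2": 800, "T3": 800}
    print(deterministic_select(["T3", "T1", "T2"], counts))  # T2
```

Unlike the opportunistic case, repeating an execution with the same queue contents and the same instruction counts reproduces the same selection, which is what makes the application state repeatable for debugging.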
For instance, it is known to use the number of deferred loads executed by the thread as the deterministic time. Even so, computing deferred loads requires the scheduler to be coded expressly for a specific hardware architecture, because different processor architectures provide different interfaces to the CPU performance counters needed to count deferred loads. Additionally, some processors do not permit user-space processes to access CPU performance counters. Overcoming this limitation requires the involvement of the operating system.