Parallel computing systems, such as a database system implemented in a parallel-processing environment, are typically used to speed up the processing of incoming requests by breaking the requests into multiple tasks and executing the tasks in parallel across multiple processing units. When such a system receives multiple requests, each of which requires work to be performed by multiple processing units, the system typically schedules the work across the processing units in an attempt to ensure that all of the work is completed.
The simplest way to schedule work in a parallel computing system is on a first-come, first-served basis. However, for many applications a more flexible scheduling policy is desirable. For example, in a relational database system that has a backlog of long-running, low-priority queries, it is typically desirable, upon receiving a short-running, high-priority query, to execute the high-priority query immediately rather than forcing it to wait behind the low-priority queries. Also, in a relational database system that has multiple users, it is often desirable to implement a scheduling policy that allows the system administrator to limit the resource consumption of any single user to prevent that user from monopolizing the system.
In general, the scheduling algorithms that are typically used to address these situations have two important characteristics:
1. They sort (that is, re-order) the work to be performed by the parallel processing units; and
2. They sort work based on both (a) global, static quantities (e.g., the priority of a work item or the user associated with a work item) and (b) local, dynamic quantities (e.g., the amount of CPU and disk I/O currently being consumed by a particular processing unit on behalf of a particular user).
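As an illustration of the second characteristic, the following minimal sketch orders one processing unit's queue using a key that mixes a global, static quantity (priority) with a local, dynamic one (a user's current CPU share on that unit). The names `WorkItem`, `sort_queue`, `cpu_share_by_user`, and `cpu_threshold`, the convention that a lower priority number is more urgent, and the rule that over-threshold users are demoted ahead of any priority comparison are all assumptions made for the example, not details taken from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkItem:
    name: str
    user: str
    priority: int  # global, static; lower number = more urgent (assumed convention)


def sort_queue(queue, cpu_share_by_user, cpu_threshold=0.5):
    """Return the queue re-ordered for ONE processing unit.

    cpu_share_by_user is a local, dynamic quantity: the fraction of this
    unit's CPU that each user is currently consuming (hypothetical input).
    """
    def key(indexed):
        position, item = indexed
        # Users over the threshold are treated as low priority on this unit.
        over = cpu_share_by_user.get(item.user, 0.0) > cpu_threshold
        # Ties fall back to static priority, then to arrival order.
        return (over, item.priority, position)

    return [item for _, item in sorted(enumerate(queue), key=key)]


queue = [WorkItem("W1", "U1", 1), WorkItem("W2", "U2", 1)]
on_x = sort_queue(queue, {"U1": 0.2, "U2": 0.8})  # U2 is over the threshold on X
on_y = sort_queue(queue, {"U1": 0.8, "U2": 0.2})  # U1 is over the threshold on Y
```

Because the CPU shares differ from unit to unit, the same two work items can end up in opposite orders on two units, which is why sorting on local quantities needs care when the work must run on both.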
Problems typically arise in parallel computing systems when a work item needs to execute on multiple processing units. Such a work item is typically referred to as “group work” because its execution requires participation by a group of processing units. In general, group work has the following characteristics:
1. It is processed by a group of processing units;
2. It typically must be processed by every processing unit in the group;
3. Once it begins executing on a particular processing unit, it consumes resources on that processing unit until it finishes; and
4. It typically cannot finish until it executes on every processing unit in the group.
The scheduling algorithms that are used in sorting group work across multiple processing units are prone to deadlock, usually for one of two reasons:
1. Processing units become available at different times. The following example illustrates: A low-priority group work item, L, arrives. It must run on processing units X and Y. X happens to be idle and consequently begins executing L immediately. Y is busy executing a previous work item and consequently queues L. Subsequently a high-priority group work item, H, arrives. It also must run on processing units X and Y. X is already running L and consequently queues H. Y is still running the previous work item and still has L queued. Since H is high-priority and L is low-priority, Y sorts H in front of L, and, once the previous work item completes, it begins running H. At this point X and Y are deadlocked: X is running L (which cannot finish until it runs on Y), and Y is running H (which cannot finish until it runs on X).
2. Sorting based on local quantities. The following example illustrates: A group work item, W1, associated with a user, U1, arrives. It must run on processing units X and Y. Both processing units are busy executing previous work items, and consequently both queue W1. Subsequently a group work item, W2, associated with a different user, U2, arrives. It also must run on processing units X and Y. The particular scheduling policy in effect on the system dictates that work items associated with users who are consuming more than a certain CPU percentage threshold should be treated as low priority. At the moment, on processing unit X, U1 is under this threshold and U2 is over the threshold. Consequently W2 is queued behind W1. However, on processing unit Y the situation is reversed: U1 is over the threshold and U2 is under the threshold. Consequently W2 is queued in front of W1. When the previous work items complete, X begins running W1, and Y begins running W2. At this point X and Y are deadlocked: X is running W1 (which cannot finish until it runs on Y), and Y is running W2 (which cannot finish until it runs on X).
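The first deadlock scenario can be replayed with a small simulation. This is a sketch under stated assumptions rather than a model of any real scheduler: `ProcessingUnit`, `is_deadlocked`, and `first_scenario` are hypothetical names, "P" stands for the previous work item running on Y, a lower priority number is taken to mean more urgent, and the deadlock check is only a sufficient condition (every unit is held by an item that is not yet running on its whole group).

```python
class ProcessingUnit:
    """One processing unit with a priority-sorted local queue (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.running = None  # item currently holding this unit
        self.queue = []      # (priority, arrival_seq, item); lowest tuple runs next

    def submit(self, item, priority, seq):
        if self.running is None:
            self.running = item  # an idle unit starts the item immediately
        else:
            self.queue.append((priority, seq, item))
            self.queue.sort()    # local re-ordering: priority first, then arrival

    def finish_current(self):
        self.running = None
        if self.queue:
            self.running = self.queue.pop(0)[2]


def is_deadlocked(units, groups):
    """True if every unit is held by an item that some group member is not running."""
    running = {u.name: u.running for u in units}

    def stuck(item):
        return item is not None and any(running[m] != item for m in groups[item])

    return all(stuck(u.running) for u in units)


def first_scenario():
    x, y = ProcessingUnit("X"), ProcessingUnit("Y")
    groups = {"P": ["Y"], "L": ["X", "Y"], "H": ["X", "Y"]}
    y.submit("P", priority=1, seq=0)         # Y is busy with a previous item
    for unit in (x, y):
        unit.submit("L", priority=2, seq=1)  # X runs L at once, Y queues it
    before = is_deadlocked([x, y], groups)
    for unit in (x, y):
        unit.submit("H", priority=1, seq=2)  # X queues H, Y sorts H ahead of L
    y.finish_current()                       # the previous item completes; Y starts H
    return x.running, y.running, before, is_deadlocked([x, y], groups)


x_runs, y_runs, deadlocked_before, deadlocked_after = first_scenario()
```

In this model X ends up holding L while Y holds H, and neither item can finish because each still needs the unit the other occupies. The system is not deadlocked while Y is still running the previous item, since that item needs only Y and can complete on its own.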