In typical database systems, users write, update and retrieve information by submitting commands or "queries" to a database application. To be correctly processed, the commands must comply with the database language that is supported by the database application. One popular database language is known as Structured Query Language (SQL).
Multi-processing systems are typically partitioned into nodes, where each node may contain multiple processors executing multiple concurrent processes. To fully utilize the computing power of a multi-processing system, a database application may divide a large processing task required by a query into smaller work granules which may then distributed to processes running on one or more processing nodes. Because the various work granules are being performed in parallel, the processing required by the query can be completed much faster than if the processing were performed on a single node by a single process. One mechanism for implementing parallel operations in a database management system is described in U.S. patent application Ser. No. 08/441,527 entitled "Method and Apparatus for Implementing Parallel Operations in a Database Management System" filed on May 15, 1995, by Gary Hallmark and Daniel Leary, incorporated herein by reference now abandoned and re-filed as Ser. No. 08/898,080.
Computer systems often contain resources that can only be accessed in a particular way by one process at a time. In database systems, such resources include sets of data, such as tables, rows, or objects, that can only be written by one process at a time.
Mechanisms have been developed which control access to resources. One such mechanism is referred to as a lock. A lock is a data structure that indicates that a particular process has been granted certain rights with respect to the resource. Before a process can perform an operation on a resource, the process is required to obtain a lock that grants to the process the right to perform the desired operation on the resource.
To perform a write operation on a set of data, a process typically must obtain an exclusive lock on the set of data. To obtain an exclusive lock, the process transmits a request for an exclusive lock to a lock manager associated with the set of data. A lock manager is a process that is responsible for granting, queuing, and keeping track of locks on one or more resources.
If the lock manager has already granted a lock on the resource to another process, or has already received a request for an exclusive lock from another process, then the lock manager process cannot grant the requested exclusive mode lock. Under such circumstances, a lock manager will typically place the lock request in a queue, and grant the request when the request is at the head of the queue and all outstanding locks on the resource have been released.
Unfortunately, the performance benefit gained by parallelizing a database operation is significantly diminished when the operation is divided into work granules that involve writing the same set of data. For example, consider the operation illustrated in FIG. 1, where a coordinator process 102 receives a query 104 that requires a block of data (DATA A-Z) to be generated and then inserted into a table 120 in a database 122.
The coordinator process 102 divides the query into work granules 106, 108 and 110, where each work granule calls for the generation and insertion of a subset of the set of data. In the illustrated example, work granules 106, 108 and 110 represent generate and insert operations for subsets A-G, H-Q and R-Z of data A-Z, respectively. The coordinator process 102 then distributes the work granules 106, 108 and 110 to processes 112, 114 and 116, respectively. Preferably, processes 112, 114 and 116 are executing on separate processors to allow maximum concurrency.
Because the work granules involve writing table 120, the processes must obtain an exclusive lock on table 120 before they can perform their assigned work granules. In the illustrated example, the exclusive lock for table 120 has been granted to process 116. Consequently, process 116 can proceed to perform work granule 110 by inserting subset of data R-Z into table 120. However, an exclusive lock can be held by only one process at a time. Therefore, processes 112 and 114 must wait for process 116 to release the exclusive lock for table 120.
When process 116 completes the performance of work granule 110, process 116 will release the exclusive lock for table 120. At this time, the exclusive lock may be granted to one of the other processes 112 and 114 waiting to write to table 120. The process that is not granted the exclusive lock must wait for the process that is granted the exclusive lock to complete its work granule and release the exclusive lock. When the last process has been granted the exclusive lock, the last process may finally perform the work granule assigned to the last process.
As is evident by the example described above, an attempt to parallelize an operation that involves writing a set of data in a database results in a serial operation, where each of the processes involved in the operation must simply await its turn to perform the portion of the operation that has been assigned to it. Such an attempt to parallelize a write operation may actually result in slower execution than if no attempt to parallelize is made, due to the increased overhead of task scheduling, inter-node communication, and lock management.