In typical database systems, users write, update and retrieve information by submitting commands to a database application. To be correctly processed, the commands must comply with the database language that is supported by the database application. One popular database language is known as Structured Query Language (SQL).
Multi-processing systems are typically partitioned into nodes, where each node may contain multiple processors executing multiple concurrent processes. To fully utilize the computing power of a multi-processing system, a database application may divide a large processing task required by a query into smaller work granules which may then be distributed to processes running on one or more processing nodes. Because the various work granules are being performed in parallel, the processing required by the query can be completed much faster than if the processing were performed on a single node by a single process. One mechanism for implementing parallel operations in a database management system is described in U.S. patent application No. 08/441,527 entitled "Method and Apparatus for Implementing Parallel Operations in a Database Management System" filed on May 15, 1995, by Gary Hallmark and Daniel Leary, incorporated herein by reference.
Unfortunately, the performance benefit gained by parallelizing a database operation is significantly diminished when the operation is divided into work granules that involve writing to the same data container. For example, consider the operation illustrated in FIG. 1, where a coordinator process 102 receives a query 104 that requires a block of data (DATA A-Z) to be generated and then inserted into a table 120 in a database 122.
The coordinator process 102 divides the query into work granules 106, 108 and 110, where each work granule calls for the generation and insertion of a subset of the set of data. In the illustrated example, work granules 106, 108 and 110 represent generate and insert operations for subsets A-G, H-Q and R-Z of data A-Z, respectively. The coordinator process 102 then distributes the work granules 106, 108 and 110 to processes 112, 114 and 116, respectively. Preferably, processes 112, 114 and 116 are executing on separate processors to allow maximum concurrency.
In the system illustrated in FIG. 1, access to the data container into which data is to be inserted (table 120) is governed by a lock. Because the work granules involve writing data into table 120, the exclusive lock on table 120 is obtained before they perform their assigned work granules. In the illustrated example, the exclusive lock for table 120 has been granted to process 116. Consequently, process 116 can proceed to perform work granule 110 by inserting subset of data R-Z into table 120. However, an exclusive lock can be held by only one process at a time. Therefore, processes 112 and 114 must wait for process 116 to release the exclusive lock for table 120. When process 116 completes the performance of work granule 110, process 116 will release the exclusive lock for table 120. At this time, the exclusive lock may be granted to one of the other processes 112 and 114 waiting to write to table 120.
Even when access to the data container into which data is to be inserted is not governed by a single lock, parallel insertion operations may be inefficient. For example, even when insertion into a table does not require an exclusive lock on the entire table, each process assigned to insert data must search for empty space within the table into which it may insert its data. As a process searches for empty space within a table, it is competing with all of the other processes that are searching for empty space within the same table. The more processes searching for insertion space within the same table at the same time, the greater the overhead involved in each of the parallel insertion operations.
The speed at which data can be inserted into an existing data container is also reduced by the overhead associated with generating redo and undo log information for the insertions. For example, an undo log record and a redo log record are typically generated for every row that is updated. The redo log record indicates how to reapply the change in case of a failure. The undo log record indicates how to remove the change. While this log information is used to return the database to a consistent state after a failure, generation and storage of the log information slows down any insertion process.