The present invention relates to database systems, and more specifically to operations performed on partitions.
When a database system has very large tables consisting of potentially millions of rows, for example, it is desirable to divide the tables into subtables (xe2x80x9cpartitionsxe2x80x9d) of a more manageable size. Creating partitions in a table can have positive effects on table maintenance and query processing.
One method for specifying the partitions of a table is by xe2x80x9crangexe2x80x9d partitioning. When using range partitioning, a range of column values (partitioning key values) are associated with each partition and determine which record belongs to which partition.
FIG. 1 illustrates an exemplary table 100 with corresponding partitions 110, 120 and 130. A partitioning key 102 is used to determine which record belongs in which partition. In the particular example shown in FIG. 1, the partitioning key 102 is a date field (HIRE DATE) and the rows in partitions 110, 120 and 130 of table 100 are grouped together by date.
FIG. 2 illustrates an exemplary data dictionary 210 on a disk 200. Metadata (data about the data) is contained in a data dictionary 210. The data dictionary 210 typically has many tables in it. For example, the data dictionary 210 keeps a table 212 that stores data about each of the columns of table 100, a table 214 that stores data about each of the indexes for table 100 and a table 216 that stores data about each of the partitions 110, 120 and 130 of table 100. The table 216 that stores data about the partitions, for example, contains three records, one record for each partition 110, 120 and 130. The record for a given partition contains data such as the beginning and ending boundaries of the partition and a pointer to the storage device on which the partition is located.
Each attribute associated with an object in a database system may be classified as either a logical attribute or a physical attribute. Logical attributes are those attributes that are visible to an application programmer or end user. For example, the number of columns in a table is a logical attribute of the table. If a logical attribute of an object is changed, then an application program that uses the object may have to be modified.
Physical attributes are those attributes that are not visible to an application programmer or end user. For example, the storage location of the data for a table is a physical attribute. A change to a physical attribute of an object is typically transparent to applications and end users. Consequently, an application program that uses an object typically does not have to be modified in response to changes in the physical attributes of the object.
The partitions for a particular table in a database system have the same logical attributes, such as the same column definitions, but may have quite different physical attributes. For example, the partitions of a table may reside on different physical devices in a network (i.e., the different partitions may reside on physically separate disk drives in the database system).
A partition maintenance operation is an operation that affects the definition of a partitioned table and/or one or more partitions of the partitioned table. For example, a partition maintenance operation may be used to add a new partition to an existing table or to move a partition to a different storage device. Data definition language statements such as ADD PARTITION, SPLIT PARTITION, MOVE PARTITION, DROP PARTITION and MERGE PARTITION are examples of partition maintenance operations. Partition maintenance operations should only affect the physical attributes of a table, and not impact the overall logical appearance of a table.
Locking mechanisms are employed in a database system to manage concurrent access to the data and the metadata in the database system. One type of lock is a data lock. A data lock is a lock acquired on a body of data. A second type of lock is a data dictionary lock. A data dictionary lock is a lock acquired on a body of metadata in a data dictionary.
There are also different modes of locks. For instance, a shared lock is acquired on an object by a user or process before the user or process reads from the object. A shared lock may be acquired on an object by multiple users or processes. An exclusive lock, however, is acquired by a single user or process on an object and is acquired before the user or process writes to the object. (Of course, an exclusive lock held by a user or process on an object will also allow reads from the object by the user or process holding the exclusive lock.)
During a partition maintenance operation, users may have to be blocked from trying to access or modify the data of the affected table 100 and from trying to access or modify the metadata associated with the affected table 100 in the data dictionary 210. Access may have to be blocked because accessing or modifying the data in the affected table 100 or the metadata associated with the affected table 100 in the data dictionary 210 could result in errors for other user processes, corrupted data in the table 100 or corrupted metadata in the data dictionary 210 (depending on the particular partition maintenance operation).
One possible solution to this problem is to exclusively lock both the metadata associated with the affected table 100 in the data dictionary 210 and the affected table 100 containing the partitions affected by the partition maintenance operation for the entire duration of the partition maintenance operation. FIG. 3 depicts a series of steps associated with single-phase locking for partition maintenance operations.
For example, assume a partition maintenance operation MOVE PARTITION 110 from disk 200 to a disk 250 is initiated using the single-phase locking method depicted in FIG. 3.
First, an exclusive data dictionary lock is acquired on the metadata associated with the affected table 100 in the data dictionary 210 in step 305. In step 310, an exclusive data lock is acquired on the affected table 100. Next, in step 315, the partition maintenance operation is performed. After the partition maintenance operation is performed in step 315, the exclusive data lock on the affected table 100 is released in step 320. Finally, in step 325, the exclusive data dictionary lock on the metadata associated with the affected table 100 in the data dictionary 210 is released.
While the partition maintenance operation MOVE PARTITION 110 is being performed, assume a second partition maintenance operation MOVE PARTITION 120 from disk 200 to a disk 260 is initiated. Using the single-phase locking method described above, all the metadata associated with the affected table 100 in the data dictionary 210 and all the data in the affected table 100 are exclusively locked for the duration of the first partition maintenance operation. Thus, the second partition maintenance operation will be blocked from accessing any of the metadata associated with the affected table 100 in the data dictionary 210 and any of the data in the affected table 100 until after the first operation releases its exclusive locks on the resources, even though concurrent execution of the second operation will not cause any ill effects on the metadata associated with the affected table 100 in the data dictionary 210 or the affected table 100.
In essence, the single-phase locking method is a conservative measure that prevents potential errors by disabling concurrent access to the data in the affected table 100 and the metadata associated with the affected table 100 in the data dictionary 210 and, consequently, to all the partitions (110, 120 and 130) of the affected table 100 during a partition maintenance operation.
A problem with the single-phase method is that it may be inefficient, from a concurrency point of view, to exclusively lock all of the data in the affected table 100 and all of the metadata associated with the affected table 100 in the data dictionary 210 for an entire partition maintenance operation. For example, some of the activity performed during the partition maintenance operation may be of a particular type such that allowing concurrent access to other partitions of the affected table 100 would not result in errors or corrupted data or metadata. Some of these xe2x80x9csafexe2x80x9d activities could be performed while the partition maintenance operation is executing. Accordingly, the use of exclusive locks (data or data dictionary) at a table level during the entire time a partition maintenance operation is running would cause other processes or users to wait for the locked resources, even during the portions of the partition maintenance operation which xe2x80x9csafelyxe2x80x9d allow concurrent access. Thus, there is a need for a more efficient method for locking resources when performing partition maintenance operations.
A method and apparatus for multi-phase locking of a partitioned body of data when performing partition maintenance operations is provided.
According to the method, a first phase comprises the steps of acquiring a first lock on at least a body of metadata associated with all the partitions referenced in the partition maintenance operation, reading the body of metadata and generating a list of partitions affected by a partition maintenance operation. A second phase comprises the steps of acquiring a data lock on the affected partitions in a table, releasing the first lock on the body of metadata and changing a physical attribute of the data in the affected partitions in the table. A third phase comprises the steps of acquiring a second lock on at least a body of metadata associated with all the partitions affected by the partition maintenance operation, updating the metadata in the data dictionary, releasing the data lock and releasing the second lock on the body of metadata.
In one embodiment, a cost based analysis is performed to determine whether a single-phase locking method or a multi-phase locking method should be performed.
As a result of performing multi-phased locking for partition maintenance operations, the scope of the data locked will be reduced for the duration of the partition maintenance operation. In particular, other operations can read, write and perform partition maintenance operations on partitions not affected by other concurrently running partition maintenance operations since the present invention allows for data locks to be more selectively placed upon the affected partitions. In addition, the present invention allows for the metadata associated with the affected partitions in the data dictionary to be locked in exclusive mode for a shorter length of time than in a single-phase locking method. The net result is a more efficient resource locking method that allows more partition maintenance operations to be concurrently scheduled than does a single-phase locking method, and also it allows additional read or write operations on other partitions not affected by a concurrently executing partition maintenance operation.