When maintaining a database management system (“DBMS”), especially a DBMS adapted to manage very large database objects (e.g. tables, indexes, etc.), it is useful to partition the database object into smaller, more manageable pieces, known as “partitions”. A database object can be partitioned into either “horizontal” or “vertical” partitions.
Vertical partitioning encompasses splitting up the columns of the data stored in the database object into groups, and storing each group in a different partition. Horizontal partitioning encompasses splitting up the rows of the data stored in the database object into groups and storing each group in a different partition, according to a partition function. For example, a database object may be horizontally partitioned by range of dates, which splits the data up by the month it was entered into the database object. Alternatively, the database object may be partitioned according to a hashing function, which attempts to distribute the rows of data equally across a specified number of partitions. Many different partitioning functions are possible, depending on the particular design choices made by the database object or DBMS developer. Among the benefits provided by horizontal partitioning is improved query performance through the ability to do partition pruning and partition-wise joins, thereby reducing the number of partitions that a query must process in order to generate a query result set.
The DBMS also provides several partition maintenance operations, to allow, for example, partitions to be: 1) added or deleted, 2) split into multiple partitions, or 3) merged together into a single partition. The mechanics of each of these partition maintenance operations is dependent upon the partitioning method used in partitioning the database object. For example, a split operation would perform one set of actions on a partition created on a date-range, and a different set of actions on a partition created using a hashing function.
Indexes are another useful tool in designing large-scale databases. An index contains a subset of the data stored in a database object, used to more rapidly access and manipulate the data stored in the database object. Indexes are typically themselves stored in database objects, and are associated with the data they index. For example, if a table contains employee data, it may have the following five columns: Employee ID, Employee Name, Employee Address, Supervisor, Telephone Number. An index created on the employee table, indexing the employees by their Supervisor, would contain only two columns: Supervisor and Employee ID. The index could optionally be sorted by Supervisor. Assume a query or a user wanted to, for example, access the employee table and retrieve all employees having “Mary Smith” for a Supervisor.
Since the index contains only a subset of the employee data, it is faster to read the index from long-term storage than it is to read the underlying employee table. Also, since the index is already sorted by Supervisor, it is faster to retrieve the records having “Mary Smith” as Supervisor. Once the appropriate records have been identified by searching the index, then only those records which have “Mary Smith” as Supervisor are retrieved from the employee table, via the Employee ID field stored in the index.
Among the different types of indexes are “native indexes” and “domain indexes”. Native indexes are indexes created using indexing schemes built into the DBMS as provided by the DBMS manufacturer. Domain indexes are indexes created using user-defined indexing schemes, provided by users, indexing scheme developers or others outside of the DBMS manufacturer. In domain indexes, the code for index definition, index maintenance, and index scan is provided externally, by an indexing scheme developer or other user of the DBMS. In native indexes, the code for index definition, index maintenance, and index scan is provided internally, by a developer or manufacturer of the DBMS itself.
While partitioned database objects and domain indexes are both useful tools in database design, difficulties arise when a DBMS tries to support domain indexes on partitioned database objects. Database systems typically support multiple partitioning methods, as discussed above. For example, a DBMS may support partitioning a database object such as a table by either “value range”, “hash”, or “list” schemes, as well as composite partitioning methods such as “range-hash”. Each of these partitioning methods typically requires partition maintenance operations that are specific to the partitioning method used.
Conventional DBMS that support domain indexes on partitioned tables typically require the domain index developer to create a separate set of partition maintenance operations for each supported partitioning method. This approach requires the domain index developer to understand all of the details of the various partitioning methods and how they impact the design of partition maintenance operations for the particular partitioning method. Since the partition maintenance operations also have to be incorporated into each domain indexing scheme, this approach also causes wasteful duplication of effort across the different indexing schemes.
Additionally, the conventional approach creates a dependency between the DBMS manufacturer's decision to incorporate additional partitioning methods into the DBMS and the user-defined domain indexing code. Whenever the DBMS manufacturer introduces a new partitioning method, all of the DBMS users have to write new code to implement any desired partition maintenance operations within their domain indexing schemes, for the new partitioning method.
Furthermore, the domain index data itself is typically stored in database objects. A typical implementation has index data for each partition of the underlying database object stored in a separate database object. This approach leads to an explosion of database objects being created to contain the index data, as well as potential creation of a inordinately large number of cursors to process operations on the domain index that span multiple partitions. These situations can cause a significant impact on overall DBMS performance.
Therefore, there is a need for a system of managing domain indexes that allows for domain indexes on partitioned tables without causing a dependency between the introduction of additional partitioning methods and the user-defined indexing code, and that minimizes the amount of code that the user needs to create.