Database systems allow for the creation of indexes. Indexes assist the database system with quickly satisfying search requests on indexed tables. An index contains key values derived from data stored in attributes of a table, referred to as base table. Typically, the index is organized (e.g. ordered) by the key values to provide efficient access to data in the base table. For example, a table containing 10 columns may have a corresponding index where one of the columns is a key column and the entries of an index are ordered by the key values in the key column.
Many types of data are indexed, and may use a native index that is built into the database server or any other kind of index. Indexes may be created and maintained for a base table by submitting Data Definition Language (DDL) statements to database system. Once defined in this way, when changes are made to a base table, corresponding changes are automatically made to indexes by the database system.
Storage and Indexing of XML Data
An object-relational database system may be configured to store XML data and to allow access using XML and/or objectional relational constructs. Such database systems also create and maintain forms of indexes that facilitate access to XML data. A database system that has been configured in this way may be referred to as an embodiment of an XML DB. Other XML DB embodiments provide alternate means of storing XML data.
To store an XML document, the document may be stored as a large string of characters in a large object, or the XML document may be decomposed into elements of fragments that are stored as objects in a database, each object being an object of an object class storing portions of XML documents.
To provide efficient access to a collection of XML documents, a “logical index”, referred to herein as a XML index, may be used to index a collection of XML documents. A XML index contains multiple structures that are cooperatively used to access a collection XML documents. According to an embodiment of the present invention, a logical index includes a path table, which contains information about the hierarchies of nodes in a collection XML documents and may contain the value of the nodes. Among the columns or attributes of the path table that serve as keys is a column that stores path representations of the path of the nodes. For a particular XML document, the path table may contain multiple rows, each row holding a particular node value (e.g. element) of the XML document, and a path representation of the node's path within the XML document, among other information.
Partitioning Tables and Indexes
Table partitioning allows a table to be divided in subtables referred to as partitions. Tables are partitioned according to partition criteria. For example, data for sales transactions represented by rows in a table may be partitioned by a column representing the month the sales, or by the region in which in which the sale was made. A column upon which partitioning is based is referred to as a partition key. Because partitions are based on a key that has logical significance, breaking up large tables into smaller sub-tables makes the data set more manageable.
Like an index's respective base table, an index may be partitioned. A partitioned index may be a global partitioned index or a local partitioned index. A global partitioned index is an index which is not partitioned based on the same key as the table indexed by the index. A local partitioned index is an index which is partitioned on the same key on which the table is partitioned.
When an XML index is locally partitioned, all rows in the path table of the XML index corresponding to a row in the base table are in the same partition. Further, if two rows in the base table are in different partitions, then their corresponding rows in the path table are in different partitions.
Bulk Changes to Partitions
In a database system, the problem of loading large amounts of data into a table that is being used by running applications can be facilitated using techniques that involve table partitioning. For example, when a large number of documents are to be loaded into a table, a new partition may be created, and the new data may be inserted into this partition. Either single row insertions or bulk load using a utility can be used such as SQLLOADER from Oracle Corporation.
When indexes are present, the indexes need to be updated after inserting data into the table that is indexed. The updates may be performed synchronously. However, this approach prolongs the amount of time to update the partition, particularly for complex indexes such as the XML index. An approach that addresses this issue involves disabling the index during bulk load and asynchronously updating the index after the upload.
Asynchronously Maintained Index
In contrast to a synchronously maintained index, which by definition results in the index update and the indexed table update occurring as an atomic unit, with an asynchronously maintained index, the insertions into the indexed table are not immediately propagated to the index. Instead, an explicit synchronization (“sync”) operation must be performed for the index to populate table entries corresponding to the new rows in the indexed table. A sync operation results in updating the index entries to accurately reflect the data in the rows of the indexed table.
Asynchronously updating an index for a bulk load has the advantage of higher throughput. However, the asynchronously updated index may be unusable until it is synchronized. As a result, a query optimizer may not use the index in execution plans it generates, leading to use of less efficient execution plans.
Rather than disable an index, another approach, the flashback rewrite approach, rewrites queries that access the base table as flashback queries. A flashback query is a query that returns data that is consistent with a past state of a database. A flashback query is associated with a flashback time. The data returned is consistent with a database state associated with the flashback time. An example of a flashback time is system change number (SCN).
Under the flashback rewrite approach, the logical time an index is disabled is tracked. Queries are rewritten as flashback queries, where the flashback time is set to a time at or earlier than the time tracked for disabling the index.
An advantage to the flashback rewrite approach is that the index is not disabled and can be used to execute queries. A disadvantage is that existing database applications see stale results. Another disadvantage is that flashback queries incur more overhead to execute. Since more undo log records need to be consulted, flashback queries are slower than non-flashback queries.