Many database systems now support storage and querying of eXtensible Markup Language data (“XML data”). For example, a collection of XML documents can be stored in shredded form in a database system. In this form, base structures in the database system are defined so as to capture the hierarchical relationships among nodes in an XML document. Under this approach, when an XML document is submitted to the database system for storage, it is shredded into node values, and the shredded node values are then stored in their respective columns in the base structures.
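By way of illustration only, shredded storage can be sketched as follows. The sketch assumes a hypothetical purchase-order document layout (`order`, `customer`, `items/item`, `sku`, `qty`) and uses SQLite tables as stand-ins for the base structures; none of these names come from any particular database system.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_db():
    """Create base structures for a hypothetical <order> document layout."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute(
        "CREATE TABLE items ("
        "order_id INTEGER REFERENCES orders(id), sku TEXT, qty INTEGER)"
    )
    return conn


def shred_order(conn, xml_text):
    """Shred one <order> document into node values stored in table columns."""
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO orders (customer) VALUES (?)",
        (root.findtext("customer"),),
    )
    order_id = cur.lastrowid
    # The parent/child relationship between an order and its items is
    # captured by the order_id foreign key rather than by nesting.
    for item in root.findall("items/item"):
        conn.execute(
            "INSERT INTO items (order_id, sku, qty) VALUES (?, ?, ?)",
            (order_id, item.findtext("sku"), int(item.findtext("qty"))),
        )
    return order_id
```

Note that the table definitions in `make_db` must be written before any document arrives, which is only possible when the document layout is known in advance.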
A disadvantage of this approach is that it requires a priori knowledge (e.g., prior to compilation of a query) of an XML schema that describes the collection of XML documents. If the XML schema does not exist or is unknown, the database system cannot properly define a set of base structures that reflects all the data types and structural relationships the collection of XML documents may embody.
Even if the schema exists and is known, not all of the data types or structural relationships in the collection of XML documents are useful for many queries. For example, where a user is interested in only a limited number of nodes in the collection of XML documents, the corresponding XML schema likely defines many extraneous data types and complex structural relationships that very few queries use. In that case, shredding the collection of XML documents and storing the resultant node values expends effort with little advantage in return.
Alternatively, the collection of XML documents may be stored in an aggregate form in a database system. In the aggregate form, each XML document is stored as a CLOB (Character Large Object) or a BLOB (Binary Large Object). This way, when storing XML documents, the database system does not have to shred them into node values. Also, under this aggregate approach, no prior knowledge of an XML schema is required.
As a further alternative, the collection of XML documents can be stored in tree form in a database system.
However, a disadvantage of storing XML documents in the aggregate or tree form is that ad-hoc mechanisms may have to be used to satisfy XPath-based queries. Without a suitable indexing mechanism on the collection of XML documents, a database system has to perform a full scan of all XML documents in order to satisfy an XPath-based query. Although a full scan can satisfy any XPath query, its processing time grows with the size of the collection, so such queries would be quite slow.
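The cost of a full scan over aggregate storage can be seen in a minimal sketch. It assumes a hypothetical `xml_docs` table whose `body` column stands in for a CLOB, and it uses the limited XPath subset supported by Python's `xml.etree.ElementTree` rather than a full XPath engine.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_clob_store(docs):
    """Store each XML document whole, in a TEXT column standing in for a CLOB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xml_docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO xml_docs (body) VALUES (?)", [(d,) for d in docs]
    )
    return conn


def full_scan(conn, path):
    """Evaluate a path expression by reading and parsing every stored document."""
    hits = []
    for doc_id, body in conn.execute("SELECT id, body FROM xml_docs ORDER BY id"):
        root = ET.fromstring(body)  # parsing cost is paid for every document
        for node in root.findall(path):
            hits.append((doc_id, node.text))
    return hits
```

Every call to `full_scan` reads and parses the entire collection, even when only a few documents contain a matching node.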
XML Table Indexes can speed up XPath-based queries. Under this approach, a query accesses an XML Table Index associated with a collection of XML documents, rather than the collection directly. Notably, an XML Table Index is logically a table that is separate from the base structures storing the collection of XML documents and that indexes the collection. The table of the XML Table Index includes a plurality of columns that correspond to a plurality of nodes in the XML documents. Node values associated with the plurality of nodes are stored in the plurality of columns. Preferably, the columns in the table of the XML Table Index are ones commonly referenced by XPath-based queries. That way, many if not all XPath-based queries can be answered by looking at values contained in the plurality of columns in the table of the XML Table Index, instead of directly accessing the base structures that store the XML documents.
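The general idea can be sketched as a side table populated when documents are stored. This is an illustrative sketch only, not the mechanism of any particular product: the table names, the `INDEXED_PATHS` mapping, and the `customer` and `total` nodes are all hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical mapping from index column name to the node path it captures.
INDEXED_PATHS = {"customer": "customer", "total": "total"}


def build_store(docs):
    """Store documents whole and populate a separate index table alongside."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xml_docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "CREATE TABLE xml_table_index ("
        "doc_id INTEGER REFERENCES xml_docs(id), customer TEXT, total REAL)"
    )
    for body in docs:
        doc_id = conn.execute(
            "INSERT INTO xml_docs (body) VALUES (?)", (body,)
        ).lastrowid
        root = ET.fromstring(body)
        # Only the commonly queried node values are extracted into columns.
        conn.execute(
            "INSERT INTO xml_table_index (doc_id, customer, total) VALUES (?, ?, ?)",
            (
                doc_id,
                root.findtext(INDEXED_PATHS["customer"]),
                float(root.findtext(INDEXED_PATHS["total"])),
            ),
        )
    conn.execute("CREATE INDEX idx_customer ON xml_table_index (customer)")
    return conn


def docs_for_customer(conn, name):
    """Answer a query from the index table alone, without parsing any XML."""
    rows = conn.execute(
        "SELECT doc_id, total FROM xml_table_index "
        "WHERE customer = ? ORDER BY doc_id",
        (name,),
    )
    return rows.fetchall()
```

In this sketch the XML is parsed once at storage time, and `docs_for_customer` never touches the `xml_docs` table; a query on a node outside `INDEXED_PATHS` would still have to fall back to the stored documents.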
A disadvantage of XML Table Indexes is that columns of complex data types are not handled in a useful way, so queries involving relatively complex data types cannot be processed efficiently. For example, a collection of XML documents may include nodes that store XML embedded with domain-specific data, such as text data, image data, audio data and other opaque data; merely storing the raw values of those nodes is not useful, because few queries can be formulated in terms of raw values of complex data types.
Because of these limitations, existing techniques are not as efficient at accessing XML documents in a database system as would be desired. As a result, a better mechanism for accessing XML documents in a database system is needed.