In recent years, database systems that allow storage and querying of eXtensible Markup Language data (“XML data”) have been developed. Though there are many evolving standards for querying XML, all of them include some variation of XPath. XPath is a language that describes a way to locate and process items in XML documents by using an addressing syntax based on a path through the document's logical structure or hierarchy. The portion of an XML document identified by an XPath “path expression” is the portion that resides, within the structure of the XML document, at the end of any path that matches the path expression.
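As a minimal sketch of how a path expression identifies a portion of a document, the following Python example uses the standard library's `xml.etree.ElementTree` module, which supports only a limited subset of XPath. The purchase-order document and its element names are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order document, used only for illustration.
DOC = """
<PurchaseOrder>
  <Reference>ABC-123</Reference>
  <LineItems>
    <LineItem><Part>widget</Part></LineItem>
    <LineItem><Part>gadget</Part></LineItem>
  </LineItems>
</PurchaseOrder>
"""

root = ET.fromstring(DOC)

# ElementTree paths are evaluated relative to the root element, so this
# corresponds to the XPath path expression
# /PurchaseOrder/LineItems/LineItem/Part.
parts = root.findall("./LineItems/LineItem/Part")

# Each match sits at the end of a path through the document hierarchy
# that matches the path expression.
part_names = [p.text for p in parts]
print(part_names)
```

Only the `Part` elements reached through `LineItems` and `LineItem` are selected; the `Reference` element lies on a different path and is not matched.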
XML documents that are managed by a relational database server are typically stored as unstructured serialized data in some form of a LOB (Large Object) datatype. For example, an XML document may be stored in unstructured storage, such as a CLOB (Character LOB) or a BLOB (Binary LOB), or the document may be stored as an O-R (object-relational) structure that uses an XML schema.
No matter how the XML document is stored, in order to fulfill many XPath queries, a method of identifying and extracting a fragment of a stored XML document matching an XPath path expression is needed.
Unfortunately, even database systems that have built-in support for storing XML data are usually not optimized to handle path-based queries, and the query performance of these database systems leaves much to be desired. In specific cases where an XML schema definition is available, the structure and data types used in XML instance documents may be used to optimize XPath queries. However, in cases where an XML schema definition is not available, and the documents to be searched do not conform to any schema, there are no efficient techniques for path-based querying.
Ad hoc mechanisms, such as a full scan of all documents or text keyword-based indexes, may be used to query documents when no XML schema definition is available. However, these mechanisms do not fulfill the need for an efficient method of quickly identifying and extracting a fragment of a stored XML document that matches an XPath path expression.
Even if a method of quickly identifying a location for a fragment of stored XML data were available, a method of efficiently extracting the fragment from the identified location is still needed. The fragment, as it exists at the identified location, may not be a valid, self-contained XML document. For example, namespace prefixes used within a fragment may be declared outside of that fragment, and therefore the fragment retrieved from the identified location will not have all the needed declarations.
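The namespace problem can be illustrated with a short Python sketch. The document, the namespace URI, and the helper `is_well_formed` are hypothetical, and the string-based repair at the end is only a naive illustration of copying an in-scope declaration onto the fragment, not a description of any particular extraction method.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the "po" prefix is declared on the root element only.
DOC = (
    '<po:PurchaseOrder xmlns:po="http://example.com/po">'
    '<po:LineItem>widget</po:LineItem>'
    '</po:PurchaseOrder>'
)

# Assume some lookup has already identified where the fragment is stored;
# here the location is simply found by string search.
start = DOC.index("<po:LineItem>")
end = DOC.index("</po:LineItem>") + len("</po:LineItem>")
raw_fragment = DOC[start:end]


def is_well_formed(text):
    """Return True if text parses as a standalone XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The raw fragment uses the "po" prefix but does not declare it,
# so it cannot be parsed on its own.
raw_ok = is_well_formed(raw_fragment)

# Naive repair: copy the in-scope declaration onto the fragment's root.
patched_fragment = raw_fragment.replace(
    "<po:LineItem>", '<po:LineItem xmlns:po="http://example.com/po">', 1
)
patched_ok = is_well_formed(patched_fragment)

print(raw_ok, patched_ok)
```

The raw fragment is rejected by the parser because its prefix is unbound, while the patched fragment carries its own declaration and parses as a self-contained document.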
Based on the foregoing, there is a clear need for a system and method for identifying and extracting valid, self-contained XML fragments that match an XPath path expression.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.