The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
The Extensible Markup Language (XML) is a widely accepted standard for data and documents in the computer industry. XML describes and provides structure to a body of data, such as a file or data packet. The XML standard provides for tags that delimit sections of an XML entity, referred to as XML elements. The following XML document A illustrates the components of an XML document.
XML document A
<a c="foo">
 <b>3</b>
 <d>10</d>
</a>
XML elements are delimited by a start tag and a corresponding end tag. For example, XML document A contains the start tag <b> and the end tag </b> to delimit an element. The data between the start tag and the end tag is referred to as the element's content. The name of the element delimited by the start tag <b> and the end tag </b> is b, and the element is thus referred to herein as element b or just b.
An element's content may include the element's value, one or more attributes, and one or more elements. Element a contains two elements, b and d. An element that is contained by another element is referred to as a descendant of that element. Thus, elements b and d are descendants of element a. An element's attributes are also referred to as being contained by the element.
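As an illustration only, the containment relationships described above can be inspected with a short Python sketch that uses the standard library module xml.etree.ElementTree. The helper name describe is hypothetical and is introduced here solely for explanation.

```python
import xml.etree.ElementTree as ET

# XML document A from the text above, written with straight quotes.
DOC_A = '<a c="foo"><b>3</b><d>10</d></a>'


def describe(xml_text):
    """Return the root element's name, its attributes, and its child elements.

    Hypothetical helper used only to illustrate the terminology in the text.
    """
    root = ET.fromstring(xml_text)
    # Each child is reported as (element name, element value).
    children = [(child.tag, child.text) for child in root]
    return root.tag, dict(root.attrib), children


name, attributes, children = describe(DOC_A)
print(name)        # a
print(attributes)  # {'c': 'foo'}
print(children)    # [('b', '3'), ('d', '10')]
```

Here element a contains attribute c and the descendant elements b and d, whose values are 3 and 10, respectively.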
Database servers that store XML documents perform various XML-related operations on the XML documents using XML query languages, such as XQuery/XPath. XML Query Language ("XQuery") and XML Path Language ("XPath") are important query language standards that can be used in conjunction with SQL to express a large variety of useful queries. XPath is described in XML Path Language (XPath), version 1.0 (W3C Recommendation 16 Nov. 1999), which is incorporated herein by reference.
One benefit of storing XML documents in a database system is that XML allows multiple applications to perform operations using the same XML documents. This requires that the XML data be general enough to be understood by all applications that share the XML documents. However, in many cases, there is a need to include application-specific information in an XML document. Application-specific information is data contained within the XML document that is used, needed, and/or recognized by fewer than all of the applications for which the XML document is being maintained or made accessible.
Including application-specific information in a shared XML document poses a significant problem because multiple applications use the same XML document and not all of the applications can identify, handle, and recognize the application-specific information. For purposes of explanation, consider the following XML document:
Article1.xml
<Article xmlns="http://www.mycompany.com"
         xmlns:fmt="http://www.mycompany.com/format">
 <fmt:justified>
  <Date> January 01, 2001 </Date>
  <Title> My title </Title>
  <Author> John <fmt:italic> Jonathan </fmt:italic> Doe </Author>
  <Text>
   ............... <fmt:bold> This is Important </fmt:bold> ......
  </Text>
 </fmt:justified>
</Article>
Consider two applications that share the XML document Article1.xml. Application 1 is responsible for displaying the article content, and Application 2 is a tool for searching the article content. Application 1 inserted the following formatting information into Article1.xml: <fmt:justified>, <fmt:italic>, and <fmt:bold>. The formatting information is useless to Application 2, which performs search queries on the article content. To return the titles of all articles written by Jonathan, Application 2 may use, in an XPath query, path expressions such as '/Articles/Title' and '/Articles/Author'. The formatting information inserted by Application 1 poses two significant problems for path expression evaluation.
The first problem is that the path leading to the Title and Author elements changes whenever formatting information is added or deleted. The second problem occurs when formatting information is introduced into the leaf nodes of the XML document. If formatting information is added to a leaf node, the node is no longer a leaf node, and hence the result of a text search on the value of that node changes. For example, in the above document, a search for articles where the author name matches "John Jonathan Doe" will not return the above document because of the formatting information contained in the Author node.
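Both problems can be reproduced with a minimal Python sketch based on the standard library module xml.etree.ElementTree. The sketch makes several assumptions that are not part of the text above: the documents are trimmed versions of Article1.xml with surrounding whitespace removed, the path is written against the root element name Article that appears in Article1.xml, and the names NS, PLAIN, FORMATTED, title_by_path, and author_value are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes for the two namespaces declared in Article1.xml.
# The prefix "a" for the default namespace is an arbitrary choice.
NS = {
    "a": "http://www.mycompany.com",
    "fmt": "http://www.mycompany.com/format",
}

# Simplified article without any formatting information.
PLAIN = """<Article xmlns="http://www.mycompany.com">
  <Title>My title</Title>
  <Author>John Jonathan Doe</Author>
</Article>"""

# The same article after Application 1 has inserted formatting elements.
FORMATTED = """<Article xmlns="http://www.mycompany.com"
         xmlns:fmt="http://www.mycompany.com/format">
  <fmt:justified>
    <Title>My title</Title>
    <Author>John <fmt:italic>Jonathan</fmt:italic> Doe</Author>
  </fmt:justified>
</Article>"""


def title_by_path(xml_text):
    """Look up Title as a direct child of the root, as in /Article/Title."""
    root = ET.fromstring(xml_text)
    node = root.find("a:Title", NS)
    return None if node is None else node.text


def author_value(xml_text):
    """Return the text held directly by Author, before any child element."""
    root = ET.fromstring(xml_text)
    return root.find(".//a:Author", NS).text


# Problem 1: the <fmt:justified> wrapper changes the path to Title.
print(title_by_path(PLAIN))      # My title
print(title_by_path(FORMATTED))  # None

# Problem 2: Author is no longer a leaf node, so its direct value changes.
print(author_value(PLAIN) == "John Jonathan Doe")      # True
print(author_value(FORMATTED) == "John Jonathan Doe")  # False
```

In the formatted document, the value read directly from the Author element is only the text that precedes <fmt:italic>, so an exact match against "John Jonathan Doe" fails even though the article content is unchanged.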
In addition, the problems described above involving path expressions also hinder the ability to create, and the benefit of creating, an index on XML documents that contain application-specific information.
Based on the foregoing, there is a clear need to develop approaches for isolating nodes within a shared collection of XML documents and for performing path operations on the XML data as if those nodes were not present.