Since XML (eXtensible Markup Language) became a W3C (World Wide Web Consortium) Recommendation in 1998, it has been widely used and has become a standard format for data exchange between computers. XQuery became a W3C Recommendation on January 23, 2007, further enriching the XML environment. XML data (XML documents) are documents or data written in a language conforming to XML, and are structured as a tree of nodes such as elements and attributes.
For example, XML data may be written as follows:
[XML statement]
1        <books>
1.1        <book>
1.1.1        <author>author_name1</author>
1.1.2        <title>title1</title>
1.1.3        <price>100</price>
           </book>
1.2        <book>
1.2.1        <author>author_name2</author>
1.2.2        <title>title2</title>
           </book>
         </books>
An element, which is one kind of node, consists of a start tag, content, and an end tag. For example, “<price>100</price>” is an element, where “<price>” is the start tag, “100” is the content, and “</price>” is the end tag. An element can contain other elements.
XPath is a language syntax for identifying a specific component or components of XML data. XPath is an important constituent of technologies that operate on XML data, such as XQuery and XSLT (XSL Transformations, a standard for transforming one XML document into another). XPath expressions are expressions written in accordance with the XPath specification. For example, “//books//book” and “//books/book” are XPath expressions. In an XPath expression, the double slash “//” denotes a “descendant” relationship between elements in the tree structure of XML data, and the single slash “/” denotes a “child” relationship.
The numbers in the leftmost part of the XML data given above are identifiers (node IDs, or nIDs) of the element nodes of the XML data. The numbers are added for illustration purposes and are not included in actual XML data. In this example, the identifier of a child element is formed by appending “.” and the position of the child among its siblings to the identifier of its parent element, so that ancestor-descendant relationships can be determined from the identifiers alone.
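The identifier scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `assign_node_ids`, `is_ancestor`, and `is_parent` are hypothetical, the root element is assumed to receive the identifier "1", and only element nodes are numbered.

```python
import xml.etree.ElementTree as ET

# The example document from the text, without the illustrative node IDs.
XML_TEXT = """
<books>
  <book>
    <author>author_name1</author>
    <title>title1</title>
    <price>100</price>
  </book>
  <book>
    <author>author_name2</author>
    <title>title2</title>
  </book>
</books>
"""


def assign_node_ids(element, node_id="1", table=None):
    """Return a dict mapping dotted node IDs to element names.

    A child's ID is its parent's ID, then ".", then the child's
    1-based position among its siblings.
    """
    if table is None:
        table = {}
    table[node_id] = element.tag
    for position, child in enumerate(element, start=1):
        assign_node_ids(child, f"{node_id}.{position}", table)
    return table


def is_ancestor(ancestor_id, descendant_id):
    """True if ancestor_id is a proper ancestor of descendant_id.

    The trailing "." keeps "1.1" from being treated as an ancestor of "1.10".
    """
    return descendant_id.startswith(ancestor_id + ".")


def is_parent(parent_id, child_id):
    """True if child_id lies exactly one level below parent_id."""
    return child_id.rpartition(".")[0] == parent_id


node_ids = assign_node_ids(ET.fromstring(XML_TEXT.strip()))
for node_id, tag in node_ids.items():
    print(node_id, tag)
```

With identifiers of this form, `is_ancestor("1", "1.2.1")` holds while `is_parent("1", "1.2.1")` does not, which mirrors the distinction between “//” and “/” in an XPath expression.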
There are a number of known methods for extracting the XML nodes identified by a given XPath expression from XML data. Examples include a method in which an XML data tree is searched; a method that uses structural joins, as described in documents such as “Structural Joins: A Primitive for Efficient XML Query Pattern Matching” (N. Koudas, J. M. Patel, S. Al-Khalifa, H. V. Jagadish, D. Srivastava, and Yuqing Wu, in ICDE, 2002); and a hybrid method combining these two, such as the method described in “Fast XPath processing with XML Summaries” (Takeharu Eda, Makoto Onizuka, and Masashi Yamamuro, The Journal of the Institute of Electronics, Information and Communication Engineers D, 2006, Vol. J89-D, pp. 139-150).
Any of these methods can be used to extract the set of element node identifiers {1.1, 1.2} when an XPath expression such as “//books//book” is given for the XML data above.
For example, in the method in which an XML data tree is searched, the nodes of the XML data tree are traversed to search for a structure that matches the pattern of the XPath expression “//books//book”. An automaton generated from the XPath expression is used while traversing the nodes of the XML data being searched in order to find the target nodes. As a result, the identifier set {1.1, 1.2} is acquired.
In the method that uses structural joins, the element nodes of the XML data are acquired, and ancestor-descendant relationships between the element nodes are determined by using labels assigned to the element nodes. That is, the XPath expression “//books//book” is decomposed into the pattern “//books” and the pattern “//book”; the identifier set {1} is obtained for the pattern “//books”, and the identifier set {1.1, 1.2} is obtained for the pattern “//book”. These identifier sets are then searched for pairs that are in an ancestor-descendant relationship. Since the parent-child relationships “1→1.1” and “1→1.2” exist in this example, and a parent is also an ancestor, the identifier set {1.1, 1.2} is acquired.
In the hybrid method, which combines these two methods, structural joins are performed only on predicates, thereby reducing the number of structural joins.
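The two basic methods can be compared with a minimal Python sketch. It is not the algorithm of either cited document. It assumes the dotted identifiers of the example, handles only expressions made of “//” steps, uses a simple counter of matched steps in place of a general automaton, and uses a naive all-pairs comparison in place of an optimized structural join. The names `walk`, `tree_search`, `ids_for_tag`, and `structural_join` are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_TEXT = (
    "<books>"
    "<book><author>author_name1</author><title>title1</title>"
    "<price>100</price></book>"
    "<book><author>author_name2</author><title>title2</title></book>"
    "</books>"
)


def walk(element, node_id="1"):
    """Yield (node_id, element) pairs in document order."""
    yield node_id, element
    for position, child in enumerate(element, start=1):
        yield from walk(child, f"{node_id}.{position}")


def tree_search(root, steps):
    """Tree-search method for a path such as //books//book.

    `matched` is the state: how many leading steps have already been
    matched by ancestors of the element being visited.
    """
    results = []
    last = len(steps) - 1

    def visit(element, node_id, matched):
        # All earlier steps matched above, and this element matches the last.
        if matched == last and element.tag == steps[last]:
            results.append(node_id)
        # Advance the state when this element matches the next earlier step.
        if matched < last and element.tag == steps[matched]:
            matched += 1
        for position, child in enumerate(element, start=1):
            visit(child, f"{node_id}.{position}", matched)

    visit(root, "1", 0)
    return results


def ids_for_tag(root, tag):
    """Identifier list for a single-step pattern such as //book."""
    return [node_id for node_id, element in walk(root) if element.tag == tag]


def structural_join(ancestor_ids, descendant_ids):
    """Keep each descendant ID that has an ancestor in ancestor_ids.

    Naive version: every descendant is compared with every ancestor.
    """
    return [
        d for d in descendant_ids
        if any(d.startswith(a + ".") for a in ancestor_ids)
    ]


root = ET.fromstring(XML_TEXT)

by_search = tree_search(root, ["books", "book"])

books_ids = ids_for_tag(root, "books")  # pattern //books
book_ids = ids_for_tag(root, "book")    # pattern //book
by_join = structural_join(books_ids, book_ids)

print(by_search)  # ['1.1', '1.2']
print(by_join)    # ['1.1', '1.2']
```

Both functions return the same identifier set for the example. The tree search visits every node once per query, whereas the join operates only on the per-pattern identifier sets and compares them pairwise.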
However, the method in which an XML data tree is searched involves searching all branches and therefore does not scale with the amount of XML data. That is, processing time increases at a geometric rate as the number of nodes making up the XML data increases. In the method that uses structural joins, the number of elements in the sets of node identifiers increases as the amount of XML data increases. Accordingly, the time required for determining ancestor-descendant relationships among all the elements also increases geometrically. The hybrid method reduces the number of structural joins by performing structural joins only on predicates, and thus achieves its effect in a way different from that of the present invention.