XML databases are well known in the art. For XML databases, XPath is a language for accessing XML documents in the database. Value indexes defined on XPath expressions are of particular interest since they can be used to answer a large set of XPath queries efficiently. Typically, XML documents are stored according to a tree data model, such as the XQuery data model or the Document Object Model (DOM). The nodes of the data tree are streamed and scanned. The XPath expression is then evaluated, and a result which satisfies the XPath query is returned.
Conventional approaches to processing XPath queries on XML data streams, however, have the following problems:
* They process only one XPath expression at a time. At the database system level, an XML document may have many value indexes, each of them corresponding to an XPath expression. Conventional approaches require multiple scans of the XML document to build the value indexes, which is not efficient.
* They explicitly express all the possible matching paths for an input XML node in their state machines or working buffers, which is not efficient since the number of matching paths can be very large in some situations.
* They passively process every input node or event and do not or cannot skip XML sub-trees that are not of interest.
* For every XML node, they expect two events, OPEN and CLOSE. While this assumption is reasonable when an XML document is stored as a character large object (LOB), composing the two events for an XML node is expensive when an XML document is stored in records.
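By way of illustration only, the first and fourth problems can be sketched as follows. The sketch assumes Python's standard `xml.etree.ElementTree.iterparse` as the event stream and supports only simple absolute child-axis paths rather than full XPath; the sample document and the names `scan_for_path` and `build_value_indexes` are hypothetical and do not describe any particular conventional system.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
DOC = (b"<order><customer><name>Ann</name></customer>"
       b"<item><price>5</price></item><item><price>7</price></item></order>")


def scan_for_path(xml_bytes, path):
    """Scan the whole document once and collect the text values of the
    nodes whose root-to-node path equals `path` (child steps only)."""
    steps = path.strip("/").split("/")
    stack = []   # tags of the currently open elements
    values = []  # index key values found for this path
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":      # OPEN event for the node
            stack.append(elem.tag)
        else:                     # CLOSE event for the node
            if stack == steps:
                values.append(elem.text)
            stack.pop()
    return values


def build_value_indexes(xml_bytes, paths):
    """Build one value list per path by rescanning the document for
    each path, so the number of scans equals the number of paths."""
    return {path: scan_for_path(xml_bytes, path) for path in paths}


indexes = build_value_indexes(DOC, ["/order/customer/name", "/order/item/price"])
```

In this sketch, two indexed paths cause the document to be read twice, every node produces both an OPEN and a CLOSE event, and no sub-tree is skipped even when it cannot contain a match.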
Accordingly, there exists a need for an improved method for generating hierarchical path value index keys. The method should process XPath expressions efficiently and require only one scan of an XML document, whether a single index or multiple indexes are being built. The present invention addresses such a need.