Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art. See e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Part I (especially, Chapters 1-4), Addison Wesley, 2000.
In recent years, applications running on database systems frequently provide for business-to-business or business-to-consumer interaction via the Internet between the organization hosting the application and its business partners and customers. Today, many organizations receive and transmit considerable quantities of information to business partners and customers through the Internet. A considerable portion of the information received or exchanged is in Extensible Markup Language or “XML” format. XML is a pared-down version of SGML (Standard Generalized Markup Language), designed especially for Web documents, that allows designers to create their own customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. For further description of XML, see e.g., “Extensible Markup Language (XML) 1.0” (Second Edition, Oct. 6, 2000), a recommended specification from the W3C, the disclosure of which is hereby incorporated by reference. A copy of this specification is available via the Internet (e.g., currently at www.w3.org/TR/2000/REC-xml-20001006). Many organizations utilize XML to exchange data with other remote users over the Internet.
Given the increasing use of XML in recent years, many organizations now have considerable quantities of data in XML format, including Web documents, newspaper articles, product catalogs, purchase orders, invoices, and product plans. As a result, these organizations need to be able to store, maintain, and use this XML information efficiently. However, this XML data is not in a format that can be easily stored and searched in current database systems. Most XML data is sent and stored as plain text; it is not organized into the tables of rows and columns used to store information in a relational DBMS. To search this semi-structured data, users typically rely on keyword searches similar to those used by many current Internet search engines. Such keyword searches are resource-intensive and less efficient than relational DBMS searches of structured data.
Organizations with data in XML format also typically have other enterprise data stored in a structured format in database management systems. Increasingly, database system users are demanding that database systems provide the ability to access and use not only the structured data stored in these databases but also XML and other unstructured or semi-structured data. In addition, users desire flexible tools and facilities for searching this data.
One of the key roles of a database management system (DBMS) is to retrieve data stored in a database based on specified selection criteria. This typically involves retrieving data in response to a query that is specified in a query language. One current solution used in XML-based applications to query the contents of an XML document is XPath. XPath provides basic facilities for manipulating strings, numbers, and booleans. It uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document rather than on its surface syntax. XPath gets its name from its use of a path notation, as in URLs, for navigating through the hierarchical structure of an XML document. For further description of XPath, see e.g., “XML Path Language (XPath) Version 2.0” (Jan. 23, 2007), a recommended specification from the W3C, the disclosure of which is hereby incorporated by reference. A copy of this specification is available via the Internet (e.g., currently at http://www.w3.org/TR/XPath20/).
The XPath query language is commonly used in Extensible Stylesheet Language Transformations (XSLT) to locate specific nodes in an XML document and to apply XSLT templates to them. In general, an XPath expression specifies a pattern that selects a set of XML nodes. Thus, XPath queries are commonly used to locate and to process the nodes in an XML document that match specified criteria.
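As a simple illustration of an XPath-style pattern selecting the set of nodes that match a criterion, the following minimal Python sketch uses the limited XPath subset supported by the standard library's `xml.etree.ElementTree` module. The purchase-order document, its element and attribute names, and the function `open_order_ids` are hypothetical and serve only to show a descendant step combined with an attribute predicate; they are not part of the present invention.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase-order document; element and attribute names are illustrative only.
DOC = """
<orders>
  <order id="1" status="open"><item>widget</item></order>
  <order id="2" status="shipped"><item>gadget</item></order>
  <order id="3" status="open"><item>gizmo</item></order>
</orders>
"""


def open_order_ids(xml_text: str) -> list[str]:
    """Return the id attribute of every <order> element whose status is 'open'."""
    root = ET.fromstring(xml_text)
    # './/order' selects <order> descendants at any depth; "[@status='open']" is an
    # attribute-equality predicate. findall() returns matches in document order.
    return [order.get("id") for order in root.findall(".//order[@status='open']")]


if __name__ == "__main__":
    print(open_order_ids(DOC))  # ['1', '3']
```

The pattern is declarative: the caller states which nodes are wanted, and the library walks the tree to collect the matching node set.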
For example, a simple XPath query may take a form such as /A/B/C, which selects the C elements that are children of B elements that are, in turn, children of the A element forming the outermost element of the XML document. Selection can take more complex forms, however. A complex XPath expression may, for example, contain other XPath query language constructs (e.g., filters, functions, parentheses, union, intersection), specify an axis other than the default ‘child’ axis, use a node test other than a simple name, or include predicates. For example, the complex XPath expression A//B/*[1] returns the first element (as designated by ‘[1]’), with any name (as designated by ‘*’), that is a child (‘/’) of a B element that is itself a child or deeper descendant (‘//’) of an A element that is a child of the current context node (because the expression does not begin with a ‘/’). When several B elements qualify, the expression must return the set consisting of the first child element of each of them.
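To make the semantics of the two example expressions concrete, the following minimal Python sketch hand-evaluates /A/B/C and A//B/*[1] over an `xml.etree.ElementTree` tree. It is an illustration under stated assumptions, not a general XPath engine and not the approach of the present invention: the function names and the sample document are hypothetical, and only the child axis, the descendant (‘//’) step, the ‘*’ node test, and the positional ‘[1]’ predicate are modeled. The second expression is evaluated by hand rather than with `findall()` because, as far as we are aware, ElementTree's positional predicate counts only siblings sharing the matched element's tag, so `*[1]` there does not mean "first child element" as it does in XPath.

```python
import xml.etree.ElementTree as ET


def select_absolute_path(root: ET.Element, steps: list[str]) -> list[ET.Element]:
    """Evaluate a simple absolute path such as /A/B/C (child axis, name tests only)."""
    # The first step must match the outermost (document) element itself.
    if not steps or root.tag != steps[0]:
        return []
    current = [root]
    for name in steps[1:]:
        # Child axis: keep the children of each current node whose tag equals the step name.
        current = [child for node in current for child in node if child.tag == name]
    return current


def select_a_desc_b_first_child(context: ET.Element) -> list[ET.Element]:
    """Evaluate the relative expression A//B/*[1] from the given context node."""
    result = []
    for a in context:                       # A: children of the context node named A
        if a.tag != "A":
            continue
        # //B: B elements that are descendants of A at any depth. iter() would also
        # yield 'a' itself if it matched, but 'a' is named A, so it never does here.
        for b in a.iter("B"):
            children = list(b)              # *: child elements with any name
            if children:
                result.append(children[0])  # [1]: XPath positions are 1-based
    return result


if __name__ == "__main__":
    root = ET.fromstring("<A><B><C/><D/></B><E><B><F/><G/></B></E><B/></A>")
    print([e.tag for e in select_absolute_path(root, ["A", "B", "C"])])  # ['C']

    # A relative expression needs a context node whose child is A, so wrap the tree.
    ctx = ET.Element("ctx")
    ctx.append(root)
    print([e.tag for e in select_a_desc_b_first_child(ctx)])  # ['C', 'F']
```

In the sample document, three B elements qualify for A//B/*[1]; the empty B contributes nothing, and the other two contribute their first children, C and F, which illustrates why the result is a set of nodes rather than a single node.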
While XPath has been used as the query language for XML documents with some success, complex XPath queries are not handled effectively by current XML processing engines. In particular, what is needed is a solution that enables efficient and accurate searches of information in XML documents queried using complex expressions. The present invention addresses this need.