1. Field of the Invention
The present invention relates generally to data processing environments and, more particularly, to a database system providing methodology for execution of functions in queries requesting data from markup language documents.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as “records” having “fields” of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
Between the actual physical database itself (i.e., the data actually stored on a storage device) and the users of the system, a database management system or DBMS is typically provided as a software cushion or layer. In essence, the DBMS shields the database user from knowing or even caring about the underlying hardware-level details. Typically, all requests from users for access to the data are processed by the DBMS. For example, information may be added to or removed from data files, information may be retrieved from or updated in such files, and so forth, all without user knowledge of the underlying system implementation. In this manner, the DBMS provides users with a conceptual view of the database that is removed from the hardware level. The general construction and operation of database management systems is well known in the art. See e.g., Date, C., “An Introduction to Database Systems, Seventh Edition”, Part I (especially, Chapters 1-4), Addison Wesley, 2000.
In recent years, applications running on database systems frequently provide for business-to-business or business-to-consumer interaction via the Internet between the organization hosting the application and its business partners and customers. Today, many organizations exchange considerable quantities of information with business partners and customers through the Internet. A considerable portion of the information exchanged is in Extensible Markup Language or “XML” format. XML is a pared-down version of SGML (Standard Generalized Markup Language), designed especially for Web documents, which allows designers to create their own customized tags, enabling the definition, transmission, validation, and interpretation of data between applications and between organizations. For further description of XML, see e.g., “Extensible Markup Language (XML) 1.0” (Second Edition, Oct. 6, 2000) a recommended specification from the W3C, the disclosure of which is hereby incorporated by reference. A copy of this specification is available via the Internet (e.g., currently at www.w3.org/TR/2000/REC-xml-20001006). Many organizations utilize XML to exchange data with remote users over the Internet.
Given the increasing use of XML in recent years, many organizations now have considerable quantities of data in XML format, including Web documents, newspaper articles, product catalogs, purchase orders, invoices, and product plans. As a result, these organizations need to be able to store, maintain, and use this XML information efficiently. However, this XML data is not in a format that can be easily stored and searched in current database systems. Most XML data is sent and stored in plain text format. This data is not formatted in tables and rows like information stored in a relational DBMS. To search this semi-structured data, users typically utilize keyword searches similar to those utilized by many current Internet search engines. These keyword searches are resource-intensive and are not as efficient as relational DBMS searches of structured data.
Organizations with data in XML format also typically have other enterprise data stored in a structured format in database management systems. Increasingly, database system users are demanding that database systems provide the ability to access and use both the structured data stored in these databases and XML and other unstructured or semi-structured data. In addition, users desire flexible tools and facilities for performing searches of this data.
One of the key roles of a database management system (DBMS) is to retrieve data stored in a database based on specified selection criteria. This typically involves retrieving data in response to a query that is specified in a query language. One particular need is for a solution that will enable efficient searches of information in XML documents. For instance, it would be desirable to have an XML version of SQL (Structured Query Language) that would enable a user to easily retrieve all nodes of type X that have descendants of type Y from an XML document.
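By way of illustration only, the kind of request described above (all nodes of type X that have descendants of type Y) can be sketched in Python using the standard library's xml.etree.ElementTree module. The function name nodes_with_descendant, the sample document, and its tag names are assumptions introduced for this sketch and are not taken from any specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the tag names are illustrative assumptions.
SAMPLE = """
<bookstore>
  <book><title>A</title><author><first-name>John</first-name></author></book>
  <book><title>B</title></book>
  <magazine><author><first-name>Mary</first-name></author></magazine>
</bookstore>
"""

def nodes_with_descendant(root, node_tag, descendant_tag):
    """Return elements tagged node_tag that contain a descendant tagged descendant_tag."""
    matches = []
    for node in root.iter(node_tag):
        # A ".//" path searches every level below node, but not node itself.
        if node.find(".//" + descendant_tag) is not None:
            matches.append(node)
    return matches

root = ET.fromstring(SAMPLE)
titles = [b.findtext("title") for b in nodes_with_descendant(root, "book", "first-name")]
```

In this sketch, only the first book qualifies: the second book has no first-name descendant, and the magazine element is not of the requested type.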
One current solution used in XML-based applications to query the contents of an XML document is XPath. The XPath query language is commonly used in Extensible Stylesheet Language Transformations (XSLT) to locate and to apply XSLT templates to specific nodes in an XML document. XPath queries are also commonly used to locate and to process nodes in an XML document that match specified criteria. XPath provides basic facilities for manipulation of strings, numbers and booleans. It uses a compact, non-XML syntax to facilitate use of XPath within URIs and XML attribute values. XPath operates on the abstract, logical structure of an XML document, rather than its surface syntax. XPath gets its name from its use of a path notation as in URLs for navigating through the hierarchical structure of an XML document. For further description of XPath, see e.g., “XML Path Language (XPath) Version 1.0” (Nov. 16, 1999), a recommended specification from the W3C, the disclosure of which is hereby incorporated by reference. A copy of this specification is available via the Internet (e.g., currently at www.w3c.org/TR/xpath).
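The path notation described above can be illustrated with a short Python sketch. It relies on the xml.etree.ElementTree module of the Python standard library, which supports only a limited subset of XPath 1.0; the sample catalog and its element and attribute names are assumptions introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore catalog; element and attribute names are assumptions.
CATALOG = """
<bookstore>
  <book year="2003">
    <title>Databases</title>
    <author><first-name>John</first-name><last-name>Smith</last-name></author>
  </book>
  <book year="2004">
    <title>Markup</title>
    <author><first-name>Mary</first-name><last-name>Jones</last-name></author>
  </book>
</bookstore>
"""

root = ET.fromstring(CATALOG)

# A path steps through the hierarchy one level at a time, much like a URL path.
first_names = [e.text for e in root.findall("./book/author/first-name")]

# "//" selects matching nodes at any depth below the starting node.
titles = [e.text for e in root.findall(".//title")]

# A predicate in square brackets filters nodes, here by attribute value.
titles_2004 = [e.text for e in root.findall("./book[@year='2004']/title")]
```

Each expression operates on the parsed tree (the logical structure of the document) rather than on the text of the document itself.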
Although XPath provides a mechanism for locating nodes in an XML document that match specified criteria, problems remain in the processing of queries written in the XPath query language in current database systems. One particular problem is that data in XML documents is typically spread in various places throughout the document. For example, in an XML document containing records of books in a bookstore, the names of the books' authors will typically not be consolidated in one location, but rather will be spread throughout the document. Accordingly, performing a search to find a particular author name may require traversing paths of the XML document structure to locate nodes containing the author name and then comparing the author name at a given node to the desired value.
Another problem is that the data in an XML document may be in different forms. For instance, one publisher of books may use all upper-case letters for author names (e.g., “JOHN”), while another publisher uses a “first-letter capitalized” style (e.g., “John”). If a user wants to find all the books in which the author's first name is John, this query needs a union of the results of a first search performed for authors with first-name=“JOHN” and a second search with first-name=“John”. However, a case-free comparison may be possible using string functions such as “tolower( )” or “toupper( )”. The use of functions such as tolower( ) and toupper( ) makes it possible to perform such string comparisons in a single search rather than as a union of separate searches. Recently, efforts have been initiated to provide for use of certain built-in functions within XPath queries. A set of functions for use within path-based queries (e.g., XPath queries) has been proposed in “XQuery 1.0 and XPath 2.0 Functions and Operators”, a World Wide Web Consortium (W3C) Working Draft dated Jul. 23, 2004, the disclosure of which is hereby incorporated by reference. A copy of this document is available via the Internet (e.g., currently at www.w3.org/TR/xpath-functions/). However, database solutions currently do not include mechanisms for using functions such as the above-described string functions in XPath queries. Some current DBMS solutions provide the ability to store XML data in a database system and to retrieve this data using XPath queries. However, these solutions currently do not support inclusion of these types of functions in XPath queries.
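The difference between the union of two exact-match searches and a single case-free comparison can be sketched as follows. This is an illustrative Python sketch only: Python's str.lower( ) stands in for a function such as tolower( ), and the helper names titles_by_exact_name and titles_by_name_ignoring_case, together with the sample catalog, are assumptions introduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog in which publishers capitalize author names differently.
CATALOG = """
<bookstore>
  <book><title>First</title><author><first-name>JOHN</first-name></author></book>
  <book><title>Second</title><author><first-name>John</first-name></author></book>
  <book><title>Third</title><author><first-name>Mary</first-name></author></book>
</bookstore>
"""

def titles_by_exact_name(root, name):
    """Titles of books having an author whose first name equals name exactly."""
    return [
        book.findtext("title")
        for book in root.iter("book")
        if any(n.text == name for n in book.iterfind("./author/first-name"))
    ]

def titles_by_name_ignoring_case(root, name):
    """Titles of books having an author whose first name equals name, ignoring case."""
    wanted = name.lower()
    return [
        book.findtext("title")
        for book in root.iter("book")
        if any((n.text or "").lower() == wanted
               for n in book.iterfind("./author/first-name"))
    ]

root = ET.fromstring(CATALOG)

# Without a case-folding function: two searches whose results are merged,
# with duplicates removed while preserving order.
union_result = list(dict.fromkeys(
    titles_by_exact_name(root, "JOHN") + titles_by_exact_name(root, "John")
))

# With a case-folding function: one pass over the document.
single_pass_result = titles_by_name_ignoring_case(root, "john")
```

Both approaches select the same two books in this sample, but the second traverses the document once and also matches capitalization styles that were not anticipated when the query was written.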
What is needed is a database system with built-in support for functions in queries (e.g., XPath queries) requesting data in XML format. The solution should enable a function to be included anywhere within a query expression. The solution should also enable information to be consolidated from various portions of an XML document as a query is executed. The present invention provides a solution for these and other needs.