The present invention is related to the field of information retrieval and data integration and, more particularly, to methods and apparatus for query and access of data from various data sources as integrated XML documents, for guaranteeing that the query outputs conform to the DTD of the user's choice, and for generating XML documents based on the combination of different XML queries.
The eXtensible Markup Language (XML) is emerging as one of the most important formats for document and data representation and transmission. For example, business documents can be presented by XML for Internet transmission and World Wide Web access. More and more users and new applications are starting to require their input and output to be in XML format.
Details of XML are described in "Extensible Markup Language (XML) 1.0," W3C Recommendation Feb. 10, 1998, the disclosure of which is incorporated by reference herein. However, the aspects of XML necessary for an understanding of the present invention are provided herein.
For XML documents, there is the concept of a Document Type Definition (DTD). Each DTD describes the structure of a (potentially infinitely large) set of XML documents. An XML document can have an associated DTD or no corresponding DTD at all. When an XML document is associated with a DTD, its structure must conform to the specification of the DTD. An XML document is "well-formed" if it is grammatically correct and the tags are properly nested. An XML document is "valid" if it conforms to a specific DTD.
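By way of illustration only, the distinction between a well-formed document and a valid document may be sketched in Python. The standard library parser used here checks well-formedness but does not validate against a DTD, so `TOY_MODEL` and `conforms` are hypothetical stand-ins that approximate a DTD as a table of permitted child elements; they are assumptions made for this sketch and are not a DTD validator.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the text parses as XML (tags closed and properly nested)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical stand-in for a DTD: element name -> set of permitted child names.
TOY_MODEL = {"order": {"item"}, "item": set()}

def conforms(elem, model) -> bool:
    """Check that every element is declared and every child is permitted by its parent."""
    if elem.tag not in model:
        return False
    return all(
        child.tag in model[elem.tag] and conforms(child, model) for child in elem
    )

good = "<order><item>pen</item></order>"
bad = "<order><item>pen</order></item>"              # tags overlap instead of nesting
well_formed_only = "<order><note>rush</note></order>"  # parses, but <note> is not declared
```

Here `good` is both well-formed and conforming, `bad` is not well-formed, and `well_formed_only` is well-formed but does not conform to the toy model.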
XML query languages, e.g., XML-QL and XQL, enable users to ask questions of XML documents and usually return the answers also in the form of XML documents. XML addressing mechanisms, e.g., XPath, identify elements inside XML documents. For ease of discussion, we will refer to all of them as XML query languages.
Current XML query mechanisms generally contain the following logical steps:
1. Query scope identification: usually one or more XML documents or one or more XML elements within some document(s) are identified as being within the query scope.
2. Filtering: selecting, from the query scope, the data items to be used as the result.
3. Output construction: converting and constructing the selected data items into some desirable output format and structure.
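The three logical steps above may be sketched as follows. This is a minimal illustration using the path subset supported by Python's `xml.etree.ElementTree`; the catalog data, the `run_query` function, and the `result` element name are assumptions made for the example and do not correspond to any particular XML query language.

```python
import xml.etree.ElementTree as ET

CATALOG = """
<catalog>
  <book><title>XML Basics</title><price>25</price></book>
  <book><title>Query Design</title><price>40</price></book>
  <book><title>Data Models</title><price>15</price></book>
</catalog>
"""

def run_query(xml_text: str, max_price: float) -> str:
    root = ET.fromstring(xml_text)
    # 1. Query scope identification: the <book> elements of the document.
    scope = root.findall("book")
    # 2. Filtering: keep the books whose price does not exceed max_price.
    selected = [b for b in scope if float(b.findtext("price")) <= max_price]
    # 3. Output construction: wrap the selected titles in a new document.
    result = ET.Element("result")
    for book in selected:
        ET.SubElement(result, "title").text = book.findtext("title")
    return ET.tostring(result, encoding="unicode")

print(run_query(CATALOG, 30))
# <result><title>XML Basics</title><title>Data Models</title></result>
```

Note that the output structure is fixed by the code in step 3 and is not checked against any DTD, which is the limitation discussed in this section.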
An XML document can be modeled as a tree, see "Document Object Model (DOM) Level 1 Specification, version 1.0," W3C Recommendation Oct. 1, 1998, the disclosure of which is incorporated by reference herein. The filtering step of XML query languages usually either identifies lists of data or elements (scalar-based filtering), or lists of subtrees (subtree-based filtering). The query language may provide a construction mechanism and convert the lists into an XML document in the output construction step. For the case of subtrees, each subtree is rooted at some selected element in the query scope.
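The difference between scalar-based and subtree-based filtering may be illustrated with a short sketch. The sample document is an assumption made for the example, and the path expressions use only the limited XPath subset of Python's `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<catalog>"
    "<book><title>XML Basics</title><price>25</price></book>"
    "<book><title>Query Design</title><price>40</price></book>"
    "</catalog>"
)

# Scalar-based filtering: the result is a list of data values.
titles = [t.text for t in doc.findall("./book/title")]

# Subtree-based filtering: the result is a list of subtrees, each rooted
# at a selected <book> element and carrying its children along.
books = doc.findall("./book")

print(titles)                         # ['XML Basics', 'Query Design']
print([c.tag for c in books[0]])      # ['title', 'price']
```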
The construction step of current query mechanisms is highly unstructured and ad hoc. Some query languages, e.g., XML-QL, allow the above-described three steps to be nested or recursively mixed. Most construction steps are XML instance-based instead of DTD-based. That means the outputs of such queries are well-formed, but not necessarily valid (conforming to a DTD) XML documents.
In many situations, it is necessary to make the output XML document of an XML query conform to a certain DTD. In e-business applications, trading partners may have mutually agreed upon DTDs to which the exchanged XML documents must conform. If the query outputs are valid, they can be used by trading partners immediately. Achieving such conformance through ad hoc manipulation of the query output, although possible, is inconvenient, and its correctness is hard to guarantee. Furthermore, the XML query languages and expressions proposed so far do not allow queries written in different query languages or expressions to be combined. Thus, there is a need for mechanisms and methods that guarantee that the query output conforms to the DTD of the user's choice.
The present invention provides methods and apparatus that guarantee that the query output conforms to the DTD of the user's choice. The present invention allows for: (i) selection of a DTD; (ii) integration of one or more XML queries with the DTD; and (iii) in accordance with the provided algorithm, automatic generation of a valid output XML document conforming to the DTD, using the data selected by the XML queries as content of the XML document.
In one aspect of the present invention, a method of processing one or more Extensible Markup Language (XML) queries comprises the steps of: (i) generating a mapping construct which maps a predetermined document type definition (DTD) to one or more data sources to be accessed in response to the one or more XML queries, the mapping construct including a binding specification wherein the one or more XML queries are bound to one or more binding variables; (ii) evaluating the one or more XML queries in accordance with the binding specification of the mapping construct and assigning the evaluation result to the one or more binding variables; and (iii) generating an XML document resulting from the query evaluation, wherein the resulting XML document conforms to the predetermined DTD. It is to be appreciated that the one or more XML queries may be written in one or more XML query languages. Also, the resulting XML document may be a combination of more than one XML query associated with one or more query languages. The resulting XML document may also be a combination of one or more XML queries and one or more non-XML queries. Further, the DTD is preferably specified by a user.
The mapping construct generation step may comprise the steps of: (i) determining suitable DTD constructs; (ii) binding the constructs to variables; (iii) associating the variables with a partial XML result obtained from scoping and filtering stages of the XML query; and (iv) distributing the variables to suitable DTD constructs with value functions. Further, the method may comprise the step of accepting scalar-based results from the scoping and filtering stages. Also, subtree-based results may be accepted from the scoping and filtering stages. The method may also comprise the step of allowing the bindings to be used as parameters in at least one of value generation functions and other binding functions. Further, the method may comprise the step of resolving a nested or recursive filtering query construct with sequential cascade binding constructs.
The query evaluation step may comprise the step of combining different parsing and evaluating mechanisms for evaluating XML queries from different XML query languages. The method may also comprise the step of allowing binding variables to be used in one or more XML queries of different query languages. Further, one or more XML queries of different query languages may be evaluated with binding variables as parameters.
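By way of example only, the evaluation of queries from different query languages, with binding variables passed as parameters, may be sketched as below. Everything named here is an assumption made for illustration: `EVALUATORS`, `BINDINGS`, and `evaluate_bindings` are hypothetical, the `path` language is the ElementTree path subset, and `firsttext` is a toy second "language" invented for the sketch. The `{c}` placeholder syntax is likewise an assumption and is not the notation of the invention.

```python
import xml.etree.ElementTree as ET

SOURCE = ET.fromstring(
    "<orders>"
    "<order id='1'><customer>Ann</customer><item>pen</item></order>"
    "<order id='2'><customer>Bob</customer><item>ink</item></order>"
    "<order id='3'><customer>Ann</customer><item>pad</item></order>"
    "</orders>"
)

def eval_path(query):
    """Language 'path': an ElementTree path expression; returns a list of elements."""
    return SOURCE.findall(query)

def eval_firsttext(query):
    """Toy language 'firsttext': the text of the first element with the given tag."""
    return next(SOURCE.iter(query)).text

# One evaluator per query language.
EVALUATORS = {"path": eval_path, "firsttext": eval_firsttext}

# Binding specification: (variable, language, query). A later query may
# refer to an earlier binding variable through a {name} placeholder.
BINDINGS = [
    ("c", "firsttext", "customer"),
    ("orders", "path", "./order[customer='{c}']"),
]

def evaluate_bindings(bindings):
    env = {}
    for var, language, query in bindings:
        # Substitute already-bound variables, then dispatch on the language.
        env[var] = EVALUATORS[language](query.format(**env))
    return env

env = evaluate_bindings(BINDINGS)
print(env["c"], [o.get("id") for o in env["orders"]])   # Ann ['1', '3']
```

The second query is correlated with the first: the value bound to `c` by one evaluator becomes a parameter of a query handled by another.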
The resulting XML document generation step may comprise the steps of: (i) recursively traversing DTD constructs from a root element; and (ii) associating with binding variables after one of resolving binding functions and evaluating XML queries, until reaching a leaf construct, where a partial XML result is obtained by evaluating associated value functions. The resulting XML document may be composed during a traversal returning stage by adding XML tags enclosing the partial XML result.
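The recursive generation described above may be sketched as follows. This is a simplified illustration under stated assumptions and is not the DTDSA notation or the algorithm of the referenced application: the `DTD` table is a toy stand-in for a parsed DTD, `BINDINGS` and `VALUES` are hypothetical names for the binding functions and value functions, and only sequences with an optional `*` repetition are handled.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

SOURCE = ET.fromstring(
    "<catalog>"
    "<book><title>XML Basics</title><price>25</price></book>"
    "<book><title>Query Design</title><price>40</price></book>"
    "</catalog>"
)

# Toy stand-in for the target DTD:
#   <!ELEMENT report (entry*)>   <!ELEMENT entry (name, cost)>
#   <!ELEMENT name (#PCDATA)>    <!ELEMENT cost (#PCDATA)>
DTD = {"report": ["entry*"], "entry": ["name", "cost"], "name": None, "cost": None}

# Binding functions: repeated element -> (variable, query returning a list).
BINDINGS = {"entry": ("b", lambda env: SOURCE.findall("./book"))}

# Value functions for leaf constructs, evaluated against the bound variables.
VALUES = {
    "name": lambda env: env["b"].findtext("title"),
    "cost": lambda env: env["b"].findtext("price"),
}

def generate(name, env):
    """Return the XML text for one instance of the DTD element `name`."""
    children = DTD[name]
    if children is None:
        # Leaf construct: the partial result comes from the value function.
        body = escape(VALUES[name](env))
    else:
        parts = []
        for child in children:
            if child.endswith("*"):
                # Repeated child: emit one instance per value of its binding.
                child_name = child[:-1]
                var, query = BINDINGS[child_name]
                for value in query(env):
                    parts.append(generate(child_name, {**env, var: value}))
            else:
                parts.append(generate(child, env))
        body = "".join(parts)
    # Returning stage: enclose the partial result in this element's tags.
    return f"<{name}>{body}</{name}>"

print(generate("report", {}))
```

Because the traversal is driven by the DTD table rather than by the shape of the source data, the element names and nesting of the output follow the target structure by construction in this sketch.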
As will be explained below, the methodology of the invention preferably makes use of the DTD Source Annotation (DTDSA) method described in U.S. Ser. No. 09/466,627 filed on Dec. 17, 1999 and entitled "Method and Apparatus for Converting Between Data Sets and XML Documents," the disclosure of which is incorporated by reference herein. However, it is to be understood that the invention is not limited to the DTDSA mechanism. That is, other mechanisms or methods can be used. By way of example only, the IBM DB2 extender (IBM Corporation of Armonk, N.Y.), which saves the mapping information connecting DTD and a DB2 database in a separate file, may be employed.
Many advantages may be realized in accordance with such an inventive universal output constructor for XML queries. The following are some examples of these advantages. The present invention allows the user to choose an arbitrary DTD and to present the query output using that DTD. The present invention works with XPath or any other XML query language that the user chooses as the query pattern matching mechanism. The present invention allows multiple queries, written in the same or different query languages or expressions, to be naturally integrated to produce a single output XML document. The present invention allows the aforementioned queries to be correlated. The present invention allows XML queries to be integrated with other data query or access mechanisms (e.g., Structured Query Language or SQL) to produce a single XML output document. The present invention allows the aforementioned XML queries and other data queries or access commands to be correlated.
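By way of illustration only, the correlation of an XML query with an SQL query to produce a single XML output document may be sketched as below. The order data, the `product` table, the `invoice` output structure, and the `build_invoice` function are all assumptions made for the example; an in-memory SQLite database stands in for whatever relational source is actually used.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDERS = ET.fromstring(
    "<orders>"
    "<order><sku>A1</sku><qty>2</qty></order>"
    "<order><sku>B7</sku><qty>1</qty></order>"
    "</orders>"
)

def build_invoice() -> str:
    # Relational data source (assumed for the sketch).
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE product (sku TEXT PRIMARY KEY, name TEXT)")
    db.executemany(
        "INSERT INTO product VALUES (?, ?)", [("A1", "pen"), ("B7", "ink")]
    )

    invoice = ET.Element("invoice")
    for order in ORDERS.findall("./order"):      # XML query
        sku = order.findtext("sku")
        # SQL query correlated with the XML query through the sku value.
        row = db.execute(
            "SELECT name FROM product WHERE sku = ?", (sku,)
        ).fetchone()
        line = ET.SubElement(invoice, "line")
        ET.SubElement(line, "name").text = row[0]
        ET.SubElement(line, "qty").text = order.findtext("qty")
    db.close()
    return ET.tostring(invoice, encoding="unicode")

print(build_invoice())
# <invoice><line><name>pen</name><qty>2</qty></line><line><name>ink</name><qty>1</qty></line></invoice>
```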