The present invention relates to the field of data exchange between applications on a distributed network. More particularly, the present invention relates to converting relational data into XML (eXtensible Markup Language) on the Internet.
XML (eXtensible Markup Language) can serve many purposes. XML is a more expressive markup language than HTML (Hyper-Text Markup Language). XML may be an object-serialization format for distributed object applications. XML serves as the standard format for data exchange between inter-enterprise applications on the Internet and, in particular, the World Wide Web ("Web"). In data exchange, XML documents are generated from persistent data and then sent over a network to an application. To facilitate data exchange, numerous industry groups, such as healthcare and telecommunications groups, have been defining public document type definitions (DTDs) and XML Schemas (generically, XML schemas) that specify the format of the XML data to be exchanged between their applications. The aim is to use XML as a "lingua franca" for data exchange between inter-enterprise applications. XML can make it possible for data to be exchanged regardless of the platform on which it is stored or the data model in which it is represented. When received by a target application, XML data can be re-mapped into the application's data structure or target database system. Thus, XML can serve as a language for defining a view of non-XML data.
Most data is stored in relational or object-relational database management systems (RDBMS) or in legacy formats. To realize the full potential of XML, tools are needed that can automatically convert the vast stores of relational data into XML. Such tools should be general, dynamic, and efficient.
Relational data is tabular, flat, and normalized, and its schema is proprietary, which makes it unsuitable for direct exchange. In contrast, XML data is nested and un-normalized, and its XML schema is public. Thus, the mapping from relational data to an XML view is often complex, and a conversion tool should be general enough to express complex mappings. Existing commercial systems are not general, because they map each relational database schema into a fixed, canonical XML schema. This approach is limited, because no public XML schema will exactly match a proprietary relational schema. In addition, it is often desirable to map one relational source into multiple XML documents, each of which conforms to a different DTD. Hence, with such systems, a second step is required to transform the data from its canonical XML form into its final XML form.
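The fixed, canonical mapping described above can be illustrated with a minimal Python sketch. The relation name, column names, and rows are hypothetical, and the `row` element name is an assumption chosen for illustration; this is not the output format of any particular commercial product.

```python
import xml.etree.ElementTree as ET

def canonical_xml(table_name, columns, rows):
    """Map one relation to a fixed shape: <table><row><col>value</col>...</row>...</table>."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, "row")  # one element per tuple
        for col, value in zip(columns, row):
            # Each attribute (column) name becomes an XML tag.
            ET.SubElement(row_el, col).text = str(value)
    return root

# Hypothetical relation, for illustration only.
columns = ["id", "name"]
rows = [(1, "Ann"), (2, "Bob")]
doc = canonical_xml("supplier", columns, rows)
xml_text = ET.tostring(doc, encoding="unicode")
print(xml_text)
```

Because the XML shape mirrors the relational schema, matching a public DTD with a different nesting or different tag names would require a further transformation of this output.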
Also, the tools need to be dynamic, i.e., only the fragment of the XML document needed by the application should be materialized. In database terminology, the XML view should be virtual. The application typically specifies in a query what data item(s) it needs from the XML document. Typically, these items are a small fraction of the entire data. Some commercial products allow users to export relational data into XML by writing scripts. However, these tools are not dynamic, because the entire document is generated all at once.
Finally, to be efficient, such tools should fully exploit the underlying query engine of the RDBMS whenever data items in the XML view need to be materialized. Query processors for native XML data are still immature and do not have the performance of highly optimized RDBMS engines.
Several commercial tools for exporting relational data into XML views exist today. The ODBC2XML tool, a product of Intelligent Systems Research (www.intsysr.com), allows users to define XML documents with embedded SQL statements, which permit the users to construct an XML view of the relational data. Such views are materialized, however, and cannot be further queried with an XML query language. Alternatively, Oracle's XSQL tool defines a fixed, canonical mapping of the relational data into an XML document, by mapping each relation and attribute name to an XML tag and tuples as nested elements. Such a view could be kept virtual, but this approach is not general enough to support mapping into an arbitrary XML format. IBM's DB2 XML Extender provides a Data Access Definition (DAD) language that supports both composition of relational data in XML and decomposition of XML data into relational tables. DAD's composition feature supports generation of arbitrary XML from relational data. However, the criteria for grouping elements are implicit in the DAD, and DAD specifications cannot be nested arbitrarily. More significantly, XML Extender does not support query composition. The Microsoft SQL Server 2000 provides three modes for exporting relational data in XML. Raw mode exports relational tables using a canonical mapping, similar to the technique used in Oracle. Auto mode derives each element name from the relational table and column names. Directives indicate whether column values should appear in XML attributes or elements. In explicit mode, the user constructs a tagged, universal relation that contains the content for the entire document. Each tuple in the result relation is tagged with integers that specify the appropriate nesting level. Explicit mode is completely general and efficient, but it requires the user to construct the universal relation by hand.
SQL Server also supports "XML views", which is a technique similar to DAD's RDB mode, but which is dynamic. The elements and attributes in XML templates are annotated with the names of the relational values from which they are derived. The technique is not wholly general, because it does not support arbitrary join conditions in the definition of elements. SQL Server's XML views do qualify as dynamic, because they permit querying of the XML view using XPath. As a user-query language, XPath supports selection of elements, but not projection or restructuring, as XML-QL does.
The present invention overcomes many of the shortcomings of the prior art. In addition, the present invention addresses the problem of automating the conversion of relational data into XML. According to the invention, a general, dynamic, and efficient tool for viewing and querying relational data in XML, referred to as SilkRoute, is provided. SilkRoute is general, because it can express mappings of relational data into XML that conform to arbitrary XML schemas, not just a canonical mapping of the relational schema. The mappings may be referred to as views. Applications can express the data they need as an XML-QL query over the view. SilkRoute is dynamic, because it can materialize only the fragment of an XML view needed by an application, and SilkRoute is efficient, because it can fully exploit the underlying RDBMS (Relational DataBase Management System) query engine whenever data items in an XML view need to be materialized.
According to one aspect of the present invention, a general framework is provided for mapping relational databases to XML views, to be used in data exchange. In another aspect of the invention, a new query language, RXL, for mapping relational sources to XML views, is provided. According to yet another aspect, the present invention provides a sound and complete query composition algorithm that, when given an RXL query and an XML-QL query, generates a new RXL query equivalent to their composition. In a still further aspect of the present invention, a technique is provided in which most of the work of an RXL query can be shipped to the underlying database engine.
In another aspect of the invention, an algorithm is provided for efficiently constructing materialized XML views of relational databases. In another aspect of the invention, an XML view can be specified by a query in a declarative query language of a middleware system. According to another aspect of the present invention, an algorithm designed for XML view-definition queries is provided for decomposing a large query into smaller queries. According to a further aspect of the invention, a middleware system can evaluate a query by sending one or more SQL queries to a target relational database, integrating the resulting tuple streams, and adding XML tags. In still a further aspect of the invention, a view-definition query algorithm of the present invention may be implemented in RDBMS engines that generate XML internally.
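The middleware evaluation strategy described above (send one or more SQL queries, integrate the resulting tuple streams, add XML tags) can be sketched as follows. This is an illustrative sketch only, not the SilkRoute implementation: the `supplier` and `part` tables, their columns, and the `build_view` helper are hypothetical, and an in-memory SQLite database stands in for the target RDBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

def build_view(conn):
    """Send two SQL queries, merge their sorted tuple streams, and add XML tags."""
    # Both streams are sorted on the supplier key so they can be merged in one pass.
    suppliers = conn.execute("SELECT id, name FROM supplier ORDER BY id")
    parts = conn.execute(
        "SELECT supplier_id, pname FROM part ORDER BY supplier_id, pname")
    root = ET.Element("suppliers")
    part = parts.fetchone()
    for sid, name in suppliers:
        s_el = ET.SubElement(root, "supplier")
        ET.SubElement(s_el, "name").text = name
        # Skip part tuples whose supplier does not appear in the supplier stream.
        while part is not None and part[0] < sid:
            part = parts.fetchone()
        # Nest every part tuple that belongs to the current supplier.
        while part is not None and part[0] == sid:
            ET.SubElement(s_el, "part").text = part[1]
            part = parts.fetchone()
    return root

# Hypothetical data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE part (supplier_id INTEGER, pname TEXT);
    INSERT INTO supplier VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO part VALUES (1, 'bolt'), (1, 'nut'), (2, 'gear');
""")
view_xml = ET.tostring(build_view(conn), encoding="unicode")
print(view_xml)
```

Splitting the view into two sorted queries, rather than one large join, is one possible decomposition; the sorting and filtering are left to the database engine, and the middleware only merges and tags.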
A query language according to another aspect of the present invention can be adapted for operation with a variety of systems. For example, the query language can express the transformations expressible in existing XML publishing tools, such as those provided by relational database systems. In particular, the IBM DB2 XML Extender provides a Data Access Definition (DAD) language, Microsoft SQL Server has an XML view-definition module, and the Oracle XML SQL Utility exports relational data in a fixed, canonical XML view. In another aspect of the present invention, an intermediate representation of XML view queries, called a view tree, is provided that is general enough to express the XML mappings in any of these systems. An illustrative algorithm of the invention takes a view tree as input, and therefore could be directly applied to the XML view definitions expressed by these tools.
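As a rough illustration of what a tree-shaped intermediate representation of an XML view might look like, the following Python sketch pairs each element tag with the SQL text that would produce its instances. The `ViewNode` class, its fields, the `collect_queries` helper, and the example queries are all hypothetical and chosen for illustration; the actual structure of the view tree is not specified in this passage.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ViewNode:
    tag: str                      # XML element name
    query: Optional[str] = None   # hypothetical SQL producing this element's instances
    children: List["ViewNode"] = field(default_factory=list)

def collect_queries(node):
    """Pre-order walk returning (tag, query) for every node that carries a query."""
    found = [(node.tag, node.query)] if node.query else []
    for child in node.children:
        found.extend(collect_queries(child))
    return found

# Hypothetical view: suppliers, each with nested parts.
tree = ViewNode("suppliers", children=[
    ViewNode("supplier", "SELECT id, name FROM supplier", [
        ViewNode("part", "SELECT pname FROM part WHERE supplier_id = :id"),
    ]),
])
queries = collect_queries(tree)
print(queries)
```

An algorithm that takes such a tree as input can operate independently of the language in which the view was originally written, which is the property the passage above relies on.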
Although the invention has been defined using the appended claims, these claims are exemplary and not limiting, in that the invention is meant to include one or more elements from the apparatuses described herein in any combination or sub-combination. Accordingly, there are any number of alternative combinations for defining the invention that incorporate one or more elements from the specification (including drawings, claims, etc.) in any combinations or sub-combinations.