The present invention relates to the field of data exchange between applications on the Internet. More particularly, the present invention relates to converting relational data into XML (eXtensible Markup Language).
XML (eXtensible Markup Language) can serve many purposes. XML is a more expressive markup language than HTML (Hyper-Text Markup Language). XML may be an object-serialization format for distributed object applications. XML serves as the standard format for data exchange between inter-enterprise applications on the Internet. In data exchange, XML documents are generated from persistent data and then sent over a network to an application. To facilitate data exchange, numerous industry groups, such as healthcare and telecommunications groups, have been defining public document type definitions (DTDs) that specify the format of the XML data to be exchanged between their applications. The aim is to use XML as a "lingua franca" for data exchange between inter-enterprise applications. XML could make it possible for data to be exchanged regardless of the platform on which it is stored or the data model in which it is represented.
Most data is stored in relational or object-relational database management systems (RDBMS) or in legacy formats. To realize the full potential of XML, tools are needed that can automatically convert the vast stores of relational data into XML. Such tools should be general, dynamic, and efficient.
Relational data is tabular, flat, and normalized, and its schema is proprietary, which makes it unsuitable for direct exchange. In contrast, XML data is nested and un-normalized, and its DTD is public. Thus, the mapping from relational data to an XML view is often complex, and a conversion tool must be general enough to express complex mappings. Existing commercial systems fail to be general, because they map each relational database schema into a fixed, canonical DTD. This approach is limited, because no public DTD will exactly match a proprietary relational schema. In addition, it is often desirable to map one relational source into multiple XML documents, each of which conforms to a different DTD. Hence, with such systems, a second step is required to transform the data from its canonical form in XML into its final XML form.
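The gap between flat, normalized tables and nested XML can be illustrated with a minimal sketch in Python. It is not part of the described system. The `supplier` and `part` tables, their columns, and the element names are hypothetical, and SQLite stands in for the RDBMS. The sketch joins two normalized tables and groups the resulting rows so that each supplier's parts are nested under one element.

```python
import sqlite3
import xml.etree.ElementTree as ET


def demo_connection():
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE part (pname TEXT, supplier_id INTEGER);
        INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO part VALUES ('bolt', 1), ('nut', 1), ('gear', 2);
    """)
    return conn


def build_nested_view(conn):
    """Turn flat join rows into nested <supplier> elements."""
    root = ET.Element("suppliers")
    current_id, supplier_el = None, None
    rows = conn.execute(
        "SELECT s.id, s.name, p.pname FROM supplier s "
        "LEFT JOIN part p ON p.supplier_id = s.id "
        "ORDER BY s.id, p.pname"
    )
    for sid, name, pname in rows:
        if sid != current_id:
            # A new supplier id starts a new nested group.
            supplier_el = ET.SubElement(root, "supplier")
            ET.SubElement(supplier_el, "name").text = name
            current_id = sid
        if pname is not None:
            ET.SubElement(supplier_el, "part").text = pname
    return root


xml_text = ET.tostring(build_nested_view(demo_connection()), encoding="unicode")
print(xml_text)
```

The supplier name, repeated on every join row, appears once in the XML, and the parts become repeated child elements. A different public DTD would require a different grouping, which is why a fixed canonical mapping is insufficient.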
Also, the tools must be dynamic, i.e., only the fragment of the XML document needed by the application should be materialized. In database terminology, the XML view must be virtual. The application typically specifies in a query what data item(s) it needs from the XML document. Typically, these items are a small fraction of the entire data. Some commercial products allow users to export relational data into XML by writing scripts. These tools are general, but they are not dynamic, because the entire document is generated all at once.
Finally, to be efficient, such tools must fully exploit the underlying query engine of the RDBMS whenever data items in the XML view need to be materialized. Query processors for native XML data are still immature and do not have the performance of highly optimized DBMS engines.
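The dynamic and efficient requirements can be illustrated together with a minimal sketch in Python. It is an assumption-laden illustration, not the method of the invention. The `supplier` and `part` tables and the `materialize_fragment` helper are hypothetical, and SQLite stands in for the RDBMS. The application's selection is pushed into the SQL query, so the database engine does the filtering and only the requested fragment of the view is built.

```python
import sqlite3
import xml.etree.ElementTree as ET


def demo_connection():
    """Create a small in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE supplier (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE part (pname TEXT, supplier_id INTEGER);
        INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO part VALUES ('bolt', 1), ('nut', 1), ('gear', 2);
    """)
    return conn


def materialize_fragment(conn, supplier_name):
    """Build only the <supplier> element that the application requested."""
    # The WHERE clause runs inside the database engine, so rows for
    # other suppliers are never fetched or converted to XML.
    rows = conn.execute(
        "SELECT p.pname FROM supplier s "
        "JOIN part p ON p.supplier_id = s.id "
        "WHERE s.name = ? ORDER BY p.pname",
        (supplier_name,),
    ).fetchall()
    element = ET.Element("supplier")
    ET.SubElement(element, "name").text = supplier_name
    for (pname,) in rows:
        ET.SubElement(element, "part").text = pname
    return element


fragment = ET.tostring(
    materialize_fragment(demo_connection(), "Acme"), encoding="unicode"
)
print(fragment)
```

The alternative, generating the whole document and then filtering it with an XML query processor, would convert every row to XML before discarding most of the result.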
Several commercial tools for exporting relational data into XML views exist today. ODBC2XML, a product of Intelligent Systems Research (www.intsysr.com), allows users to define XML documents with embedded SQL statements, which permit the users to construct an XML view of the relational data. Such views are materialized, however, and cannot be further queried with an XML query language. Alternatively, Oracle's XSQL tool defines a fixed, canonical mapping of the relational data into an XML document, by mapping each relation and attribute name to an XML tag and tuples as nested elements. Such a view could be kept virtual, but this approach is not general enough to support mapping into an arbitrary XML format. IBM's DB2 XML Extender provides a Data Access Definition (DAD) language that supports both composition of relational data into XML and decomposition of XML data into relational tables. DAD's composition feature supports generation of arbitrary XML from relational data. However, the criteria for grouping elements are implicit in the DAD, and DAD specifications cannot be nested arbitrarily. More significantly, XML Extender does not support query composition.
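A fixed, canonical mapping of the kind described above can be sketched in a few lines of Python. This is a generic illustration only and does not reproduce the output format of Oracle's XSQL or of any other product. The `part` table, the `<row>` element name, and the `canonical_xml` helper are hypothetical. The relation name becomes the root tag, each tuple becomes a nested element, and each attribute name becomes a child tag.

```python
import sqlite3
import xml.etree.ElementTree as ET


def canonical_xml(conn, table):
    """Map one relation to XML using a fixed, schema-driven layout."""
    # The table name is interpolated for brevity; a real tool would
    # validate it against the catalog first.
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY rowid')
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        row_el = ET.SubElement(root, "row")  # one element per tuple
        for column, value in zip(columns, row):
            child = ET.SubElement(row_el, column)  # attribute name as tag
            if value is not None:
                child.text = str(value)
    return root


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE part (pname TEXT, supplier_id INTEGER);
    INSERT INTO part VALUES ('bolt', 1), ('nut', 1);
""")
canonical = ET.tostring(canonical_xml(conn, "part"), encoding="unicode")
print(canonical)
```

The shape of the output is dictated entirely by the relational schema, so it cannot conform to an independently defined public DTD without a further transformation step.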
The present invention overcomes many of the shortcomings of the prior art. In addition, the present invention addresses the problem of automating the conversion of relational data into XML. According to the invention, a general, dynamic, and efficient tool, referred to as SilkRoute, is provided for viewing and querying relational data in XML. SilkRoute is general, because it can express mappings of relational data into XML that conform to arbitrary DTDs, not just a canonical mapping of the relational schema. The mappings may be referred to as views. Applications can express the data they need as an XML-QL query over the view. SilkRoute is dynamic, because it can materialize only the fragment of an XML view needed by an application, and SilkRoute is efficient, because it can fully exploit the underlying RDBMS (Relational DataBase Management System) query engine whenever data items in an XML view need to be materialized.
According to one aspect of the present invention, a general framework is provided for mapping relational databases to XML views, to be used in data exchange. In another aspect of the invention, a new query language, RXL, for mapping relational sources to XML views, is provided. According to yet another aspect, the present invention provides a sound and complete query composition algorithm that, when given an RXL query and an XML-QL query, generates a new RXL query equivalent to their composition. In a still further aspect of the present invention, a technique is provided in which most of the work of an RXL query can be shipped to the underlying database engine.
Although the invention has been defined using the appended claims, these claims are exemplary and limiting to the extent that the invention is meant to include one or more elements from the apparatuses described herein in any combination or sub-combination. Accordingly, there are any number of alternative combinations for defining the invention that incorporate one or more elements from the specification (including drawings, claims, etc.) in any combinations or sub-combinations.