Internet applications today face the problem of replicating, transforming, exporting, or saving data from one format to another, a process that can be laborious, tedious, and error-prone. The Internet has the potential to integrate all information into a global network, promising access to information anytime and anywhere. However, this potential has yet to be realized: at present, the Internet is merely an access medium. Realizing it requires intelligent search, data exchange, adaptive presentation, and data recovery. The Internet must therefore go beyond an information access standard to a standard way of representing data, so that software can search, move, display, recover, and otherwise manipulate information that is currently hidden in contextual obscurity.
XML (eXtensible Markup Language) has emerged as the standard for data interchange over the Internet. Interoperation between relational and XML databases requires schema translation and data conversion between them. The translated XML schema can support sharing business data with other systems, interoperability with incompatible systems, exposing legacy data to XML-based applications such as e-commerce, object persistence using XML, and content syndication. In recent years, as XML documents have grown in importance as a means of representing data on the World Wide Web, much research has been devoted to new technologies for storing and retrieving XML documents using relational databases.
XML database support is available from the major relational database vendors as an extender or cartridge to a relational database management system. Most XML-enabled database management systems, such as Oracle, SQL Server, and Sybase, can translate only a few relations into an XML document; they cannot transform a whole relational database into an XML document, nor synchronize a relational database with a replicated XML database.
Moreover, the translation in such conventional systems and methods does not take data semantic constraints into account, so these methods may not be sufficient for an information highway on the web. E-commerce further increases the demands placed on the database. Aoying Zhou, Hongjun Lu, Shihui Zheng, Yuqi Liang, Long Zhang, Wenyun Ji, and Zengping Tian describe a visual XML document management system (VXMLR) in the paper entitled ‘VXMLR: A Visual XML-Relational Database System’, published in Proceedings of the 27th VLDB Conference, Roma, Italy, 2001, pp. 646-648. In this system, an XML document is first parsed into a Document Object Model (DOM) tree and the document's Document Type Definition (DTD) is extracted. The DOM tree is then mapped into a relational table and stored in a database. To process XML queries, path expression queries are transformed into SQL statements and submitted to the underlying relational database management system (RDBMS). VXMLR maintains statistics on the data and a path directory, which are used during query rewriting to reduce the number of SQL statements and simplify join conditions.
Mary Fernández, Wang-Chiew Tan and Dan Suciu, in the document entitled ‘SilkRoute: trading between relations and XML’, published in Computer Networks, Volume 33, Issues 1-6, June 2000, pp. 723-745, describe a general framework for mapping relational databases to virtual XML views using a declarative query language, RXL (Relational-to-XML Transformation Language). Applications then query the resulting view with an XML query language (XML-QL) to extract XML data.
In a document by Masatoshi Yoshikawa and Toshiyuki Amagasa entitled ‘XRel: A path-based approach to storage and retrieval of XML documents using relational databases’, published in ACM Transactions on Internet Technology, Vol. 1, No. 1, August 2001, pp. 110-141, an XML document is decomposed into a set of nodes that are stored in several tables together with encoded path information from the root to each node. XML documents are stored in a fixed relational schema that requires no information about DTDs, and the approach exploits indices such as the B+-tree supported by the DBMS. To process XML queries, an algorithm is presented that translates a core subset of XPath expressions into SQL queries.
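The path-based idea can be illustrated with a small sketch. It is not XRel's actual schema; it assumes a hypothetical two-table layout (`path`, `element`) in SQLite, records a (start, end) region for each element, and handles only absolute child-axis paths and a leading `//`. Because every root-to-node path is stored as a string, a path expression becomes one string test plus one join, with no recursive SQL.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified XRel-like layout: distinct root-to-node path strings
# in one table, element nodes with a (start, end) region in another.
SCHEMA = """
CREATE TABLE path (path_id INTEGER PRIMARY KEY, pathexp TEXT UNIQUE);
CREATE TABLE element (doc_id INTEGER, path_id INTEGER,
                      start_pos INTEGER, end_pos INTEGER, value TEXT);
"""


def get_path_id(conn, pathexp):
    """Return the id of a path string, inserting it on first use."""
    conn.execute("INSERT OR IGNORE INTO path (pathexp) VALUES (?)", (pathexp,))
    return conn.execute(
        "SELECT path_id FROM path WHERE pathexp = ?", (pathexp,)
    ).fetchone()[0]


def shred(conn, doc_id, xml_text):
    """Decompose one XML document into element rows keyed by path id."""
    counter = 0

    def visit(node, parent_path):
        nonlocal counter
        pathexp = f"{parent_path}/{node.tag}"
        start = counter          # region start, assigned before the children
        counter += 1
        for child in node:
            visit(child, pathexp)
        end = counter            # every descendant's region lies inside (start, end)
        counter += 1
        value = (node.text or "").strip() or None
        conn.execute(
            "INSERT INTO element VALUES (?, ?, ?, ?, ?)",
            (doc_id, get_path_id(conn, pathexp), start, end, value),
        )

    visit(ET.fromstring(xml_text), "")


def path_query_to_sql(xpath):
    """Translate '/a/b/c' (absolute) or '//b/c' (any prefix) into one SQL query."""
    if xpath.startswith("//"):
        # GLOB rather than LIKE: SQLite's LIKE ignores ASCII case, XML names do not.
        condition, arg = "p.pathexp GLOB ?", "*/" + xpath[2:]
    else:
        condition, arg = "p.pathexp = ?", xpath
    sql = (
        "SELECT e.value FROM element e JOIN path p ON e.path_id = p.path_id "
        f"WHERE {condition} ORDER BY e.doc_id, e.start_pos"
    )
    return sql, (arg,)
```

Ordering by `start_pos` returns matches in document order, which is one reason region encodings are attractive.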
Jayavel Shanmugasundaram, Eugene Shekita, Rimon Barr, Michael Carey, Bruce Lindsay, Hamid Pirahesh, and Berthold Reinwald, in a document entitled ‘Efficiently Publishing Relational Data as XML Documents’, published in Proceedings of the 26th VLDB Conference, Cairo, Egypt, 2000, pp. 65-76, describe an SQL language extension, namely an XML constructor, for constructing complex XML documents directly in the relational engine. They explore different execution plans for generating the content of an XML document, and their results show that constructing XML documents inside the relational engine can have significant performance benefits.
Joseph Fong, Francis Pang, and Chris Bloor, in a document entitled ‘Converting Relational Database into XML Document’, published in Proceedings of the First International Workshop on Electronic Business Hubs, September 2001, pp. 61-65, describe a method for translating XQL into SQL in an XML gateway. The translation process symbolically transforms node navigation in an XQL query graph into relational join navigation in an SQL query graph.
Joseph Fong and Tharam Dillon, in a document entitled ‘Towards Query Translation from XQL to SQL’, published in Proc. of the 9th IFIP 2.6 Working Conference on Database Semantics (DS-9) by World Scientific Publisher in 2001, pp. 113-129, compare the performance of an XML-enabled database with that of a native XML database, and recommend native XML databases for systems with very complex structures. In a document by Joseph Fong, H K Wong, and Anthony Fong entitled ‘Performance Analysis between XML-Enabled Database and Native XML Database’, a book chapter in XML Data Management, edited by Akmal Chaudhri, Addison-Wesley, USA, March 2003, steps are described for converting a relational database into an XML document. These steps show how to translate a relational schema into an XML schema and then manually map the data into an XML document.
Multidatabase systems provide interoperation and varying degrees of integration among multiple databases. There are different approaches to multidatabase interoperability. Global schema integration is based on completely integrating multiple databases to provide a global schema. This approach has several disadvantages; one is that it is difficult to identify relationships among the attributes of two schemas and to identify relationships among entity types and relationship types. Another approach, known as the Multidatabase Language Approach, aims to perform queries involving several databases at the same time. However, this approach requires users to learn another language, and users may find it difficult to understand each individual database schema.
Some database management systems (e.g. Oracle, DB2) accept XQL queries so that users can retrieve XML documents. However, the retrieved data are actually stored in tables in the relational database, not in an XML database.
Conventional methods for storing XML documents in relational databases can roughly be classified into three categories: structure-mapping, model-mapping and semantic-preserving approaches.
The Model-Mapping Approach:
Several studies use fixed relational schemas to store XML documents; such approaches are known as model-mapping approaches. Each approach has its own mapping rules and database schema.
The “Edge” approach is described in Kanne, C., and Moerkotte, G., Efficient Storage of XML Data, Proceedings of the 16th International Conference on Data Engineering, 2000, p. 198, and stores the XML data as a directed graph/tree in a single relational table. This approach maintains edges individually, so it must concatenate edges to form a path when processing user queries. Because the table keeps only edge labels rather than labeled paths, a large number of joins is needed to check edge connections.
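To see why the Edge approach needs many joins, the sketch below stores each parent-child edge as one row of a single SQLite table and evaluates a child-axis path with one self-join per additional step. The column layout (`source`, `ordinal`, `label`, `target`, `value`) and the virtual root node 0 are simplifying assumptions, not the exact table of the cited work.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_edges(conn, xml_text):
    """Store every parent-child edge of the document tree as one row."""
    conn.execute(
        "CREATE TABLE edge (source INTEGER, ordinal INTEGER, label TEXT, "
        "target INTEGER, value TEXT)"
    )
    next_id = 1  # node 0 is a virtual document root (an assumption)

    def visit(node, parent_id, ordinal):
        nonlocal next_id
        node_id = next_id
        next_id += 1
        value = (node.text or "").strip() or None
        conn.execute(
            "INSERT INTO edge VALUES (?, ?, ?, ?, ?)",
            (parent_id, ordinal, node.tag, node_id, value),
        )
        for i, child in enumerate(node):
            visit(child, node_id, i)

    visit(ET.fromstring(xml_text), 0, 0)


def path_to_sql(labels):
    """Evaluate /l0/l1/.../ln: one copy of the edge table per step, chained
    by target = source, so an n-step path needs n - 1 self-joins."""
    tables = [f"edge e{i}" for i in range(len(labels))]
    conds = ["e0.source = 0", "e0.label = ?"]
    for i in range(1, len(labels)):
        conds.append(f"e{i - 1}.target = e{i}.source")
        conds.append(f"e{i}.label = ?")
    last = f"e{len(labels) - 1}"
    sql = (
        f"SELECT {last}.value FROM {', '.join(tables)} "
        f"WHERE {' AND '.join(conds)} ORDER BY {last}.target"
    )
    return sql, list(labels)
```

The query grows by one table reference and one join predicate for every step in the path, which is the cost the Edge approach pays for keeping only edge labels.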
Similar to the “Edge” approach, Thomas Kudrass, in a document entitled ‘Management of XML documents without schema in relational database systems’, published in Information and Software Technology, Volume 44, Issue 4, March 2002, pp. 269-275, describes an edge table enriched with additional information in order to distinguish between different target nodes. In this approach, the content of a document is stored as a leaf value (Leaf table) or an attribute value (Attr table), both of which are referenced from the Edge table via a foreign key. Each edge of the document tree is identified by a source node and a target node, and each document has a unique ID so that an edge can be assigned to one document. A drawback of this approach is that decomposing a document produces many tuples to be inserted into the database, so the load time may increase for a large document.
Masatoshi Yoshikawa and Toshiyuki Amagasa, in a document entitled ‘XRel: A Path-Based Approach to Storage and Retrieval of XML Documents Using Relational Databases’, published in ACM Transactions on Internet Technology, Vol. 1, No. 1, August 2001, pp. 110-141, describe a system (XRel) in which an XML document is decomposed into nodes on the basis of its tree structure and stored in relational tables according to node type, together with the path information from the root to each node. The XRel system stores the directed graph of an XML document in four tables. The advantage of the XRel system is that it does not require recursive queries and can evaluate path queries within the SQL-92 standard.
Haifeng Jiang, Hongjun Lu, Wei Wang, and Jeffrey Xu Yu, in a document entitled ‘XParent: An Efficient RDBMS-Based XML Database System’, published in Proceedings of the 18th International Conference on Data Engineering, 2002, pp. 335-336, describe a system (XParent) in which the XPath data model is adopted to represent XML documents.
The XParent system models a document as an ordered tree and uses a schema similar to that of the XRel system, except that data-path IDs replace the (start, end) pairs used in XRel. The advantage of the XParent system is that it can be efficiently supported by conventional index mechanisms such as the B-tree. One drawback is that it requires a large number of joins to check edge connections when processing complex queries.
In the XML-relational conversion described by Latifur Khan and Yan Rao in a document entitled ‘A performance evaluation of storing XML data in relational database management systems’, published in Proceedings of the 3rd International Workshop on Web Information and Data Management, November 2001, pp. 31-38, each document is stored in two relational tables. This approach preserves the nested structure of an XML document. A shortcoming of this approach is that PathId is derived from an element's tag, so an element that occurs multiple times yields duplicate PathId values, violating the primary key constraint on PathId. Extra work is required to resolve such conflicts.
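The primary-key conflict can be reproduced with a deliberately simplified, hypothetical table whose key is built from element tags alone; it does not reproduce Khan and Rao's actual two-table schema. Adding a per-path occurrence number to the key is shown as one possible remedy, an assumption for illustration rather than the authors' own solution.

```python
import sqlite3
import xml.etree.ElementTree as ET


def tag_paths(xml_text):
    """Yield (tag_path, text) for each element; the path uses tags only."""
    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        yield path, (node.text or "").strip() or None
        for child in node:
            yield from walk(child, path)
    yield from walk(ET.fromstring(xml_text), "")


def store(xml_text, with_ordinal):
    """Load elements into a hypothetical node table keyed by tag path."""
    conn = sqlite3.connect(":memory:")
    if with_ordinal:
        # Remedy (assumption): make the key (path_id, occurrence).
        conn.execute("CREATE TABLE node (path_id TEXT, occurrence INTEGER, "
                     "value TEXT, PRIMARY KEY (path_id, occurrence))")
    else:
        # Tag-derived path alone as the key: repeated elements collide.
        conn.execute("CREATE TABLE node (path_id TEXT PRIMARY KEY, value TEXT)")
    seen = {}
    for path, text in tag_paths(xml_text):
        if with_ordinal:
            seen[path] = seen.get(path, 0) + 1
            conn.execute("INSERT INTO node VALUES (?, ?, ?)", (path, seen[path], text))
        else:
            conn.execute("INSERT INTO node VALUES (?, ?)", (path, text))
    return conn
```

With a document containing two `author` children of one `book`, the first variant raises `sqlite3.IntegrityError` on the second `/book/author` row, while the second variant stores all three elements.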
The Structure-Mapping Approach:
In structure-mapping, schemas are extracted from XML documents and a database schema is defined for each XML document.
Mary Fernández, Wang-Chiew Tan and Dan Suciu, in a document entitled ‘SilkRoute: trading between relations and XML’, published in Computer Networks, Volume 33, Issues 1-6, June 2000, pp. 723-745, describe a general framework for mapping relational databases to virtual XML views using a declarative query language, RXL (Relational to XML Transformation Language). The process starts with writing an RXL query that defines the virtual XML view of the database. The main shortcoming of this approach is that queries over the views often produce composed queries with many unions.
Iraklis Varlamis and Michalis Vazirgiannis, in a document entitled ‘Bridging XML-schema and relational databases: a system for generating and manipulating relational databases using valid XML documents’, published in Proceedings of the ACM Symposium on Document Engineering, November 2001, pp. 105-114, describe an X-Database system that acts as an interface between the application and the database. The basis of the system is an XML Schema that describes the logical model of the interchanged information. A drawback of the X-Database system is that the XML Schema can be defined only once, at the beginning of the process, and cannot be changed, whereas in most real applications the schema changes over time.
The XPERANTO system described by Michael Carey, Jerry Kiernan, Jayavel Shanmugasundaram, Eugene Shekita, and Subbu Subramanian, in a document entitled ‘XPERANTO: Middleware for Publishing Object-Relational Data as XML Documents’, published in Proceedings of the 26th VLDB Conference, 2000, pp. 646-648, operates as middleware on top of an (object-)relational database system. The system first provides a default virtual XML view of a given (object-)relational database; the user may then create more complex or specialized views based on the default view using an XML query language. One attractive aspect of XPERANTO is that it works with any existing relational database system, because it generates regular SQL and tags the results outside the database engine.
Aoying Zhou, Hongjun Lu, Shihui Zheng, Yuqi Liang, Long Zhang, Wenyun Ji, and Zengping Tian, in a paper entitled ‘VXMLR: A Visual XML-Relational Database System’, published in Proceedings of the 27th VLDB Conference, 2001, pp. 646-648, present a visual XML document management system, VXMLR. In this system, the XML document is parsed into a Document Object Model (DOM) tree and the DTD of the document is extracted. The document tree is then mapped into and stored in a relational table. VXMLR maintains statistics on the data and a path directory, which are used in the query rewriting process to reduce the number of SQL statements and simplify join conditions.
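The strategy of generating ordinary SQL and tagging the results outside the engine can be sketched as follows, assuming hypothetical `dept` and `emp` tables in SQLite. A single SQL query returns flat rows sorted by department, and the nesting into XML happens entirely in application code, so the database needs no XML support. This is a simplified illustration, not XPERANTO's actual query-rewriting machinery.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import groupby


def publish_departments(conn):
    """Run plain SQL, then tag the sorted flat rows as nested XML in Python."""
    rows = conn.execute(
        "SELECT d.dept_id, d.name, e.emp_id, e.name "
        "FROM dept d LEFT JOIN emp e ON e.dept_id = d.dept_id "
        "ORDER BY d.dept_id, e.emp_id"
    ).fetchall()
    root = ET.Element("departments")
    # Rows are sorted by department, so groupby sees each department once.
    for (dept_id, dept_name), group in groupby(rows, key=lambda r: (r[0], r[1])):
        dept_el = ET.SubElement(root, "department", id=str(dept_id), name=dept_name)
        for _, _, emp_id, emp_name in group:
            if emp_id is not None:  # LEFT JOIN yields NULLs for an empty department
                ET.SubElement(dept_el, "employee", id=str(emp_id)).text = emp_name
    return ET.tostring(root, encoding="unicode")
```

Sorting in SQL and grouping in the client keeps the database query standard, which is what lets this style of middleware run on any relational system.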
The Semantic-Preserving Approach:
The semantic-preserving approach generates an XML structure that is able to describe the semantics and structure of the underlying relational database.
Wenyue Du, Mong Li Lee and Tok Wang Ling, in a document entitled ‘XML structures for relational data’, published in Proceedings of the Second International Conference on Web Information Systems Engineering, Volume 1, December 2001, pp. 151-160, describe a methodology that employs a semantically rich Object-Relationship-Attribute model for Semi-Structured data (ORA-SS) in the translation process. ORA-SS models a rich variety of semantic constraints in the underlying relational database (strong/weak entities; binary, n-ary, recursive, and ISA relationship types; single-valued/multi-valued attributes of entity types or relationship types; and cardinality constraints), and represents the implicit structure of relational data using hierarchy and referencing. ORA-SS thereby preserves the inherent semantics and implicit structure of the relational schema.
J. Fong, H. K. Wong and Z. Cheng, in a document entitled ‘Converting relational database into XML documents with DOM’, published in Information and Software Technology, Volume 45, Issue 6, April 2003, pp. 335-355, describe a system in which relational schemas are denormalized into joined tables, which are transformed into Document Object Models (DOMs) according to their data dependency constraints. These DOMs are integrated into a single DOM, which is translated into an XML document. The data dependency constraints in the denormalized relational schema are mapped into elements and sub-elements of the XML document tree: partial functional dependencies are mapped into elements and attributes; transitive data dependencies into elements, sub-elements, and sub-sub-elements; multi-valued dependencies into multiple sub-elements under one element; and join dependencies into a group element. As a result, the data semantics of the relational schema are translated and preserved in the XML document.
Angela Cristina Duta, Ken Barker, and Reda Alhajj, in a document entitled ‘ConvRel: relationship conversion to XML nested structures’, published in Proceedings of the 2004 ACM Symposium on Applied Computing, March 2004, pp. 698-702, describe a system in which the relational schema of each relational data source is transformed into a nested XML schema.
In summary, there is a need for a system that has a relational database for traditional data processing together with an equivalent XML database for various applications (such as business-to-business (B2B) applications), with improved performance in the online conversion of relational data into XML documents. Furthermore, since users may prefer to keep two production database systems, there is a need for a system in which the relational database is used for internal data processing and its counterpart XML database is used for external data transmission over the Internet. There is also a need for a method of converting between a relational database and an XML database that improves database performance, enables automatic recovery of the XML database after system failures, and is easy to use, allowing users to work in a query language they already know.
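The effect of mapping dependencies onto nesting can be sketched on rows that have already been denormalized. This is not the algorithm of Fong, Wong and Cheng: it assumes the joined rows are given as Python dicts and that the nesting levels (the hypothetical `LEVELS` list) are supplied by hand rather than derived from the dependency constraints. Repeated values at a lower level become repeated sub-elements under one parent, and a third level yields sub-sub-elements.

```python
import xml.etree.ElementTree as ET
from itertools import groupby

# Hypothetical nesting for a denormalized customer/order/order-line join.
# Each level: (element tag, grouping key column, columns stored as attributes).
LEVELS = [
    ("customer", "cust", ["cust", "cname"]),
    ("order", "order", ["order", "date"]),
    ("line", "item", ["item", "qty"]),
]


def nest(parent, rows, levels):
    """Group joined rows level by level, emitting one element per distinct key."""
    if not levels:
        return
    tag, key, attrs = levels[0]
    rows = sorted(rows, key=lambda r: r[key])  # groupby needs sorted input
    for _, group in groupby(rows, key=lambda r: r[key]):
        group = list(group)
        # Attribute values come from the first row; they depend on the key.
        el = ET.SubElement(parent, tag, {c: str(group[0][c]) for c in attrs})
        nest(el, group, levels[1:])
```

For example, three joined rows for one customer with two orders produce one `customer` element containing two `order` sub-elements, each holding its `line` sub-sub-elements.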