1. Field of the Invention
This invention relates in general to database management systems performed by computers, and in particular to an optimized method and system for decomposing markup-based documents, such as XML documents, and storing them in a relational database. This allows easy and fast subsequent retrieval of data from the database in XML format and reconstruction of the XML document.
2. Description of Related Art
Databases are computerized information storage and retrieval systems. A Relational Database Management System (RDBMS) is a database management system (DBMS) that uses relational techniques for storing and retrieving data. RDBMS software using a Structured Query Language (SQL) interface is well known in the art. SQL has evolved into a standard language for RDBMS software and has been adopted as such by both the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO).
Extensible Markup Language (XML) is a standard for representing data in a hierarchical format and for exchanging information on the Internet and the World Wide Web. There is presently a need for an optimized method and system for serializing and storing XML documents in a relational database that preserves the structure of the XML document for subsequent querying of its components.
An XML document consists of nested element structures, starting with a root element. An XML document is tree-structured, with each node in the tree representing an element identified by a name. Data describing an element can take the form of attributes or sub-elements. An "id" attribute uniquely identifies an element within an XML document and can be used to reference that element from another element. An XML document uses tags to describe the type of data that follows each tag. An XML document is therefore self-describing, because its data can be interpreted by a remote user without input from the document's creator.
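To make the tree structure and the role of an "id" attribute concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The sample document, its tag names, and the `authorRef` attribute used to point at another element's `id` are hypothetical. A plain, non-validating parse does not give `id` any special meaning, so the sketch builds its own lookup table from the `id` values.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a root element with two sub-elements. The <book> element
# refers to the <author> element through its "id" value.
DOC = """<library>
  <author id="a1"><name>Ada</name></author>
  <book title="Notes" authorRef="a1"/>
</library>"""

root = ET.fromstring(DOC)

# Index every element that carries an "id" attribute (iter() walks the whole tree).
ids = {el.get("id"): el for el in root.iter() if el.get("id") is not None}

# Follow the reference from <book> to <author>; the tags describe the data they hold.
book = root.find("book")
author = ids[book.get("authorRef")]
print(book.get("title"), "by", author.findtext("name"))  # Notes by Ada
```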
There are numerous conventional software products for transferring data contained in an XML document into a database. They include modules that decompose the XML document and store it in a database. To retrieve XML data from the database, conventional products usually issue an XML query that searches the contents of the XML document for elements or attributes. Moreover, most such products store data in an object-relational database. Some conventional products use structural serialization, which is based on the hierarchical structure of XML documents: the tree structure of an XML document is mapped to a set of relational tables. However, this often produces a large number of tables even for a simple XML document. Moreover, queries against a structurally serialized document usually require mapping information to obtain valid results, because the XML schema must be mapped to the relational database schema.
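The growth in table count can be illustrated with a deliberately naive structural mapping, assumed here purely for illustration: one relational table per distinct element path. The function `structural_tables` and the sample document are hypothetical; real products may, for example, map leaf elements to columns and therefore create fewer tables.

```python
import xml.etree.ElementTree as ET


def structural_tables(xml_text):
    """Return the table names a naive one-table-per-element-path mapping would create."""
    tables = set()

    def walk(element, parent_path):
        # Each distinct root-to-element path becomes its own (hypothetical) table.
        path = f"{parent_path}_{element.tag}" if parent_path else element.tag
        tables.add(path)
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return sorted(tables)


# A small order document with only three levels of nesting.
SAMPLE = (
    "<order><customer><name>Ann</name><address><street>Main</street>"
    "<city>Oslo</city></address></customer><item><sku>A</sku><qty>2</qty></item></order>"
)
print(len(structural_tables(SAMPLE)))  # 9
```

Every new nested element type adds another table under this mapping, so the schema grows with the structure of the document rather than staying fixed.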
While various techniques have been developed for decomposing markup-based documents, such as XML documents, and storing them in a database, there is a need for a simple, optimized and generic method for serializing and storing markup-based documents in a relational database that allows easy and fast subsequent retrieval of data from the database in the original format and reconstruction of the document. Moreover, such a method should be independent of the hierarchical structure and structural complexity of the stored document.
The foregoing and other objects, features, and advantages of the present invention will be apparent from the following detailed description of the preferred embodiments, which makes reference to several drawing figures.
One preferred embodiment of the present invention is a computer-based markup serialization method for transferring data contained in a markup-based document, such as an Extensible Markup Language (XML) document, into a relational database stored in an electronic storage device having a database management system. It is used for easy subsequent retrieval of data from the database in XML format and reconstruction of the XML document. The method decomposes the document according to the basic markup types of the document's data components and stores the decomposed document in a set of markup tables created in the database, one markup table for each basic markup type, thereby preserving the hierarchical tree structure, parent-child order, and components of the document. The basic markup types of an XML document are ATTRIBUTE, CDATA_SECTION, COMMENT, DOCUMENT_FRAGMENT, DOCUMENT, DOCUMENT_TYPE, ELEMENT, ENTITY, ENTITY_REFERENCE, NOTATION, PROCESSING_INSTRUCTION and TEXT. To query the database markup tables, the method uses SQL queries that retrieve the XML document components in XML format.
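A minimal sketch of this kind of decomposition, assuming Python's `xml.dom.minidom` (whose twelve DOM node types correspond one-to-one to the basic markup types listed above) and an in-memory SQLite database. The table layout (`node_id`, `parent_id`, `ordinal`, `name`, `value`) and the functions `create_markup_tables`, `decompose`, `children_xml` and `reconstruct` are hypothetical illustrations, not the schema of the embodiment. Entity and notation declarations, which minidom keeps on the `DocumentType` node rather than in `childNodes`, are not traversed, and reconstruction covers only element, attribute, text, CDATA, comment and processing-instruction nodes.

```python
import itertools
import sqlite3
from xml.dom import Node, minidom
from xml.sax.saxutils import escape, quoteattr

# Hypothetical mapping: one markup table per basic markup type (DOM node type).
MARKUP_TABLES = {
    Node.ELEMENT_NODE: "ELEMENT", Node.ATTRIBUTE_NODE: "ATTRIBUTE",
    Node.TEXT_NODE: "TEXT", Node.CDATA_SECTION_NODE: "CDATA_SECTION",
    Node.ENTITY_REFERENCE_NODE: "ENTITY_REFERENCE", Node.ENTITY_NODE: "ENTITY",
    Node.PROCESSING_INSTRUCTION_NODE: "PROCESSING_INSTRUCTION",
    Node.COMMENT_NODE: "COMMENT", Node.DOCUMENT_NODE: "DOCUMENT",
    Node.DOCUMENT_TYPE_NODE: "DOCUMENT_TYPE",
    Node.DOCUMENT_FRAGMENT_NODE: "DOCUMENT_FRAGMENT", Node.NOTATION_NODE: "NOTATION",
}

# Child kinds this sketch can turn back into XML text.
CHILD_KINDS = ("ELEMENT", "TEXT", "CDATA_SECTION", "COMMENT", "PROCESSING_INSTRUCTION")

# Plain SQL: all children of one parent across the markup tables, in document order.
CHILD_QUERY = " UNION ALL ".join(
    f"SELECT node_id, '{t}' AS kind, name, value, ordinal FROM \"{t}\" WHERE parent_id = :pid"
    for t in CHILD_KINDS
) + " ORDER BY ordinal"


def create_markup_tables(conn):
    for table in MARKUP_TABLES.values():
        conn.execute(
            f'CREATE TABLE "{table}" (node_id INTEGER PRIMARY KEY, '
            "parent_id INTEGER, ordinal INTEGER, name TEXT, value TEXT)"
        )


def decompose(conn, xml_text):
    """Store each node in the table for its markup type; return the document node_id."""
    ids = itertools.count(1)

    def store(node, parent_id, ordinal):
        node_id = next(ids)
        conn.execute(
            f'INSERT INTO "{MARKUP_TABLES[node.nodeType]}" VALUES (?, ?, ?, ?, ?)',
            (node_id, parent_id, ordinal, node.nodeName, node.nodeValue),
        )
        if node.nodeType == Node.ATTRIBUTE_NODE:
            return node_id  # the attribute value is already in the value column
        if node.nodeType == Node.ELEMENT_NODE:
            # Attributes get their own ordinal sequence under the owning element.
            for i in range(node.attributes.length):
                store(node.attributes.item(i), node_id, i)
        # Child ordinals preserve parent-child order across all markup types.
        for i, child in enumerate(node.childNodes):
            store(child, node_id, i)
        return node_id

    return store(minidom.parseString(xml_text), None, 0)


def rebuild(conn, node_id, kind, name, value):
    """Turn one stored node (and, for elements, its subtree) back into XML text."""
    if kind == "TEXT":
        return escape(value)
    if kind == "CDATA_SECTION":
        return f"<![CDATA[{value}]]>"
    if kind == "COMMENT":
        return f"<!--{value}-->"
    if kind == "PROCESSING_INSTRUCTION":
        return f"<?{name} {value}?>"
    attrs = "".join(
        f" {n}={quoteattr(v)}"
        for n, v in conn.execute(
            'SELECT name, value FROM "ATTRIBUTE" WHERE parent_id = ? ORDER BY ordinal',
            (node_id,),
        ).fetchall()
    )
    return f"<{name}{attrs}>{children_xml(conn, node_id)}</{name}>"


def children_xml(conn, parent_id):
    rows = conn.execute(CHILD_QUERY, {"pid": parent_id}).fetchall()
    return "".join(rebuild(conn, nid, kind, name, value) for nid, kind, name, value, _ in rows)


def reconstruct(conn, doc_id):
    return children_xml(conn, doc_id)
```

For example, decomposing `<order id="o1"><item sku="A">Pen</item></order>` into a fresh `sqlite3.connect(":memory:")` connection and calling `reconstruct` on the returned document id yields the same string. Because every document uses the same twelve tables, the schema does not depend on the document's structure; note that this sketch writes empty elements back as start/end tag pairs rather than as `<b/>`.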
Another preferred embodiment of the present invention is a system implementing the above-mentioned method embodiment of the present invention.
Yet another preferred embodiment of the present invention includes a computer usable medium tangibly embodying a program of instructions executable by the computer to perform method steps of the above-mentioned method embodiment of the present invention.