International Publication No. WO 98/34179 (PCT/AU98/00050) in the name of Time Base Pty Ltd and published on Aug. 6, 1998 and counterpart U.S. Pat. No. 6,233,592 issued on May 15, 2001 to Schnelle et al. are incorporated herein by reference. In these documents, an electronic publishing system is disclosed that provides a sparse multidimensional matrix of data using a set of flat file records. In particular, the computer-implemented system publishes an electronic publication using text-based data. Predefined portions of the text-based data are stored and used for the publication. At least one of the predefined portions is modified, and the modified version is stored as well. The predefined portion is typically a block of text, greater in size than a single word, but less than an entire document. Thus, for example, in the case of legislation, the predefined portion may be a section of the Act. Each predefined portion and the modified portion(s) are marked up with one or more links using a markup language, preferably Standard Generalized Markup Language (SGML) or eXtensible Markup Language (XML). The system also has attributes, each being a point on an axis of a multidimensional space for organising the predefined portions and the modified portion(s) of the text-based data. This system is simply referred to as the Multi Access Layer Technology or “MALT” system hereinafter.
Australian Patent Application No. 65470/00 filed on 12 Oct., 2000 in the name of TimeBase Pty Ltd, Canadian Patent Application No. 2323245 filed on 12 Oct., 2000 in the name of TimeBase Pty Ltd, New Zealand Patent Application No. 507510 filed on 12 Oct., 2000 in the name of TimeBase Pty Ltd and U.S. patent application Ser. No. 09/689927 filed on Oct. 12, 2000 in the names of Lessing et al. are incorporated herein by reference.
Large or complex text-based datasets are typically hierarchical in nature. In the storage, maintenance and publication of such data, a markup language capable of describing such hierarchies is commonly used. XML is one such markup language that is widely used, particularly in the print, electronic or online publishing industries, and for government or public records or technical documentation. XML data is typically stored either in “flat” text files encoded in ASCII, Unicode, or another standard text encoding, or in a “native” XML database.
The flat text files may be part of a document management system. Such a document management system may be based on a relational database. Document management systems deal with a document as a whole and are able to store relevant data about each document. However, document management systems are typically not designed to operate on data (XML elements) within such documents. Consequently, a document management system does not typically operate on all (or even a substantial number of the) XML elements contained in the flat text files on which the document management system is operating. An XML database, in contrast, operates on all XML elements of the XML data that the XML database is storing and, consequently, XML databases must manage large amounts of data and detail. As a result, document management systems have limited usefulness resulting from a lack of precision, and XML databases are overwhelmed by the multiplicity of XML elements that are to be managed.
Attempts have been made to transform XML data into a set of Structured Query Language (SQL) relational database tables. SQL is a database technology that provides a user with powerful query functionality and powerful data management tools. SQL possesses the stability of a mature technology, whereas XML databases are still a relatively immature technology and thus possess a degree of instability. SQL is a fast and efficient technology, and a wide choice of software and hardware manufacturers offer or support SQL databases.
Object relational mapping techniques are typically used to convert XML data into relational databases. Conventional object relational mapping techniques, however, often attempt to capture all of the document hierarchy. This is almost never necessary and can result in substantial size and performance penalties in the resulting SQL tables. Such object relational mapping techniques typically result in a far larger number of SQL tables than is necessary.
Thus, a need exists for providing an efficient method for converting a markup language document to a set of database tables, such that the conversion is reversible. The set consists of a small, fixed number of tables and may consist of a single table. A further need exists for providing a method for converting a markup language document to a set of database tables, such that the converted markup language document can be maintained without requiring a conversion back to the original markup language format of the document.
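By way of illustration only, the two properties named above (a fixed, small set of tables and a reversible conversion) can be sketched as follows. The sketch is not the method of this document: the table name `node`, its columns, and the functions `xml_to_table` and `table_to_xml` are hypothetical, and for simplicity every element is stored as a row, which a practical system may well avoid. It assumes Python 3.8 or later with the standard `sqlite3` and `xml.etree.ElementTree` modules.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical single-table layout: one row per element, whatever the
# element is called. The table count does not grow with the schema.
SCHEMA = (
    "CREATE TABLE IF NOT EXISTS node ("
    " id INTEGER PRIMARY KEY, parent_id INTEGER, position INTEGER,"
    " tag TEXT, attrs TEXT, text TEXT, tail TEXT)"
)
COLUMNS = "id, tag, attrs, text, tail"


def xml_to_table(conn, xml_text):
    """Store one XML document in the single 'node' table."""
    conn.execute(SCHEMA)

    def store(elem, parent_id, position):
        cur = conn.execute(
            "INSERT INTO node (parent_id, position, tag, attrs, text, tail)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (parent_id, position, elem.tag,
             json.dumps(elem.attrib), elem.text, elem.tail),
        )
        # Children keep their order through the 'position' column.
        for index, child in enumerate(elem):
            store(child, cur.lastrowid, index)

    store(ET.fromstring(xml_text), None, 0)


def table_to_xml(conn):
    """Rebuild the XML text from the rows (the reverse conversion)."""

    def build(row):
        node_id, tag, attrs, text, tail = row
        elem = ET.Element(tag, json.loads(attrs))
        elem.text, elem.tail = text, tail
        children = conn.execute(
            f"SELECT {COLUMNS} FROM node WHERE parent_id = ?"
            " ORDER BY position",
            (node_id,),
        ).fetchall()
        for child_row in children:
            elem.append(build(child_row))
        return elem

    root_row = conn.execute(
        f"SELECT {COLUMNS} FROM node WHERE parent_id IS NULL"
    ).fetchone()
    return ET.tostring(build(root_row), encoding="unicode")


if __name__ == "__main__":
    sample = (
        '<act id="a1"><section id="s1">Short title.</section>'
        '<section id="s2">Commencement <b>now</b>.</section></act>'
    )
    conn = sqlite3.connect(":memory:")
    xml_to_table(conn, sample)
    # Maintenance can happen directly in SQL, without converting back first.
    conn.execute("UPDATE node SET text = 'Title.' WHERE text = 'Short title.'")
    print(table_to_xml(conn))
```

In contrast, a conventional object relational mapping of the same sample would typically create a separate table for each element type (`act`, `section`, `b`), so the number of tables grows with the complexity of the document type.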