International Publication No. WO 98/34179 (PCT/AU98/00050) in the name of Time Base Pty Ltd and published on 6 Aug. 1998 and counterpart U.S. Pat. No. 6,233,592 issued on 15 May 2001 to Schnelle et al. are incorporated herein by cross reference. In these documents, an electronic publishing system is disclosed that provides a sparse multidimensional matrix of data using a set of flat file records. In particular, the computer-implemented system publishes an electronic publication using text-based data. Predefined portions of the text-based data are stored and used for the publication. At least one of the predefined portions is modified, and the modified version is stored as well. The predefined portion is typically a block of text, greater in size than a single word, but less than an entire document. Thus, for example, in the case of legislation, the predefined portion may be a section of the Act. Each predefined portion and the modified portion(s) are marked up with one or more links using a markup language, preferably SGML or XML. The system also has attributes, each being a point on an axis of a multidimensional space for organising the predefined portions and the modified portion(s) of the text-based data. This system is referred to hereinafter as the Multi Access Layer Technology or “MALT” system.
Australian Patent Application No. 65470/00 filed on 12 Oct. 2000 in the name of TimeBase Pty Ltd, Canadian Patent Application No. 2323245 filed on 12 Oct. 2000 in the name of TimeBase Pty Ltd, New Zealand Patent Application No. 507510 filed on 12 Oct. 2000 in the name of TimeBase Pty Ltd and U.S. patent application Ser. No. 09/689,927 filed on Oct. 12, 2000 in the names of Lessing et al. are incorporated herein by cross reference.
U.S. patent application entitled “Resilient Data Links” filed on 18 Jul. 2001 in the names of Schnelle and Nolan is also incorporated herein by cross reference. In this document, a method, an apparatus and a computer program product for providing one or more resilient links in an electronic document are described. The methodology disclosed is referred to as “MALTlink” hereinafter.
Large or complex text-based datasets are typically hierarchical in nature. The storage, maintenance and publication of such data commonly rely on a markup language capable of describing such hierarchies. XML is one such markup language and is widely used, particularly in the print, electronic and online publishing industries, for government and public records, and for technical documentation. XML data is typically stored either in “flat” text files encoded in ASCII, Unicode or another standard text encoding, or in a “native” XML database.
The flat text files may be part of a document management system, which may in turn be based on a relational database. Document management systems treat a document as a whole and are able to store relevant data about each document. However, document management systems are typically not designed to operate on the data (XML elements) within such documents. Consequently, a document management system does not typically operate on all, or even a substantial number, of the XML elements contained in the flat text files that it manages. An XML database, in contrast, operates on every XML element of the XML data that it stores and must therefore manage large amounts of data and detail. As a result, document management systems are of limited usefulness because they lack precision, while XML databases can be overwhelmed by the multiplicity of XML elements that must be managed.
Attempts have been made to transform XML data into a set of SQL relational database tables. SQL database technology provides a user with powerful query functionality and powerful data management tools. SQL has the stability of a mature technology, whereas XML databases are still a relatively immature technology and consequently exhibit a degree of instability. SQL databases are fast and efficient, and a wide range of software and hardware manufacturers offer or support them.
Tree mapping techniques are typically used to convert XML data into relational database tables. Conventional tree mapping techniques, however, often attempt to capture the entire document hierarchy. This is almost never necessary and can result in substantial size and performance penalties in the resulting SQL tables. Such tree mapping techniques typically produce far more SQL tables than are necessary.
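As a minimal sketch of what capturing the entire hierarchy can mean, the Python below uses a generic parent-pointer ("node table") mapping, one common style of tree mapping chosen here purely for illustration; the text above does not specify which technique it has in mind. It assumes a hypothetical legislation-like fragment (not the one in FIG. 1) and uses only `xml.etree.ElementTree` and `sqlite3` from the standard library; `map_full_tree` and `texts_under` are illustrative names. Every element, including purely structural wrappers such as `<act>` and `<section>`, becomes a row, and reading back the text of a single section already needs a recursive query.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical legislation-like fragment; it is NOT the fragment shown in FIG. 1.
SAMPLE = """
<act>
  <section id="s1"><heading>Definitions</heading><para>Text A</para><para>Text B</para></section>
  <section id="s2"><heading>Application</heading><para>Text C</para></section>
</act>
"""


def map_full_tree(xml_text: str) -> sqlite3.Connection:
    """Store every element as one row of a generic 'node' table (parent-pointer tree mapping).

    Attributes and tail text are ignored to keep the sketch short.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, tag TEXT NOT NULL, text TEXT)"
    )
    next_id = 0

    def visit(elem: ET.Element, parent_id) -> None:
        nonlocal next_id
        next_id += 1  # ids are assigned in document (pre-)order
        node_id = next_id
        text = (elem.text or "").strip() or None  # whitespace-only text stored as NULL
        conn.execute(
            "INSERT INTO node VALUES (?, ?, ?, ?)", (node_id, parent_id, elem.tag, text)
        )
        for child in elem:
            visit(child, node_id)

    visit(ET.fromstring(xml_text.strip()), None)
    conn.commit()
    return conn


def texts_under(conn: sqlite3.Connection, node_id: int) -> list:
    """Collect the text of a node and all of its descendants by walking parent links."""
    rows = conn.execute(
        """
        WITH RECURSIVE sub(id) AS (
            SELECT ?
            UNION ALL
            SELECT node.id FROM node JOIN sub ON node.parent_id = sub.id
        )
        SELECT node.text FROM node JOIN sub ON node.id = sub.id
        WHERE node.text IS NOT NULL
        ORDER BY node.id
        """,
        (node_id,),
    ).fetchall()
    return [row[0] for row in rows]
```

For the sample, `map_full_tree(SAMPLE)` stores eight rows for five pieces of text, and `texts_under(conn, 2)` (the first `<section>`) returns `['Definitions', 'Text A', 'Text B']`.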
As an example, consider the XML fragment shown in FIG. 1. A classical approach to conversion is to represent the element tree with one table per element type, possibly with an additional table to store the tree structure. This produces a correct, and possibly even reversible, result. However, the performance and management advantages that prompted the conversion in the first place can be diminished or even lost entirely, because of the size and complexity of the resulting tables.
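To make the classical approach concrete, here is a minimal Python sketch using only the standard library (`xml.etree.ElementTree` and `sqlite3`). It assumes a small hypothetical fragment, not the one in FIG. 1, and the table names (`el_<tag>`, `xml_edge`) and function names are illustrative choices rather than part of any method described here. Each element type gets its own table, and a separate `xml_edge` table records the parent/child structure.

```python
import json
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical legislation-like fragment; it is NOT the fragment shown in FIG. 1.
SAMPLE = """
<act>
  <section id="s1"><heading>Definitions</heading><para>Text A</para><para>Text B</para></section>
  <section id="s2"><heading>Application</heading><para>Text C</para></section>
</act>
"""


def map_per_element_type(xml_text: str) -> sqlite3.Connection:
    """Classical mapping: one table per element type plus one table recording the tree structure.

    Table names are derived from tag names (prefixed 'el_'), which assumes simple,
    un-namespaced tags. Attributes are kept as a JSON string for brevity.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE xml_edge (parent_id INTEGER, child_id INTEGER, position INTEGER)")
    created = set()
    next_id = 0

    def visit(elem: ET.Element) -> int:
        nonlocal next_id
        next_id += 1  # ids are unique across all element tables
        elem_id = next_id
        table = f'"el_{elem.tag}"'
        if elem.tag not in created:  # first occurrence of this element type: new table
            conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, text TEXT, attrs TEXT)")
            created.add(elem.tag)
        text = (elem.text or "").strip() or None
        conn.execute(
            f"INSERT INTO {table} VALUES (?, ?, ?)", (elem_id, text, json.dumps(elem.attrib))
        )
        for position, child in enumerate(elem):
            child_id = visit(child)
            conn.execute("INSERT INTO xml_edge VALUES (?, ?, ?)", (elem_id, child_id, position))
        return elem_id

    visit(ET.fromstring(xml_text.strip()))
    conn.commit()
    return conn


def section_headings(conn: sqlite3.Connection) -> list:
    """Even this simple lookup needs a three-way join through the structure table."""
    rows = conn.execute(
        """
        SELECT h.text
        FROM el_section AS s
        JOIN xml_edge AS e ON e.parent_id = s.id
        JOIN el_heading AS h ON h.id = e.child_id
        ORDER BY s.id
        """
    ).fetchall()
    return [row[0] for row in rows]
```

With this sample, `map_per_element_type(SAMPLE)` creates five tables (`el_act`, `el_section`, `el_heading`, `el_para` and `xml_edge`), and `section_headings` returns `['Definitions', 'Application']`. In a real schema every additional element type adds another table and every additional level of nesting adds another join, which is the size and complexity cost described above.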
Thus, a need exists for an efficient method of converting an XML document into a set of SQL tables.