1. Field of the Invention
The present invention relates to a method, system, program, and data structures for managing structured documents in a database.
2. Description of the Related Art
Many documents and data objects are encoded in the Extensible Markup Language (XML) as structured documents. Just about any data type can be encoded within an XML structured document, such as vector graphics, e-commerce transactions, mathematical equations, object meta-data, server APIs, etc. XML documents include tags to mark the start and end of each of the logical parts (called elements) of the document. For instance, if the XML document defines a book, the elements would include the table of contents, chapters, appendices, etc. Each element may include one or more associated objects. The associated object may comprise an attribute value or content, as is the case with a text object element including text, a graphics object element including an image, etc. An XML document further includes a definition of each element in a formal model, known as a Document Type Definition (DTD). The DTD provides attributes for each element and indicates the relationship of the elements and possible attribute values for the elements. Elements may be arranged in a hierarchical relationship. In such case, the DTD would define the hierarchical relationship of the elements to one another. Further details of XML are described in the publication “Extensible Markup Language (XML) 1.0”, Second Edition (Copyright W3C, Oct. 6, 2000), which publication is incorporated herein by reference in its entirety.
Users can encode and view an XML document with the Document Object Model (DOM) application program interface (API). The DOM interface is described in the publication entitled “Document Object Model (DOM) Level 1 Specification, Version 1.0,” document no. REC-DOM-Level-1-19981001 (Copyright W3C 1998), which publication is incorporated herein by reference in its entirety. The DOM interface represents the document as a hierarchical arrangement of nodes. The DOM interface specifies various commands to access elements and attributes within the DOM hierarchy.
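By way of illustration only, the following sketch shows how a DOM-style interface may be used to access elements and attributes within a node hierarchy. It assumes Python's standard xml.dom.minidom module as one available DOM implementation; the sample book document, its element and attribute names, and the describe_chapters function are hypothetical and are not taken from the publications cited above.

```python
from xml.dom import minidom

# Hypothetical sample document: a book with a table of contents,
# chapters, and an appendix, as in the book example discussed above.
BOOK_XML = """<?xml version="1.0"?>
<book title="Example Book">
  <toc>Contents</toc>
  <chapter number="1">First chapter text</chapter>
  <chapter number="2">Second chapter text</chapter>
  <appendix>Appendix text</appendix>
</book>"""


def describe_chapters(xml_text):
    """Return (number attribute, text content) for each chapter element."""
    doc = minidom.parseString(xml_text)   # build the DOM node hierarchy
    root = doc.documentElement            # the top-level <book> element
    result = []
    for chapter in root.getElementsByTagName("chapter"):
        number = chapter.getAttribute("number")
        # Concatenate the text nodes that are direct children of the element.
        text = "".join(
            node.data
            for node in chapter.childNodes
            if node.nodeType == node.TEXT_NODE
        )
        result.append((number, text))
    return result


if __name__ == "__main__":
    for number, text in describe_chapters(BOOK_XML):
        print(number, text)
```

In this sketch, each chapter element is reached by navigating from the document node, and its associated objects (an attribute value and text content) are read through DOM calls.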
Because many documents, especially documents available over the Internet, are now encoded as XML documents, there is a need in the art to manage and index the contents of XML documents. In the prior art, if a user wants to search the content of XML documents, one technique is to use the DOM interface commands or an XML parser to access the element(s) in each document being searched and then determine whether the objects associated with an element, e.g., attribute values or content, match the search or query criteria. The term “object” discussed in association with elements as used herein refers to any data associated with an element, such as attribute values or content (e.g., text content, images, movies, audio, etc.). Such an approach requires traversing each XML document to locate the object, e.g., attribute value or content, subject to the search, retrieving the object, and then comparing the accessed object with the search criteria. Moreover, in complex documents including numerous elements at many different hierarchical levels, encoding the methods to traverse the DOM tree to a particular node may be substantially complex.
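The prior art search technique described above may be sketched as follows, as a minimal illustration under stated assumptions. Python's standard xml.dom.minidom module stands in for the DOM interface; the document collection SAMPLE_DOCS, the function names, and the use of a predicate as the search criteria are all hypothetical. Note that every document is parsed and traversed in full for each search.

```python
from xml.dom import minidom

# Hypothetical collection of XML documents keyed by name.
SAMPLE_DOCS = {
    "a.xml": "<book><chapter title='Intro'>XML basics</chapter></book>",
    "b.xml": "<book><chapter title='Indexing'>Database search</chapter></book>",
}


def walk(node):
    """Yield every element node at or below the given node, depth first."""
    if node.nodeType == node.ELEMENT_NODE:
        yield node
    for child in node.childNodes:
        yield from walk(child)


def element_objects(element):
    """Yield the objects of an element: attribute values, then text content."""
    attributes = element.attributes
    for i in range(attributes.length):
        yield attributes.item(i).value
    for child in element.childNodes:
        if child.nodeType == child.TEXT_NODE and child.data.strip():
            yield child.data.strip()


def search_documents(documents, tag_name, criterion):
    """Return names of documents having a tag_name element whose object matches."""
    matches = []
    for name, xml_text in documents.items():
        doc = minidom.parseString(xml_text)      # parse the whole document
        for element in walk(doc):                # traverse the whole tree
            if element.tagName != tag_name:
                continue
            # Retrieve each object and compare it with the search criteria.
            if any(criterion(obj) for obj in element_objects(element)):
                matches.append(name)
                break
    return matches


if __name__ == "__main__":
    print(search_documents(SAMPLE_DOCS, "chapter", lambda obj: "Database" in obj))
```

The cost of this approach grows with the number and size of the documents, because no index exists and each query repeats the parse, traverse, retrieve, and compare steps.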
The increased interest in providing query and search facilities for XML documents has led to the creation of an XML Query Group at the World Wide Web Consortium (W3C). The goal of the XML Query Group is to provide flexible query facilities to extract data from real and virtual documents on the Web. Such facilities would allow interaction between the XML Web world and the database world, so that collections of XML files can be accessed like databases.
For these reasons, there is a need in the art for methodologies that allow for database-like management and searching of XML and other interchangeable structured document formats.