Structured documents contain nested structures, i.e., structures that define hierarchical relationships between elements of a document. Documents written in Extensible Markup Language (XML) are structured documents. Typically, a structured document can be represented by a data model comprising a plurality of hierarchical nodes that form a “node tree” comprising a root node, branch nodes and leaf nodes. The term “node” is used in the Document Object Model (DOM) sense, the DOM being a standard XML construct well known to those skilled in the art. In the DOM construct, certain hierarchical rules apply. For example, every node, aside from a root node, has exactly one parent node. In addition, a node can have zero or more child nodes. Accordingly, a child node can have zero or more siblings, but at most one next sibling. Typically, a node with content in its first child node is referred to as a “leaf” node.
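By way of illustration only, the hierarchical rules described above can be observed with the `xml.dom.minidom` module of the Python standard library. The sample document and the helper `element_children` are assumptions introduced for this sketch and are not part of any particular DBMS.

```python
from xml.dom import minidom

# Hypothetical sample document, written without whitespace between tags
# so that no whitespace-only text nodes appear in the tree.
XML = "<book><title>DOM</title><chapter><para>one</para><para>two</para></chapter></book>"

doc = minidom.parseString(XML)   # the Document node is the root of the node tree
root = doc.documentElement       # the <book> element


def element_children(node):
    """Return only the element children of a node (text nodes are skipped)."""
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


title, chapter = element_children(root)
first_para, second_para = element_children(chapter)

# The root node has no parent; every other node has exactly one.
root_parent = doc.parentNode            # None
title_parent = title.parentNode         # the <book> element

# A node has at most one next sibling; the last child has none.
after_title = title.nextSibling         # the <chapter> element
after_chapter = chapter.nextSibling     # None

# The content of <title> is held in its first child, a text node.
title_text = title.firstChild.data      # "DOM"
```

In this sketch `<title>` and each `<para>` carry their content in a first child text node, which corresponds to the “leaf” nodes described above, while `<chapter>` is a branch node.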
As applied, each node in the DOM construct corresponds to an object of the XML document. Each node can be described by a path that defines the hierarchical relationship between the node and its parent node. Every path begins at a root node corresponding to a root object and follows the hierarchical structure defined by the XML document. Throughout this description, the term “node” is used interchangeably with the term “object.”
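As a minimal sketch of how such a path might be derived, the following example walks from a node up to the root using `xml.dom.minidom` from the Python standard library. The function name `node_path`, the bracketed position notation, and the sample document are assumptions made for illustration; the source does not prescribe a path syntax.

```python
from xml.dom import minidom


def node_path(node):
    """Build a path from the document element down to `node`.

    Each step names an element and its 1-based position among siblings
    that share the same tag name (an illustrative notation only).
    """
    steps = []
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        parent = node.parentNode
        same_name = [
            c for c in parent.childNodes
            if c.nodeType == c.ELEMENT_NODE and c.tagName == node.tagName
        ]
        # Locate the node by identity among its same-named siblings.
        position = next(i for i, c in enumerate(same_name, 1) if c is node)
        steps.append(f"{node.tagName}[{position}]")
        node = parent  # move one level up; the loop stops at the Document node
    return "/" + "/".join(reversed(steps))


doc = minidom.parseString(
    "<book><chapter><para>one</para><para>two</para></chapter></book>"
)
paras = doc.getElementsByTagName("para")
second_para_path = node_path(paras[1])   # "/book[1]/chapter[1]/para[2]"
```

Because every node other than the root has exactly one parent, the upward walk is unambiguous and always terminates at the root.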
As more and more business applications create and use structured documents, the challenge is to store, search, and retrieve these documents. Database management systems (DBMS) are available that are configured to receive and store structured documents in their native format. For example, EMC Documentum xDB, developed by EMC Corporation of Hopkinton, Mass., is a high-performance and scalable native XML DBMS that can store and manage structured documents in their native format, e.g., as a nested data model according to the DOM construct. Typically, the XML DBMS can parse a structured document into its objects and can generate nodes representing the objects of the document so that the nodes can be stored in the database. By doing so, the XML DBMS allows database structures to be easily modified to adapt to changing information requirements.
As discussed above, the DOM construct provides a useful and efficient data model for representing a structured document and is essential for implementing the XML DBMS. Nevertheless, it has inherent disadvantages. For example, because every document is represented by a corresponding DOM, a change to an existing document requires a new DOM to be generated and stored for the modified document, which is treated as a new document. When changes to a document are minor, e.g., correcting a spelling error or adding a citation, storing multiple DOMs for documents that are essentially identical leads to redundancy and waste. In an attempt to minimize this redundancy, an older version of a document can be stored as a delta of a newer version of the document. Nonetheless, with this approach, the context of the older version of the document is lost, and therefore traversing and/or querying the older version is very difficult, if not impossible.
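The delta approach and its drawback can be sketched as follows. This is a simplified, line-based illustration built on `difflib.SequenceMatcher` from the Python standard library; it is not the delta format of any particular DBMS, and the names `make_delta` and `apply_delta`, as well as the sample documents, are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(newer, older):
    """Record only what is needed to rebuild `older` from `newer`."""
    delta = []
    matcher = SequenceMatcher(a=newer, b=older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))        # reuse lines newer[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("add", older[j1:j2]))   # lines found only in older
        # "delete": lines found only in newer, so nothing is recorded
    return delta


def apply_delta(newer, delta):
    """Rebuild the older version from the newer version and the delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(newer[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


# Hypothetical versions: the newer one fixes a typo and adds a citation.
older = ["<doc>", "<p>The quick brwn fox.</p>", "</doc>"]
newer = ["<doc>", "<p>The quick brown fox.</p>", "<cite>Smith</cite>", "</doc>"]

delta = make_delta(newer, older)
rebuilt = apply_delta(newer, delta)   # equal to `older`
```

The delta holds only copy ranges and a few changed lines, so it contains no node tree of its own. In this sketch, the older version must be rebuilt in full and parsed again before any of its nodes can be traversed or queried.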