1. Field of Invention
The present invention relates generally to the field of XML data management. More specifically, the present invention is related to updating and versioning of XML documents or sub-documents.
2. Discussion of Prior Art
XML has become the de facto standard format for information exchange and publishing on the web. Over the past several years, there has been a tremendous surge of interest in XML data management, both for document management and for managing more flexibly structured hierarchical data using XML. Clearly, in either case, there is a need to allow users to update XML documents or sub-documents and to develop techniques to process such updates efficiently in database management systems (DBMS). Update capabilities include not only document-level updates, in which XML documents are simply replaced as a whole when modified, but also sub-document updates, in which changes or deltas are incrementally incorporated into XML documents.
When a small change is made to an XML document, it is more efficient for subscribers to download the changes (or delta) rather than the whole document. For example, one primitive way of incorporating a delta into XML documents is to treat XML documents as text files and use a “diff” or similar program to generate delta files. However, to obtain the new version, a complete document has to be generated by merging the previous version with the delta before it can be used. An improved approach may use XML elements as units instead of text lines. Up until now, delta definition and incorporation has been a largely unanswered problem, addressed only with narrow domain-specific approaches.
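The text-based approach described above can be sketched as follows. This is only an illustration of a line-oriented delta, written in Python with the standard difflib module; the function names make_delta and apply_delta and the sample document are hypothetical and do not come from any cited system.

```python
from difflib import SequenceMatcher


def make_delta(old_lines, new_lines):
    """Return only the changed line ranges as (start, end, replacement) tuples.

    start/end index into old_lines; unchanged ranges are omitted, so the
    delta is small when the change is small.
    """
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [
        (i1, i2, new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old_lines, delta):
    """Rebuild the complete new version by merging the old version with the delta."""
    result, position = [], 0
    for start, end, replacement in delta:
        result.extend(old_lines[position:start])  # copy the unchanged lines
        result.extend(replacement)                # substitute the changed lines
        position = end
    result.extend(old_lines[position:])           # copy the unchanged tail
    return result


# Hypothetical sample document, treated purely as text lines.
old_lines = ["<book>", "  <title>XML</title>", "  <price>10</price>", "</book>"]
new_lines = ["<book>", "  <title>XML</title>", "  <price>12</price>", "</book>"]

delta = make_delta(old_lines, new_lines)
rebuilt = apply_delta(old_lines, delta)
```

Note that apply_delta must walk the entire previous version to produce the new one, which is the cost the passage above points out: the delta is compact, but the full document is still regenerated before it can be used.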
A typical approach, for example DeltaXML, provides change identification for XML documents in the legal publication industry and is able to store just the delta and the original source file. As a result, multiple revisions of the same document do not require as much storage space as before, and any number of changes can be rolled back or rolled forward more efficiently. However, such an approach of storing an XML document and its deltas does not address the main challenges of XML updating in a DBMS, where XML data may be stored in a data model using records.
FIG. 1 shows two different approaches to incorporating an XML delta into an XML document. Previous approaches receive deltas and update the XML document as a whole to produce a new document, as shown in FIG. 1a. Clearly, this incurs a large number of disk I/O operations and is thus not efficient. A better approach is to store an XML document as multiple records and enable updates at the sub-document level, as shown in FIG. 1b. Thus, the question of “how to manage these XML records with versions?” needs to be answered.
When XML updating is allowed, concurrency control is necessary in order to ensure data consistency. There are many known solutions for concurrency control in the context of database systems, such as data item locking, timestamp ordering, or multi-versioning combined with locking or timestamp ordering. These techniques are either not directly applicable to XML updating or too inefficient for it, due to the hierarchical relationships among data items and the enormous number of fine-grained nodes. An application filed by IBM (Ser. No. 10/709,416) uses sub-document locking with prefix-encoded node IDs for concurrency control without versioning.
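The general idea of locking sub-documents by prefix-encoded node IDs can be illustrated with a minimal sketch. It assumes, purely for illustration, that a node ID is a tuple of child ordinals, so that an ancestor's ID is a prefix of every descendant's ID, and that all locks are exclusive. The class SubtreeLockTable and its methods are hypothetical and are not taken from the cited application.

```python
def is_prefix(a, b):
    """True if node ID a is a prefix of node ID b (a is b or an ancestor of b)."""
    return len(a) <= len(b) and b[:len(a)] == a


def subtrees_overlap(a, b):
    """Two subtrees overlap exactly when one root ID is a prefix of the other."""
    return is_prefix(a, b) or is_prefix(b, a)


class SubtreeLockTable:
    """Exclusive subtree locks keyed by prefix-encoded node ID (assumed scheme)."""

    def __init__(self):
        self._held = {}  # node ID -> owner

    def try_lock(self, owner, node_id):
        """Grant the lock unless another owner holds an overlapping subtree."""
        for held_id, held_owner in self._held.items():
            if held_owner != owner and subtrees_overlap(held_id, node_id):
                return False
        self._held[node_id] = owner
        return True

    def release(self, owner):
        """Drop every lock held by the given owner."""
        self._held = {k: v for k, v in self._held.items() if v != owner}


locks = SubtreeLockTable()
first = locks.try_lock("T1", (1, 2))        # T1 locks the subtree rooted at 1.2
nested = locks.try_lock("T2", (1, 2, 5))    # inside T1's subtree
ancestor = locks.try_lock("T2", (1,))       # contains T1's subtree
sibling = locks.try_lock("T2", (1, 3))      # disjoint sibling subtree
```

Under these assumptions, a conflict check needs only a prefix comparison of IDs rather than a traversal of the document tree. Without versioning, however, a conflicting request can only be refused or made to wait.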
Current database systems are designed to support XML update. Instead of storing an XML document as a single piece, a new technique stores an XML document by dividing it into many records, where each record stores a group of XML nodes. Record-based storage enables efficient, low-cost updating of XML at the sub-document level. A concurrently filed IBM application titled “Packing nodes into records to store XML XQuery data model and other hierarchically structured data” discusses such record-based storage.
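A minimal sketch of the record-based idea follows. It assumes a deliberately simple packing rule, one record per child subtree of the root, which is an illustrative choice and not the packing scheme of the cited application. The function names and the sample catalog document are hypothetical, and text directly under the root element is ignored for brevity.

```python
import xml.etree.ElementTree as ET


def split_into_records(xml_text):
    """Split a document into a root header plus one record per child subtree."""
    root = ET.fromstring(xml_text)
    header = (root.tag, dict(root.attrib))
    records = [ET.tostring(child, encoding="unicode") for child in root]
    return header, records


def update_record(records, index, new_fragment):
    """Replace a single record in place; all other records are left untouched."""
    ET.fromstring(new_fragment)  # raises ParseError if the fragment is malformed
    records[index] = new_fragment


def assemble(header, records):
    """Rebuild the whole document from its records, only when actually needed."""
    tag, attributes = header
    root = ET.Element(tag, attributes)
    for record in records:
        root.append(ET.fromstring(record))
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample document.
document = (
    "<catalog>"
    "<book id='1'><price>10</price></book>"
    "<book id='2'><price>20</price></book>"
    "</catalog>"
)

header, records = split_into_records(document)
untouched_before = records[0]
update_record(records, 1, '<book id="2"><price>25</price></book>')
updated_document = assemble(header, records)
```

In this sketch a sub-document update rewrites one record and leaves the others as they were, which is the property that makes record-based storage attractive compared with regenerating the whole document.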
In order to fully exploit the new record-based storage, a new and efficient technology to support XML sub-document updating with versioning is needed. In a database system, an XML document being updated may be read by many XML readers at the same time. In order to ensure that these readers see consistent XML data, the concurrency control problem needs to be solved.
The following patents, patent application publications and references provide for methods of retrieving and updating of documents.
Japanese patent assigned to Fujitsu Ltd., (8-190543), discloses a document processor which links document update by a document file and another file at the time of coupling another file to the document file.
Japanese patent assigned to Ricoh Co. Ltd., (2002-269139), discloses a document retrieving method that involves searching a document based on divided character sequence index and word index designating the document.
U.S. patent assigned to Inventec Corp., (U.S. Pat. No. 6,610,104), relates to a method of updating a document by means of appending for enabling a user to easily carry out a query to documents with different versions and switching between them.
U.S. patent application to Wilce et al., (2003/0023528 A1), discloses a document-level check-in and check-out concurrency management process. A document lock is maintained on the entire document until the user checks in the document, and the lock prevents other users from making changes to the document.
Article entitled “Generalized Process Structure Grammars (GPSG) for Flexible Representations of work”, by Glance et al., discusses the representation of work in workflow systems, and proposes a context-free-grammar-type syntax to represent flexible work processes, which can be activity-centric or document-centric. For document-centric cooperative work, documents can be decomposable or non-decomposable. Decomposable documents can be divided into sub-documents. In a multi-authoring environment, documents may be worked on concurrently with multiple versions. The GPSG can be used to describe constraints and relationships between activities. It provides for a check-in/check-out mechanism.
While updating at the document level with multiple versions is relatively simple, there is no prior art wherein multiple versions of sub-documents are updated efficiently, such that only the changed portion of the XML document is updated using a new version while the rest is kept unaffected on disk or another storage device.
Whatever the precise merits, features, and advantages of the above cited references, none of them achieves or fulfills the purposes of the present invention.