The Extensible Markup Language (XML) has become a popular and useful means of representing and exchanging information of any kind, for example among computer program applications and services. Consequently, effective and efficient storage and manipulation of XML data has likewise become useful and necessary, and some databases have been augmented to support storing, manipulating, and accessing XML data. One of the primary requirements of applications using XML as their data model is schema flexibility. However, databases for storing XML data are traditionally not optimized for schema flexibility. Although such databases may operate efficiently when a schema is provided and is not prone to change, they lack adequate support for XML when the schema is prone to change or is loosely structured.
XML data is self-descriptive (i.e., it contains tags along with data), but the standard XML serialization format is text-based, even for numbers and dates. As a result, XML documents are significantly larger than proprietary formats that capture the same data. The increased size of XML documents causes overhead costs during transmission, due to limited network bandwidth, as well as slower storage and retrieval operations, due to limited disk I/O bandwidth. Hence, a binary encoding form for XML data was introduced which attempts to maximize schema flexibility while still providing storage and querying benefits. This binary encoding form is described in U.S. patent application Ser. No. 11/182,997 filed by Ravi Murthy et al., entitled “Encoding of Hierarchically Organized Data for Efficient Storage and Processing” (“the Murthy application”), the entire content of which is incorporated by reference for all purposes as if fully disclosed herein.
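The size overhead of text-based serialization can be illustrated with a minimal sketch. The record, the element names, and the fixed-width layout below are assumptions chosen only for illustration; the layout is not the encoding of the Murthy application or of any real format.

```python
import struct

# Hypothetical record: one integer reading and one date.
reading = 1234567890
year, month, day = 2005, 7, 15

# Text-based XML: the tags, the digits, and the date are all characters.
xml_text = (
    "<sample><reading>%d</reading>"
    "<date>%04d-%02d-%02d</date></sample>" % (reading, year, month, day)
)
xml_bytes = xml_text.encode("utf-8")

# Illustrative fixed-width binary layout (an assumption, not a real format):
# 4-byte unsigned integer, 2-byte year, 1-byte month, 1-byte day.
packed = struct.pack("<IHBB", reading, year, month, day)

print("XML text size in bytes:", len(xml_bytes))
print("Packed binary size in bytes:", len(packed))
```

The packed form carries the same values in far fewer bytes, but it is no longer self-descriptive: a reader must already know the layout, which is the trade-off that a binary XML encoding attempts to balance.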
With the encoding format described in the Murthy application, XML data is stored in a compact binary form that maintains all of the features of XML data in a usable form, such as the hierarchical structure underlying the data (e.g., the data model or infoset), the notion of elements and attributes, etc. This compact binary format significantly reduces the overhead due to XML tags. Hence, the encoded XML is more compact than a binary representation of the corresponding textual character representation. The binary format can also be processed more efficiently than text that must first be parsed, because the data is effectively pre-parsed.
XML documents in a database can be modified by inserting new nodes and by changing or deleting existing nodes, all of which are referred to herein collectively as update operations. Depending on the nature of the application manipulating XML data, these types of operations can be relatively common. However, existing XML data storage systems do not provide efficient means of updating binary encoded XML documents. Such systems typically load an entire XML document into local memory (e.g., RAM) in the form of an object tree (e.g., a DOM), change the data in memory, and convert the updated DOM tree back into the binary form for storage. This process is generally inefficient and leads to scalability and performance problems because, among other reasons, the entire XML document needs to be materialized in local memory even when only a small part of it changes. Additionally, some existing XML data storage systems might provide optimized techniques for updating XML data when a very specific XML schema is available. However, these systems do not adequately address scenarios in which the XML schema is very unconstrained or in which no XML schema is available.
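The load, modify, and re-serialize cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions: textual XML stands in for the stored binary form, and the function name `update_via_dom` and the sample document are hypothetical.

```python
from xml.dom import minidom


def update_via_dom(serialized: bytes, tag: str, new_text: str) -> bytes:
    """Hypothetical helper: replace the text of every <tag> element."""
    # Step 1: materialize the ENTIRE document as an in-memory object tree.
    doc = minidom.parseString(serialized)

    # Step 2: change the data in memory.
    for node in doc.getElementsByTagName(tag):
        while node.firstChild is not None:
            node.removeChild(node.firstChild)
        node.appendChild(doc.createTextNode(new_text))

    # Step 3: convert the whole tree back into its stored form,
    # including every node that was never touched.
    return doc.toxml(encoding="utf-8")


stored = b"<order><status>open</status><item>widget</item></order>"
updated = update_via_dom(stored, "status", "shipped")
print(updated.decode("utf-8"))
```

Even though only one element changes, steps 1 and 3 both process the whole document, so the cost of the update grows with document size rather than with the size of the change.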
Hence, based on the foregoing, there is a need for techniques for efficiently updating XML data stored persistently in a database.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.