The number of businesses exchanging information electronically is proliferating. Businesses that exchange information have recognized the need for a common standard for representing data. Extensible Markup Language (“XML”) is rapidly becoming the common standard for representing data.
XML describes and provides structure to a body of data, such as a file or data packet, referred to herein as an XML document. XML standards provide for tags that delimit sections of an XML document referred to as XML elements, or simply “elements”.
An element may contain various types of data, including element attributes and other elements. An element that is contained by another element is referred to as a descendant of that element. By defining an element that contains attributes and descendant elements, the XML document defines parent-child hierarchical relationships between the element, its descendant elements, and its attributes.
The term node is used to refer to individual elements and element attributes in an XML document. Thus, an XML document defines a hierarchy of nodes having parent-child relationships. Such a hierarchy is referred to herein as a node tree or a node hierarchy.
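By way of illustration only, the node hierarchy described above may be sketched in Python using the standard-library `xml.dom.minidom` module. The sample document, the function name `node_paths`, and the `/@name` notation for attribute nodes are assumptions made for this sketch and are not drawn from any particular system.

```python
from xml.dom import minidom

# Hypothetical sample document: one parent element, two child elements, three attributes.
XML_TEXT = '<order id="42"><item sku="A1">Widget</item><item sku="B2">Gadget</item></order>'


def node_paths(xml_text):
    """Return one path string per element node and per attribute node."""
    document = minidom.parseString(xml_text)
    paths = []

    def visit(element, prefix):
        path = f"{prefix}/{element.tagName}"
        paths.append(path)
        # Element attributes are name-value pairs; each is listed as a child node.
        for name in sorted(element.attributes.keys()):
            paths.append(f"{path}/@{name}")
        # Recurse into descendant elements only (text nodes are skipped).
        for child in element.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                visit(child, path)

    visit(document.documentElement, "")
    return paths
```

In this sketch each returned path names a node together with its ancestors, which makes the parent-child relationships between an element, its descendant elements, and its attributes explicit.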
The term attribute is used herein to refer to a discrete part or element of a structure, such as a data structure or an object that belongs to an object type according to the object-oriented methodology. An attribute may be a complex construct containing one or more other attributes, referred to herein as a member of the attribute. XML standards provide for element attributes in the form of name-value pairs. While the meaning of the term attribute, as used herein, encompasses element attributes, the term is not so limited.
Industry standards define structures for representing XML documents. One such standard is the Document Object Model (DOM), promulgated by the World Wide Web Consortium (W3C).
In order for a computer to operate on an XML document, an in-memory representation of the XML document is generated. In general, an XML document is loaded from a storage device (e.g., a disk that stores files that contain XML entities) or from data received over a communications channel, to generate in-memory data structures used to represent an XML document. The in-memory data structures are manipulated by computer processes executing software and programs. The process of loading an XML document into memory and generating an in-memory representation of the XML document is referred to as manifestation or manifesting an XML document. Typically, applications access and manipulate the in-memory data structures, created by manifestation, through an API.
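As a minimal sketch of manifestation under a conventional approach, the following Python fragment builds a complete in-memory DOM representation either from a byte string (such as data received over a communications channel) or from a binary file-like object (such as a file opened from disk). The function name `manifest` is hypothetical; `xml.dom.minidom` is used here only as one readily available DOM implementation.

```python
import io
from xml.dom import minidom


def manifest(source):
    """Build a full in-memory DOM tree from XML text/bytes or a file-like object."""
    if isinstance(source, (bytes, str)):
        # Data already in memory, e.g. a packet received over a channel.
        return minidom.parseString(source)
    # Otherwise treat the source as a file-like object read from storage.
    return minidom.parse(source)


# Usage: the application then works with the tree through the DOM API.
document = manifest(io.BytesIO(b"<a><b x='1'/></a>"))
root_name = document.documentElement.tagName
```

Note that in this sketch the entire document is parsed before the application can access any node.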
Under conventional approaches for manifestation, when an XML document is manifested, the entire XML document is manifested. XML documents can be very large, and manifesting them can therefore require a significant amount of memory. Some XML documents are so large that the memory needed to manifest them far surpasses the memory allocated to them and may also far surpass the capacity of many computers.
Based on the foregoing, it is desirable to provide a mechanism that reduces the amount of memory needed to manifest an XML document.
In one approach described in Pannala, cited above, an XML document is broken up into a plurality of loadable units that can be separately stored in database objects of a database system. Then, when a process attempts to manifest data from the XML document, only the loadable units that contain the data of interest are loaded into memory from the database. The entire XML document is not manifested. A loadable unit is a set of one or more nodes in an XML document. When one node in the loadable unit is manifested, all the nodes in the loadable unit are also manifested. Loadable units may, but need not, correlate to content structures that store the nodes on persistent storage.
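The general idea of loading only the loadable units of interest may be sketched as follows. This is an illustrative sketch only and is not the implementation described in Pannala: the class names `UnitStore` and `LazyDocument`, the unit identifiers, and the use of an in-memory dictionary as a stand-in for database objects are all assumptions.

```python
import xml.etree.ElementTree as ET


class UnitStore:
    """Hypothetical stand-in for database objects, one serialized unit per key."""

    def __init__(self, units):
        self._units = dict(units)
        self.reads = []  # records which units were fetched from "storage"

    def read(self, unit_id):
        self.reads.append(unit_id)
        return self._units[unit_id]


class LazyDocument:
    """Manifests a loadable unit only when a node in that unit is requested."""

    def __init__(self, store):
        self._store = store
        self._manifested = {}

    def unit(self, unit_id):
        if unit_id not in self._manifested:
            # Parsing the unit manifests every node it contains at once.
            self._manifested[unit_id] = ET.fromstring(self._store.read(unit_id))
        return self._manifested[unit_id]

    def manifested_units(self):
        return sorted(self._manifested)


store = UnitStore({
    "header": "<header><title>Q3</title></header>",
    "lines": "<lines><line n='1'/><line n='2'/></lines>",
})
document = LazyDocument(store)
first_line = document.unit("lines").find("line")
```

In this sketch, requesting a node from the `lines` unit leaves the `header` unit unread, so memory is consumed only by the units that contain the data of interest.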
While the system of Pannala is useful for many purposes, certain operations fail to take advantage of the separately stored and loaded loadable units. Such operations continue to demand excessive amounts of memory; they do not scale to large XML documents. Such operations include operations that express an interest in many or all of the loadable units of a large XML document, and the operations that initially insert an entire large XML document onto persistent storage, such as into a database system.
Based on the foregoing, it is desirable to provide techniques for reducing the amount of memory needed by operations that involve enough loadable units of an XML document to exceed available memory.
In addition, the approach of Pannala assumes the contents of the loadable units loaded into memory are not changed while in memory, so that a loadable unit can always be replaced by reloading that loadable unit from persistent storage. However, in many operations, one or more of the loadable units have different contents in memory than they have stored separately on persistent storage. For example, during the initial insert into a database system, none of the loadable units first loaded into memory reside as separately stored units in the database on persistent storage. Such loadable units are said to be “dirty.” The approach of Pannala is not suitable for dirty loadable units.
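The distinction between clean and dirty loadable units may be illustrated with the following sketch, which shows only the problem described above and not any particular solution. The names `LoadedUnit`, `UnitCache`, and `safely_discardable`, and the dictionary used in place of persistent storage, are hypothetical.

```python
class LoadedUnit:
    """A loadable unit held in memory, with a flag marking unsaved content."""

    def __init__(self, unit_id, content, dirty=False):
        self.unit_id = unit_id
        self.content = content
        self.dirty = dirty


class UnitCache:
    def __init__(self, stored):
        self._stored = dict(stored)  # stand-in for persistent storage
        self._loaded = {}

    def load(self, unit_id):
        """Load a unit from storage; the in-memory copy starts out clean."""
        if unit_id not in self._loaded:
            self._loaded[unit_id] = LoadedUnit(unit_id, self._stored[unit_id])
        return self._loaded[unit_id]

    def modify(self, unit_id, content):
        """Change a unit in memory so that it differs from the stored copy."""
        unit = self.load(unit_id)
        unit.content = content
        unit.dirty = True

    def add_new(self, unit_id, content):
        """Add a unit that exists only in memory, as during an initial insert."""
        self._loaded[unit_id] = LoadedUnit(unit_id, content, dirty=True)

    def safely_discardable(self):
        """Units that could be dropped and later reloaded without losing data."""
        return sorted(u.unit_id for u in self._loaded.values() if not u.dirty)


cache = UnitCache({"a": "<a/>", "b": "<b/>"})
cache.load("a")
cache.modify("b", "<b changed='yes'/>")
cache.add_new("c", "<c/>")
```

In this sketch only unit `a` can be discarded and reloaded from storage; discarding unit `b` or unit `c` would lose content that exists nowhere else.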
Based on the foregoing, it is further desirable to provide techniques for retaining information during operations that involve dirty loadable units of an XML document.
The past approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not to be considered prior art to the claims in this application merely due to the presence of these approaches in this background section.