The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in this application and are not admitted to be prior art by inclusion in this section.
Database systems often store XML-formatted data within their databases. This data may come from a variety of sources, though the source is often an XML document or a database object.
In XML, data items, known as elements, are delimited by an opening tag and a closing tag. An element may also comprise attributes, which are specified in the opening tag of the element. Text between the tags of an element may represent any sort of data value, such as a string, date, or integer.
Text within an element may alternatively represent one or more elements. Elements represented within the text of another element are known as subelements or child elements. Elements that contain subelements are known as parent elements. Since subelements are themselves elements, subelements may, in turn, be parent elements of their own subelements. The resulting hierarchical structure of XML-formatted data is often discussed in terms akin to those used to discuss a family tree. For example, a subelement is said to descend from its parent element and from any element from which its parent descended. A parent element is said to be an ancestor of each of its subelements and of each element that descends from one of those subelements. An element, together with its attributes and descendants, is often referred to as a tree or a subtree.
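The family-tree vocabulary above can be illustrated with Python's standard xml.etree.ElementTree module. The following is a minimal sketch only; the sample document and the helper names parent_map, ancestors, and descendants are hypothetical and are introduced purely for illustration. Because ElementTree elements keep links only to their children, the sketch builds a child-to-parent map in order to walk upward to an element's ancestors.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: <order> is the parent of each <item>,
# and each <item> is the parent of a <name> and a <qty>.
DOC = """<order id="42">
  <item sku="A1"><name>Widget</name><qty>3</qty></item>
  <item sku="B2"><name>Gadget</name><qty>1</qty></item>
</order>"""


def parent_map(root):
    """Map each element to its parent (ElementTree stores only child links)."""
    return {child: parent for parent in root.iter() for child in parent}


def ancestors(elem, parents):
    """Return the ancestors of elem, nearest first."""
    result = []
    while elem in parents:
        elem = parents[elem]
        result.append(elem)
    return result


def descendants(elem):
    """Return every element that descends from elem, excluding elem itself."""
    return list(elem.iter())[1:]


root = ET.fromstring(DOC)
parents = parent_map(root)
first_name = root.find("item/name")
print([e.tag for e in ancestors(first_name, parents)])  # ['item', 'order']
print([e.tag for e in descendants(root.find("item"))])  # ['name', 'qty']
print(root.find("item").attrib)                         # {'sku': 'A1'}
```

Here the first <item> element, together with its sku attribute and its <name> and <qty> descendants, forms the subtree rooted at that <item>.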
Applications or application components that utilize XML data often feature processes that generate XML events. Some processes that generate XML events include XML parsing and validation, as discussed in “Validation Of XML Content In A Streaming Fashion,” incorporated above. As another example, an application that searches XML data might implement a process for evaluating a certain XPath expression by streaming XML events from documents within a search corpus to a state machine representation of the expression. Such a process is discussed in, for example, “Technique To Estimate The Cost Of Streaming Evaluation Of XPaths,” incorporated above.
These event-generating processes commonly entail parsing through XML-formatted data linearly and generating XML events upon recognizing certain tokens. For example, an event-generating process may generate events upon recognizing either a beginning tag for an element or an attribute of an element.
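A minimal sketch of such an event-generating process, built on the expat parser in Python's standard library, is shown below. The event names START_ELEMENT, ATTRIBUTE, and END_ELEMENT and the function generate_events are hypothetical. The sketch parses its input linearly and emits one event when it recognizes an opening tag, one event for each attribute of that element, and one event when the element closes.

```python
import xml.parsers.expat


def generate_events(xml_text):
    """Parse xml_text linearly and return the XML events it produces."""
    events = []

    def on_start(name, attrs):
        # Opening tag recognized: emit one event for the element,
        # then one event per attribute, in document order.
        events.append(("START_ELEMENT", name, None))
        for attr_name, attr_value in attrs.items():
            events.append(("ATTRIBUTE", attr_name, attr_value))

    def on_end(name):
        events.append(("END_ELEMENT", name, None))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return events


for event in generate_events('<order id="42"><item sku="A1"/></order>'):
    print(event)
```

Note that an empty element such as <item sku="A1"/> still produces both a start event and an end event.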
In order to parse XML data properly, an event-generating process may require a mechanism by which an implementing component can determine information about the current state of the process (e.g., which events it has already generated, which tokens it has already seen, which characters it has encountered since it last generated an XML event, and so on). To “remember” this state information, an event-generating process typically creates a number of memory buffers. Memory buffers may also be created during an event-generating process for reasons other than remembering state information.
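The sketch below shows one hypothetical way to keep such state: a stack of the names of elements that have been opened but not yet closed, and a buffer of the characters seen since the last event. The class EventState and the function parse_with_state are illustrative assumptions, not part of any system described here. The character buffer is genuinely needed in this sketch because expat may deliver the text of a single element across several callbacks.

```python
import xml.parsers.expat


class EventState:
    """Hypothetical record of what an event-generating process has seen."""

    def __init__(self):
        self.open_elements = []  # names of elements opened but not yet closed
        self.pending_text = []   # characters seen since the last event
        self.events = []         # events generated so far

    def flush_text(self):
        # Emit one TEXT event covering all characters buffered since the last event.
        if self.pending_text:
            text = "".join(self.pending_text)
            self.events.append(("TEXT", "/".join(self.open_elements), text))
            self.pending_text.clear()


def parse_with_state(xml_text):
    state = EventState()

    def on_start(name, attrs):
        state.flush_text()
        state.open_elements.append(name)
        state.events.append(("START_ELEMENT", "/".join(state.open_elements), None))

    def on_end(name):
        state.flush_text()
        state.events.append(("END_ELEMENT", "/".join(state.open_elements), None))
        state.open_elements.pop()

    def on_chars(data):
        # Text may arrive in pieces, so accumulate it until the next tag.
        state.pending_text.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_chars
    parser.Parse(xml_text, True)
    return state


state = parse_with_state("<a><b>hi</b></a>")
for event in state.events:
    print(event)
```

Each event records the path of open elements at the moment it was generated, which is exactly the kind of state information the process must remember between tokens.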
Memory requirements for XML data vary throughout an event-generating process according to factors such as the structure of the XML data, the location of the currently processed element within that structure, and the data itself. These factors are typically not known in advance, meaning that any given XML data source could require any number of memory buffers of any size. Because the number of memory buffers that an XML data source will require is unknown, an event-generating process must create those buffers as they are needed while the process runs.
Many database systems binary-encode XML data sources. Many event-generating processes must therefore decode binary-encoded XML before generating an XML event, and components that implement such processes are accordingly sometimes described as XML decoders. Decoding requires additional memory resources. For example, to decode a binary-encoded XML data source, an event-generating process may need to traverse, at the same time, the XML schema on which the binary encoding was based. A large number of memory buffers may be required to assist traversal of the schema. Further discussion of handling binary-encoded XML may be found in “TECHNIQUES FOR EFFICIENT LOADING OF BINARY XML DATA,” as incorporated above.
Creating a new memory buffer in an event-generating process requires asking the system memory manager to allocate a space (or “chunk”) in system memory for that buffer. This chunk may be an extension of a chunk allocated for another memory buffer, or an entirely new chunk. Event-generating processes also typically release buffers when they are no longer needed (e.g., when a process has finished parsing a subtree), resulting in an equal number of requests to deallocate memory chunks.
Because system memory management is complex, allocating and deallocating memory from the system is expensive in terms of CPU utilization. This expense adds up quickly for large and/or complex XML documents, which contain many elements and may therefore require hundreds of thousands of allocations and deallocations during parsing.
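The allocation pattern described above can be modeled with a simple counting allocator. In the following sketch, CountingAllocator is a hypothetical stand-in for the system memory manager, and the process is assumed, for simplicity, to request exactly one buffer when each element opens and to release it when that element's subtree is finished. Real event-generating processes may allocate several buffers of varying sizes per element; the sketch only shows that the number of allocation and deallocation requests grows with the number of elements, while the number of buffers in use at any one time is bounded by the nesting depth.

```python
import xml.parsers.expat


class CountingAllocator:
    """Hypothetical stand-in for the system memory manager.

    It records every allocation and deallocation request so that their
    number can be compared with the size of the XML input.
    """

    def __init__(self):
        self.allocations = 0
        self.deallocations = 0
        self.in_use = 0
        self.peak_in_use = 0

    def allocate(self, size):
        self.allocations += 1
        self.in_use += 1
        self.peak_in_use = max(self.peak_in_use, self.in_use)
        return bytearray(size)

    def free(self, buf):
        self.deallocations += 1
        self.in_use -= 1


def generate_with_per_element_buffers(xml_text, allocator, buf_size=64):
    """Request a buffer when each element opens; release it when it closes."""
    open_buffers = []

    def on_start(name, attrs):
        open_buffers.append(allocator.allocate(buf_size))

    def on_end(name):
        # The element's subtree is finished, so its buffer is released.
        allocator.free(open_buffers.pop())

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.Parse(xml_text, True)


doc = "<root>" + "<item><name>x</name></item>" * 1000 + "</root>"
alloc = CountingAllocator()
generate_with_per_element_buffers(doc, alloc)
print(alloc.allocations, alloc.deallocations, alloc.peak_in_use)  # 2001 2001 3
```

Even this small, shallow document of 2,001 elements triggers 2,001 allocation requests and 2,001 deallocation requests, although no more than three buffers are ever in use at once.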
It is therefore desirable to provide techniques and apparatuses that more efficiently generate XML events from XML data.