1. Field of the Invention
The present invention relates to the field of markup processing and more particularly to processing large relationship-specifying markup language documents.
2. Description of the Related Art
The Extensible Markup Language (XML) is a markup language specification widely credited with improving the functionality of the World Wide Web by allowing information to be identified in a more accurate, flexible, and adaptable way. XML is referred to as “extensible” because, unlike the Hypertext Markup Language (HTML), XML is not a single, predefined markup language with a fixed format. Rather, XML is a meta-language that describes other languages. As such, XML allows for the design of other markup languages for a limitless variety of document types. XML can act as a meta-language because XML is written according to the Standard Generalized Markup Language (SGML), the international standard meta-language for text document markup.
There are several methods for processing an XML document. In one method, every clause in the XML document is accounted for and a hierarchical model is constructed reflecting the interrelationships between the clauses of the XML document. This hierarchical model is referred to as a document object model (DOM), and the DOM tree, once in memory, can be traversed at will in order to manipulate the XML document. Another method provides for the event-driven serial parsing of clauses in an XML document. This method is referred to as “SAX” parsing, an acronym for simple application programming interface (API) for XML. SAX parsing consumes a significantly smaller memory footprint than DOM processing, as an entire hierarchical model in the form of a DOM tree need not be constructed prior to processing the XML document.
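By way of illustration only, the difference between the two methods can be sketched in Python using the standard library modules xml.dom.minidom and xml.sax. The sample document, its element and attribute names, and the function and class names below are assumptions made for this sketch and do not appear in any particular IT management system.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical sample document; element and attribute names are assumptions.
SAMPLE = """<items>
  <component id="a"/>
  <component id="b"/>
  <relationship source="a" target="b"/>
</items>"""


def count_components_dom(text):
    # DOM: the whole tree is built in memory before any query runs.
    doc = xml.dom.minidom.parseString(text)
    return len(doc.getElementsByTagName("component"))


class ComponentCounter(xml.sax.ContentHandler):
    # SAX: the parser calls back once per start tag; no tree is retained.
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "component":
            self.count += 1


def count_components_sax(text):
    handler = ComponentCounter()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.count
```

Both functions return the same count for the sample document. The DOM version holds every node of the document in memory at once, whereas the SAX version retains only a single counter.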
XML documents have proven particularly effective in representing the state and content of an enterprise computing architecture. Specifically, XML documents have been utilized to import content into and export content from a configuration management database (CMDB). A CMDB is a unified or federated repository of information related to all the components of an information system. A CMDB provides a view to the information technology manager of an organization in order to understand the relationships between the components of the information system. The CMDB further facilitates the monitoring and management of the configuration of the components of the information system.
Component and relationship information imported from information technology (IT) management systems into a CMDB can be provided in XML documents. Examining the content of an XML document used to import information from an IT management system into a CMDB will reveal that subsets of components are linked together by relationships. Furthermore, in examining all of the components linked by a set of relationships, a set of data graphs can be constructed where the nodes of the graphs represent the components and the connectors represent the relationships.
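As a minimal sketch of this grouping, and assuming that components are identified by strings and that each relationship is a pair of component identifiers, the data graphs can be treated as the connected components of an undirected graph. The function name build_data_graphs and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_data_graphs(components, relationships):
    """Group components into data graphs (sets of linked components)."""
    # Relationships are treated as undirected links for grouping purposes.
    adjacency = defaultdict(set)
    for source, target in relationships:
        adjacency[source].add(target)
        adjacency[target].add(source)

    seen = set()
    graphs = []
    for start in components:
        if start in seen:
            continue
        # Depth-first walk collecting every component reachable from start.
        seen.add(start)
        stack = [start]
        group = set()
        while stack:
            node = stack.pop()
            group.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        graphs.append(group)
    return graphs


# Hypothetical component identifiers and relationships.
components = ["server1", "app1", "db1", "router1"]
relationships = [("app1", "server1"), ("app1", "db1")]
graphs = build_data_graphs(components, relationships)
```

In this example, server1, app1, and db1 form one data graph, and router1, which participates in no relationship, forms a data graph consisting of a single object.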
One requirement of a conventional CMDB is that data graphs are to be written to the database. A data graph generally includes a single object or multiple objects. Data graphs to be loaded into the CMDB generally are loaded by an XML parsing program that processes an XML document. The XML document generally contains descriptions of the data objects to be loaded into the CMDB as well as the relationships between the data objects. In order to load the data from the XML document into the CMDB, the data graph represented in the XML document must be reconstructed in memory as objects and the relationships between the objects must be understood.
Loading the data graph reflected in the XML document has proven inefficient, particularly where the XML document is very large, as can be the case with configuration management data. As will be understood, using a DOM tree to process a very large XML document can consume an inordinate amount of memory. Likewise, SAX parsing a very large XML document still requires the creation in memory of the objects reflected in the XML document and the stitching together of the relationships between the objects. Also, in SAX parsing, multiple passes through the large XML document can be required, as there is no guaranteed order in which the components and relationships needed to construct the data graphs will appear. Accordingly, conventional techniques for bulk loading a large relationship-specifying markup language document into a database have proven inadequate.
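The ordering problem can be illustrated with a short Python sketch using the standard xml.sax module. The sketch shows only the difficulty described above, not a solution to it. The element and attribute names, the class name SinglePassLoader, and the sample document are assumptions made for illustration.

```python
import xml.sax


class SinglePassLoader(xml.sax.ContentHandler):
    """Tries to resolve each relationship at the moment it is parsed."""

    def __init__(self):
        super().__init__()
        self.components = set()
        self.resolved = []
        self.unresolved = []

    def startElement(self, name, attrs):
        if name == "component":
            self.components.add(attrs["id"])
        elif name == "relationship":
            edge = (attrs["source"], attrs["target"])
            # A relationship can be stitched only if both ends were seen.
            if edge[0] in self.components and edge[1] in self.components:
                self.resolved.append(edge)
            else:
                self.unresolved.append(edge)


# Hypothetical document in which a relationship precedes one of its ends.
OUT_OF_ORDER = b"""<items>
  <component id="a"/>
  <relationship source="a" target="b"/>
  <component id="b"/>
  <relationship source="b" target="a"/>
</items>"""

loader = SinglePassLoader()
xml.sax.parseString(OUT_OF_ORDER, loader)
```

After the single pass, the first relationship remains unresolved because component b had not yet been encountered when the relationship was parsed. A conventional loader would need either to buffer such relationships in memory or to make a further pass through the document.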