The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
Structured data conforms to a type definition. For example, a type definition for a “person” type may define distinct attributes such as “name,” “birthdate,” “height,” “weight,” and “gender.” Each “instance” of a particular type comprises a separate value for each of the attributes defined by the particular type. For example, an instance of the “person” type might comprise values such as “Fred Brown,” “Jan. 1, 1980,” “72 inches,” “240 pounds,” and “male.” Each attribute is itself of a type. For example, the “name” attribute might be of a “string” type, the “birthdate” attribute might be of a “date” type, and the “gender” attribute might be of an “enumerated” type. Structured data might comprise multiple different instances of the same type.
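As an illustration only, the “person” type and one instance of it might be sketched as follows. The class names, field names, and units (Person, Gender, height_inches, weight_pounds) are assumptions chosen for this example and do not come from any particular system.

```python
from dataclasses import dataclass, fields
from datetime import date
from enum import Enum


class Gender(Enum):
    # Hypothetical enumerated type for the "gender" attribute.
    MALE = "male"
    FEMALE = "female"


@dataclass
class Person:
    # The type definition: each attribute has a name and a type of its own.
    name: str
    birthdate: date
    height_inches: int
    weight_pounds: int
    gender: Gender


# One instance of the type: a separate value for each defined attribute.
fred = Person("Fred Brown", date(1980, 1, 1), 72, 240, Gender.MALE)

# The attribute names can be recovered from the type definition itself.
attribute_names = [f.name for f in fields(Person)]
```

Several such instances of the same type, taken together, would make up the structured data discussed above.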
Different approaches may be used to store structured data into a database. One such approach is called “statement-based path loading.” According to the statement-based path loading approach, a client application parses structured data that comprises one or more instances of a type. Values within the structured data correspond to attributes of the type. The client application generates Structured Query Language (SQL) statements, such as INSERT commands, that, when executed by a database server, cause the database server to insert the values into corresponding columns of a database table. Unfortunately, due to its heavy use of the SQL engine, statement-based path loading often suffers in terms of performance and memory consumption.
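A minimal sketch of the statement-based approach follows. It uses Python's built-in sqlite3 module as a stand-in for the database server. The table name, column names, and the second sample row are hypothetical and serve only to show that one INSERT statement passes through the SQL engine for each instance.

```python
import sqlite3

# Parsed instances of the "person" type (sample values are illustrative).
instances = [
    ("Fred Brown", "1980-01-01", 72, 240, "male"),
    ("Jane Doe", "1975-06-15", 65, 130, "female"),  # hypothetical second row
]

# In-memory SQLite database standing in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "name TEXT, birthdate TEXT, height INTEGER, weight INTEGER, gender TEXT)"
)

# One INSERT per instance: every row is processed by the SQL engine.
insert_sql = (
    "INSERT INTO person (name, birthdate, height, weight, gender) "
    "VALUES (?, ?, ?, ?, ?)"
)
for row in instances:
    conn.execute(insert_sql, row)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Because each row is handled as a separate statement, the cost of this approach grows with the number of instances being loaded.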
Another approach for storing structured data into a database is called “direct path loading.” Through direct path loading, values within structured data are stored directly into a database without causing the SQL engine to load each row of data. By consulting a control file that is associated with the structured data, a client application can determine the types of the instances within the structured data. If the structures of those types are defined to the client application, then, based on those structures, the client application can create an array that corresponds to the types' attributes. The client application can populate the array with values that correspond to those attributes. Once the array is populated, the client application can convert the array into a stream of data that conforms to the format of a database's data blocks. The client application then can stream the data to a database server, which can write the data directly into one or more data blocks in the database. Direct path loading exhibits performance superior to that of statement-based path loading.
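The array-to-stream step might be sketched as follows. The length-prefixed byte layout used here is an assumption made purely for illustration; it is not the data block format of any actual database, and the function names encode_rows and decode_rows are hypothetical.

```python
import struct


def encode_rows(rows):
    """Serialize an array of rows into a length-prefixed byte stream.

    Hypothetical layout: row count (4 bytes), then for each row a field
    count (2 bytes) followed by each field as length (4 bytes) + UTF-8 data.
    """
    out = bytearray(struct.pack(">I", len(rows)))
    for row in rows:
        out += struct.pack(">H", len(row))
        for value in row:
            data = str(value).encode("utf-8")
            out += struct.pack(">I", len(data)) + data
    return bytes(out)


def decode_rows(stream):
    """Inverse of encode_rows; stands in for the server reading the stream."""
    offset = 0
    (n_rows,) = struct.unpack_from(">I", stream, offset)
    offset += 4
    rows = []
    for _ in range(n_rows):
        (n_fields,) = struct.unpack_from(">H", stream, offset)
        offset += 2
        row = []
        for _ in range(n_fields):
            (length,) = struct.unpack_from(">I", stream, offset)
            offset += 4
            row.append(stream[offset:offset + length].decode("utf-8"))
            offset += length
        rows.append(row)
    return rows


# Array populated by the client application, one entry per attribute.
array = [["Fred Brown", "1980-01-01", "72", "240", "male"]]
stream = encode_rows(array)
```

In this sketch no SQL statement is generated at any point; the client produces the bytes and the receiving side only has to place them.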
Some types indicated by a control file may be standard types that are defined to a client application; a scalar type is one example of a standard type. However, some types indicated by a control file might not be among the types that are defined to the client application. Types that are not defined to a client application are called “opaque types” relative to the client application, because the internal structure of such types is obscured from, or unknown to, the client application. The internal structure of an opaque type, including the number and types of the attributes of the opaque type, often is defined only to a program that implements the opaque type. An opaque type implementor may be external to both the client application and the database server.
An opaque type may be an XML type. An example of an XML type is provided in co-pending U.S. patent application Ser. No. 10/259,278. An XML schema is metadata that describes a hierarchical structure. Instances of the XML schema comprise data that conforms to the structure described by the XML schema. Through XML elements expressed in the structure, an XML schema defines one or more types.
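By way of a hedged example, a small XML schema and a conforming instance might look like the following. The schema content and element names are assumptions made for illustration, and the check shown compares only child element names in order; it is not full XML Schema validation, which Python's standard library does not provide.

```python
import xml.etree.ElementTree as ET

XSD_NS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema: metadata describing a hierarchical "person" structure.
schema_text = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <xs:element name="birthdate" type="xs:date"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# An instance whose data is meant to conform to that structure.
instance_text = (
    "<person><name>Fred Brown</name>"
    "<birthdate>1980-01-01</birthdate></person>"
)

# Read the element declarations (name and type) out of the schema.
schema_root = ET.fromstring(schema_text)
sequence = next(schema_root.iter(f"{XSD_NS}sequence"))
declared = [(e.get("name"), e.get("type")) for e in sequence.findall(f"{XSD_NS}element")]

# Simplified conformance check: same child elements in the same order.
instance_root = ET.fromstring(instance_text)
actual = [child.tag for child in instance_root]
conforms = actual == [name for name, _ in declared]
```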
An XML document is a document that contains one or more XML elements that conform to an XML schema. Unfortunately, the amount of memory required to maintain an array representing the XML elements of an XML document may be large. Further, maintaining the control file of an XML document in memory also requires a significant amount of memory; in some cases, the amount of memory required to maintain a control file for an XML document may be ten times the amount of memory required to maintain the corresponding XML document. As a result, a client application requires a large amount of memory to load an XML document into memory when transferring the XML document to a persistent storage. Moreover, transferring XML documents to persistent storage in this manner is very CPU intensive for the client application, which may result in performance degradation.
Consequently, an approach for loading XML documents into memory, for use in transferring the XML documents to a persistent storage, that avoids the aforementioned problems would be advantageous.