Structured data conforms to a type definition. For example, a type definition for a “person” type may define distinct attributes such as “name,” “birthdate,” “height,” “weight,” and “gender.” Each “instance” of a particular type comprises a separate value for each of the attributes defined by the particular type. For example, an instance of the “person” type might comprise values such as “Fred Brown,” “Jan. 1, 1980,” “72 inches,” “240 pounds,” and “male.” Each attribute is also of a type. For example, the “name” attribute might be of a “string” type, the “birthdate” attribute might be of “date” type, and the “gender” attribute might be of an “enumerated” type. Structured data might comprise multiple different instances of the same type.
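For illustration only, the "person" type described above might be sketched in Python as follows. The class name, the field names, the units chosen for height and weight, and the members of the enumerated type are assumptions made for this sketch and are not prescribed by any particular system.

```python
from dataclasses import dataclass, fields
from datetime import date
from enum import Enum


class Gender(Enum):
    """Hypothetical enumerated type for the "gender" attribute."""
    MALE = "male"
    FEMALE = "female"


@dataclass
class Person:
    """Type definition: each attribute has a name and is itself of a type."""
    name: str            # "string" type
    birthdate: date      # "date" type
    height_inches: int   # unit assumed for this sketch
    weight_pounds: int   # unit assumed for this sketch
    gender: Gender       # "enumerated" type


# One instance of the type: a separate value for each defined attribute.
fred = Person("Fred Brown", date(1980, 1, 1), 72, 240, Gender.MALE)

# The type definition, not the instance, determines which attributes exist.
attribute_names = [f.name for f in fields(Person)]
```

Structured data comprising multiple instances of the same type would then correspond to a collection of such objects, each supplying a value for every attribute.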
Different approaches may be used to store structured data into a database. One such approach is called “conventional path loading.” According to conventional path loading, a client application parses structured data that comprises one or more instances of a type. Values within the structured data correspond to attributes of the type. The client application generates Structured Query Language (SQL) commands, such as INSERT commands, that, when executed by a database server, cause the database server to insert the values into corresponding columns of a database table. Unfortunately, due to its heavy use of the SQL engine, conventional path loading often suffers in terms of performance and memory consumption.
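A minimal sketch of conventional path loading follows. It assumes an in-memory SQLite database as a stand-in for the database server, and a hypothetical "person" table; the second record is invented sample data. The point illustrated is that every instance passes through the SQL engine as its own INSERT command.

```python
import sqlite3

# Parsed structured data: one tuple of values per instance of the type.
# The second record is invented purely for illustration.
records = [
    ("Fred Brown", "1980-01-01", 72, 240, "male"),
    ("Ann Green", "1975-06-15", 65, 130, "female"),
]

# An in-memory SQLite database stands in for the database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    "name TEXT, birthdate TEXT, height_inches INTEGER, "
    "weight_pounds INTEGER, gender TEXT)"
)

# The client generates one INSERT command per instance; each command is
# parsed and executed by the SQL engine, which is the source of the overhead.
insert_sql = (
    "INSERT INTO person "
    "(name, birthdate, height_inches, weight_pounds, gender) "
    "VALUES (?, ?, ?, ?, ?)"
)
for record in records:
    conn.execute(insert_sql, record)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```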
Another approach for storing structured data into a database is called “direct path loading.” Through direct path loading, values within structured data are stored directly into a database without causing the SQL engine to load each row of data. By consulting a control file that is associated with the structured data, a client application can determine the types to which instances within the structured data conform. If the structures of the types are defined to the client application, then, based on those structures, the client application can create an array that comprises columns that correspond to the types' attributes. The client application can populate each attribute's corresponding column with values that correspond to that attribute. Once the array is populated, the client application can convert the array into a stream of data that conforms to the format of a database's data blocks. The client application then can stream the data to a database server, which can write the data directly into one or more data blocks in the database. Direct path loading exhibits performance superior to that of conventional path loading.
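The array-building and streaming steps described above might be sketched as follows. The length-prefixed byte layout used here is an invented placeholder, not the format of any actual database's data blocks, and the attribute names and sample values are assumptions made for this sketch.

```python
import struct

# Attributes of a type whose structure is defined to the client application.
attribute_names = ["name", "height_inches"]

# Parsed instances; the second one is invented sample data.
instances = [("Fred Brown", 72), ("Ann Green", 65)]

# Build the array: one column per attribute of the type.
column_array = {name: [] for name in attribute_names}
for instance in instances:
    for name, value in zip(attribute_names, instance):
        column_array[name].append(value)


def encode_value(value):
    """Encode one value as a 2-byte big-endian length followed by UTF-8 bytes.

    This layout is hypothetical; a real loader would emit the server's
    own data block format.
    """
    raw = str(value).encode("utf-8")
    return struct.pack(">H", len(raw)) + raw


def to_stream(columns, row_count):
    """Convert the populated array into a row-ordered byte stream."""
    chunks = []
    for row in range(row_count):
        for name in columns:
            chunks.append(encode_value(columns[name][row]))
    return b"".join(chunks)


# The resulting bytes would be sent to the database server, which could
# write them into data blocks without executing SQL for each row.
stream = to_stream(column_array, len(instances))
```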
Some types indicated by a control file may be standard types that are defined to a client application. A scalar type is an example of such a standard type. The client application has information about the characteristics of a scalar type, such as the maximum storage size of a scalar type. With this information, the client application can generate the data stream as described above.
However, some types indicated by a control file might not be among the types that are defined to the client application. A type indicated by a control file might have a structure that is defined only to a program that implements that type. Although the type might comprise attributes that are of standard types, the control file and the client application might lack any information about the number or types of such attributes.
Without such information, the client application cannot generate or populate an array that comprises a separate column for each such attribute. The client application does not possess sufficient information to map values that correspond to such attributes to corresponding columns of a table in a relational database. Consequently, there is no effective way for the client application to store instances of such a type in a database using the direct path loading approach.
Types that are not defined to a client application are called “opaque types” relative to the client application, because the internal structure of such types is obscured from the client application. The internal structure of an opaque type, including the number and types of attributes of the opaque type, is often defined only to a program that implements the opaque type. Such a program may be external to both the client application and the database server.
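The difficulty can be sketched as follows. The registry, the function name, and the type name "vendor_shape" are hypothetical and serve only to show that a client application cannot build a column array for a type whose attributes are not defined to it.

```python
# Hypothetical registry of the type structures defined to the client.
KNOWN_TYPES = {
    "person": ["name", "birthdate", "height", "weight", "gender"],
}


def build_column_array(type_name):
    """Return an empty column array for a type, one column per attribute.

    Raises LookupError when the type is opaque to this client, because
    the number and types of its attributes are unknown.
    """
    if type_name not in KNOWN_TYPES:
        raise LookupError(f"type {type_name!r} is opaque to this client")
    return {attribute: [] for attribute in KNOWN_TYPES[type_name]}


# A type defined to the client yields one column per attribute.
person_columns = build_column_array("person")

# A type implemented only by an external program cannot be mapped.
try:
    build_column_array("vendor_shape")  # hypothetical opaque type
    opaque_failed = False
except LookupError:
    opaque_failed = True
```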
It may not be practical to modify a client application every time that a new type is introduced so that the new type is defined to the client application. Additionally, the structures of some existing types may change as time passes, and it may be equally impractical to modify a client application every time that the structure of an existing type changes.
One kind of opaque type is an XML type. An example of an XML type is provided in co-pending U.S. patent application Ser. No. 10/259,278. “XML” stands for “Extensible Markup Language.” An XML schema is metadata that describes a hierarchical structure. Instances of the XML schema comprise data that conforms to the structure described by the XML schema. Through XML elements expressed in the structure, an XML schema defines one or more types. XML elements in such a structure may be mapped to columns of database tables. Using the conventional path loading approach, values that correspond to the XML elements may be stored in the columns that are mapped to those XML elements.
An XML type is special because an XML type may define alternative structures to which instances of the XML type may conform. For example, an XML type definition might indicate that one or more attributes of the XML type are optional. Therefore, if attributes “A,” “B,” and “C” are optional, then one instance of the XML type might comprise a value for attribute “A,” but no values for attributes “B” or “C,” while another instance of the XML type might comprise a value for attribute “B,” but no values for attributes “A” or “C.” Because the instances may conform to alternative defined structures rather than a single defined structure, the instances may be said to comprise “semistructured” data rather than “structured” data.
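The example of optional attributes “A,” “B,” and “C” might be sketched as follows, using Python's standard XML parser. The enclosing element name "item" and the sample values are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# Attributes that the (hypothetical) XML type declares as optional.
OPTIONAL_ATTRIBUTES = ["A", "B", "C"]

# Two instances of the same XML type that conform to different
# alternative structures.
documents = [
    "<item><A>1</A></item>",
    "<item><B>2</B></item>",
]


def extract_values(xml_text):
    """Map each optional attribute to its value, or None when it is absent."""
    root = ET.fromstring(xml_text)
    return {name: root.findtext(name) for name in OPTIONAL_ATTRIBUTES}


rows = [extract_values(document) for document in documents]
```

Because each instance may populate a different subset of the attributes, a loader cannot assume that every instance supplies a value for every column.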
At present, client applications are unable to use the direct path loading approach effectively to store semistructured data. Because the direct path loading approach exhibits performance superior to that of the conventional path loading approach, a technique that overcomes the limitations of prior approaches is needed.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.