The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
The Extensible Markup Language (XML) is the standard for data and documents that is finding wide acceptance in the computer industry. XML describes and provides structure to a body of data, such as a file or data packet. The XML standard provides for tags that delimit sections of XML documents referred to as XML elements.
Information about the structure of specific types of XML documents may be specified in documents referred to as “XML schemas”. For example, the XML schema for a particular type of XML document may specify the names for the elements contained in that type of XML document, the hierarchical relationship between the elements contained in that type of XML document, and the type of values contained in that particular type of XML document. Standards governing XML schemas include XML Schema, Part 0, Part 1, Part 2, W3C Recommendation, 2 May 2001, the contents of which are incorporated herein by reference, XML Schema Part 1: Structures, Second Edition, W3C Recommendation 28 Oct. 2004, the contents of which are incorporated herein by reference, and XML Schema Part 2: Datatypes Second Edition, W3C Recommendation 28 Oct. 2004, the contents of which are incorporated herein by reference.
XML Storage Mechanisms
Various types of storage mechanisms are used to store an XML document. One type of storage mechanism stores an XML document as a text file in a file system.
Another type of storage mechanism uses object-relational database systems that are enhanced to store and process queries for collections of XML documents. Furthermore, these object-relational database systems can store and manage XML documents as instances of XML schemas. To store and manage the XML documents in a database system, database representations, defined in terms of data types handled by the database system (referred to herein as database types), are used to represent XML documents. Database types include, for example, native database types, such as integer and VARCHAR (“variable length character string”), or object types defined for a database system using DDL statements (data definition language statements).
For example, a database representation of an entire XML document may be a CLOB (character large object), or may be one or more tables whose columns store the components of an XML document in one or more rows. A database representation may be a hierarchy of objects in an object-relational database; each object is an instance of an object class and stores one or more elements of an XML document. The object class defines, for example, the structure corresponding to an element, and includes references or pointers to objects representing the immediate descendants of the element.
Representing Collections with Varrays
XML schemas often define a collection of elements (“collection element”) by specifying, within the XML schema declaration of an element, a maxOccurs attribute with a value greater than 1. Such a collection of elements is represented within a database using a VARRAY column in a table. Each array element in a VARRAY represents a member element in a collection of elements.
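As an illustration of how collection elements can be recognized in a schema, the following minimal Python sketch scans an XML schema for element declarations whose maxOccurs value exceeds 1 (or is "unbounded", which XML Schema also permits). The sample schema, the element names, and the function name collection_elements are hypothetical and are not taken from any particular database system.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema used only for illustration.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="PurchaseOrder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Reference" type="xs:string"/>
        <xs:element name="LineItem" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="Note" type="xs:string" minOccurs="0" maxOccurs="5"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def collection_elements(xsd_text):
    """Return names of element declarations that may occur more than once."""
    root = ET.fromstring(xsd_text)
    names = []
    for elem in root.iter(f"{{{XSD_NS}}}element"):
        # maxOccurs defaults to 1 when the attribute is absent.
        max_occurs = elem.get("maxOccurs", "1")
        if max_occurs == "unbounded" or int(max_occurs) > 1:
            # Declarations that use ref= instead of name= would yield None here.
            names.append(elem.get("name"))
    return names


found = collection_elements(SAMPLE_XSD)
```

In this sketch, each name returned would correspond to one VARRAY column in the database representation.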
An object-relational database system stores a VARRAY column in several ways, referred to herein as forms of VARRAY storage. In one form of VARRAY storage, the inline form, the array elements of a VARRAY column of a table are stored inline within the table. In another form of VARRAY storage, the out-of-line form, the array elements of a VARRAY column of a table are stored out-of-line in another table (“out-of-line table”). Further details about VARRAYs and forms of VARRAY storage are described in Database Object Collections.
The decision about how to store VARRAYs may be based on various factors. Storing a VARRAY column out-of-line requires more space. However, the out-of-line table may be queried independently and may be indexed, allowing more efficient querying of VARRAYs. Furthermore, a database limitation (such as that for a VARRAY containing a CLOB) may require that a VARRAY column be stored out-of-line rather than inline. Indexing a VARRAY column may also require out-of-line storage.
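The trade-off between the two forms can be sketched with an analogy in Python using the standard sqlite3 module. SQLite has no VARRAY type, so this is only an approximation under stated assumptions: a JSON text column stands in for inline storage, and a separate child table stands in for the out-of-line table. All table and column names are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Inline analogy: the whole array lives in a column of the parent row.
    CREATE TABLE po_inline (id INTEGER PRIMARY KEY, line_items TEXT);

    -- Out-of-line analogy: array elements live in a separate table.
    CREATE TABLE po_outofline (id INTEGER PRIMARY KEY);
    CREATE TABLE po_line_items (
        po_id INTEGER REFERENCES po_outofline(id),
        pos   INTEGER,   -- position of the element within the array
        item  TEXT
    );
    -- The separate table can be indexed and queried on its own.
    CREATE INDEX po_line_items_item ON po_line_items(item);
""")

items = ["bolt", "nut", "washer"]
conn.execute("INSERT INTO po_inline VALUES (1, ?)", (json.dumps(items),))
conn.execute("INSERT INTO po_outofline VALUES (1)")
conn.executemany(
    "INSERT INTO po_line_items VALUES (1, ?, ?)", list(enumerate(items))
)

# Inline: the array is read back as a unit from the parent row.
row = conn.execute("SELECT line_items FROM po_inline WHERE id = 1").fetchone()
inline_items = json.loads(row[0])

# Out-of-line: individual elements can be searched directly.
owners = [
    r[0]
    for r in conn.execute(
        "SELECT po_id FROM po_line_items WHERE item = ?", ("nut",)
    )
]
```

The out-of-line variant needs an extra table, a key column, and an index, which reflects the additional space cost noted above, while making element-level queries straightforward.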
Schema-level Determination of how to Store Varrays
In an approach for controlling how to store VARRAYs, the form of VARRAY storage is controlled at the XML schema level. All VARRAY column representations for an XML schema are stored either inline or out-of-line. Thus, if a database limitation requires out-of-line storage, or if it is desired to take advantage of some of the features of out-of-line storage, all the collection elements defined by an XML schema have to be stored out-of-line, even though out-of-line storage is desired or required for only a subset of the collection elements defined by the XML schema. This limitation inflates the storage cost of handling collection elements and proliferates the number of tables needed by a database system to support collection elements.
Based on the foregoing, an improved way of determining how to store VARRAYs for collection elements is desired.
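The effect of schema-level control can be illustrated with a short Python sketch. It models only the decision logic described above, not any actual database system; the function name, the element names, and the rule that a single requirement forces out-of-line storage for the whole schema are illustrative assumptions.

```python
def plan_schema_level_storage(collection_elements, requires_out_of_line):
    """Pick one form of VARRAY storage for every collection element in a schema.

    collection_elements: names of the collection elements the schema defines.
    requires_out_of_line: names that need out-of-line storage, for example
        because of a database limitation or because an index is wanted.
    """
    # Schema-level control: a single requirement decides the form for all.
    force = any(name in requires_out_of_line for name in collection_elements)
    form = "out-of-line" if force else "inline"
    return {name: form for name in collection_elements}


# Hypothetical schema with three collection elements, only one of which
# actually needs out-of-line storage.
plan = plan_schema_level_storage(
    ["LineItem", "Note", "Attachment"], {"Attachment"}
)
out_of_line_tables = sum(1 for form in plan.values() if form == "out-of-line")
```

Here three out-of-line tables are planned although only one collection element requires out-of-line storage.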