Embodiments of the present invention relate to storing and accessing information in a structured document in a computer-implemented database system.
Structured documents have nested structures, i.e., structures that define hierarchical relationships between components of a document. Documents written in Extensible Markup Language (XML) are structured documents. XML is quickly becoming the standard format language for exchanging information over the Internet because it allows the user to design a customized markup language for many classes of structured documents. For example, a business can easily model a complex structure of a document, such as a purchase order, in a form written in XML and send the form for further processing to its business partners. XML supports user-defined tags for better description of nested document structures and associated semantics, and encourages the separation of document content from browser presentation.
As more and more business applications use structured documents written in XML to present and exchange data over the Internet, the challenge is to store, search, and retrieve these documents using existing relational database systems. A relational database management system (RDBMS) is a database management system which uses relational techniques for storing and retrieving data. Relational databases are organized into tables, which consist of rows and columns of data. A database will typically have many tables and each table will typically have multiple rows and columns. The tables are typically stored on direct access storage devices (DASD), such as magnetic or optical disk drives, for semi-permanent storage.
Some relational database systems store an XML document as a Binary Large Object (BLOB). While storing the document as a BLOB is straightforward, accessing the data in the document presents challenges because a BLOB is not easily queried. For example, each BLOB must be read and parsed before it can be queried. For a large number of BLOBs, this process can be prohibitively costly and time consuming.
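The cost described above can be illustrated with a minimal sketch. It assumes an in-memory SQLite database standing in for the relational system, and a hypothetical purchase-order vocabulary (`order`, `item`, `sku`); none of these names come from the approaches described here. Because the database sees each document only as opaque bytes, a content query has to fetch and parse every BLOB.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical purchase orders; element and attribute names are illustrative only.
ORDERS = [
    b"<order id='1'><item sku='A1' qty='2'/><item sku='B7' qty='1'/></order>",
    b"<order id='2'><item sku='C3' qty='5'/></order>",
    b"<order id='3'><item sku='B7' qty='4'/></order>",
]


def load(conn, docs):
    """Store each XML document as an opaque BLOB, one row per document."""
    conn.execute("CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, body BLOB)")
    conn.executemany("INSERT INTO docs (body) VALUES (?)", [(d,) for d in docs])


def orders_with_sku(conn, sku):
    """Return (ids of orders containing the SKU, number of BLOBs parsed)."""
    hits, parsed = [], 0
    # The database cannot look inside a BLOB, so every row is fetched.
    for (body,) in conn.execute("SELECT body FROM docs ORDER BY doc_id"):
        root = ET.fromstring(body)  # full parse before any predicate can run
        parsed += 1
        if any(item.get("sku") == sku for item in root.iter("item")):
            hits.append(root.get("id"))
    return hits, parsed


conn = sqlite3.connect(":memory:")
load(conn, ORDERS)
hits, parsed = orders_with_sku(conn, "B7")
print(hits, parsed)
```

In this sketch the number of documents parsed equals the number of rows stored, regardless of how few documents match, which is why the approach scales poorly as the number of BLOBs grows.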
Other relational database systems store an XML document by mapping the XML data to rows and columns in one or more relational tables. This approach, however, introduces inefficiencies, especially for large XML documents. For example, mapping an XML document to a relational database can result in a single large table in which numerous columns hold null values, which consumes valuable memory, or in a large number of tables, which is inefficient and also consumes memory.
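The null-value problem can be seen in a small sketch, again under stated assumptions: the two documents and their element names are hypothetical, and the mapping shown (one column per distinct child element, one row per document) is only one simple way such a mapping might be done. It handles a single level of nesting, so deeper structure would need additional tables.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents whose optional elements differ from one another.
DOCS = [
    "<order><customer>Acme</customer><shipTo>Berlin</shipTo></order>",
    "<order><customer>Globex</customer><discount>5</discount>"
    "<giftNote>Thanks</giftNote></order>",
]


def shred(docs):
    """Map each document's child elements to one row of a single wide table."""
    records = [{child.tag: child.text for child in ET.fromstring(d)} for d in docs]
    # The table needs a column for every element seen in any document.
    columns = sorted({tag for rec in records for tag in rec})
    # Elements absent from a given document become nulls (None) in its row.
    rows = [[rec.get(col) for col in columns] for rec in records]
    return columns, rows


columns, rows = shred(DOCS)
null_count = sum(cell is None for row in rows for cell in row)
print(columns)
print(rows)
print(null_count)
```

Each optional element that appears in only some documents adds a column that is null in every other row, so the proportion of empty cells tends to grow as the documents become more varied.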
Moreover, neither approach provides a way to store information related to the document data, e.g., metadata, without introducing significant inefficiencies and requiring significant resources. Furthermore, neither approach preserves the nested structure of the document. Thus, the parent-child relationships among the components of the document are difficult to reconstruct.
Accordingly, it is desirable to be able to store structured documents in their native formats within a database system. It is also desirable to associate related information with components of structured documents. It is further desirable to integrate structured documents into an existing database system in order to use the existing resources of that database system.