Structured documents are documents that have nested structures. Documents written in Extensible Markup Language (XML) are structured documents. XML is quickly becoming the standard format for delivering information on the World Wide Web because it allows the user to design a customized markup language for many classes of structured documents. XML supports user-defined tags for better description of nested document structures and associated semantics, and encourages separation of document content from browser presentation.
As more and more businesses present and exchange data in XML documents, the challenge is to store, search, and retrieve these documents using existing relational database systems. A relational database management system (RDBMS) is a database management system that uses relational techniques for storing and retrieving data. Relational databases are organized into tables, which consist of rows and columns of data. A database will typically have many tables, and each table will typically have multiple rows and columns. The tables are typically stored “on disk,” i.e., on direct access storage devices (DASD), such as magnetic or optical disk drives, for semi-permanent storage.
Some relational database systems store an XML document as a BLOB (Binary Large Object) or map the XML data to rows and columns in one or more relational tables. Both of these approaches, however, have serious disadvantages. First, an XML document that is stored as a BLOB must be read and parsed before it can be queried, thereby making querying costly and time consuming. Second, the mapping process is burdensome and inefficient, especially for large XML documents, because mapping XML data to a relational database can result in a large number of columns with null values (which wastes space) or a large number of tables (which is inefficient). Furthermore, when an XML document is mapped to relational tables, the nested structure of the document is not preserved. Thus, parent-child(ren) relationships are difficult to reconstruct.
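By way of illustration only, the two conventional approaches described above can be sketched as follows. The sketch assumes Python's standard `sqlite3` and `xml.etree.ElementTree` modules as a stand-in for a relational database system and an XML parser; the sample document, the table names (`docs`, `items`), and the helper `skus_from_blob` are hypothetical and are not taken from any particular system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names are illustrative only.
XML_DOC = """<order id="1">
  <customer><name>Ann</name></customer>
  <item><sku>A1</sku><note>gift</note></item>
  <item><sku>B2</sku></item>
</order>"""

conn = sqlite3.connect(":memory:")

# Approach 1: store the whole document as a BLOB.
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("INSERT INTO docs VALUES (1, ?)", (XML_DOC.encode("utf-8"),))


def skus_from_blob(connection):
    """Answer a query over BLOB storage: each document is read and parsed first."""
    skus = []
    for (body,) in connection.execute("SELECT body FROM docs"):
        root = ET.fromstring(body)  # full parse of the document on every query
        skus.extend(elem.text for elem in root.iter("sku"))
    return skus


# Approach 2: map the XML data to rows and columns of a relational table.
conn.execute("CREATE TABLE items (order_id INTEGER, sku TEXT, note TEXT)")
root = ET.fromstring(XML_DOC)
for item in root.findall("item"):
    conn.execute(
        "INSERT INTO items VALUES (?, ?, ?)",
        # findtext returns None for a missing <note>, which is stored as NULL.
        (int(root.get("id")), item.findtext("sku"), item.findtext("note")),
    )

blob_skus = skus_from_blob(conn)
null_notes = conn.execute(
    "SELECT COUNT(*) FROM items WHERE note IS NULL"
).fetchone()[0]
```

In this sketch, the BLOB query must parse every stored document before any element can be examined, while the mapped table holds a null value for the optional `note` element and no longer records that each `item` was nested within an `order` alongside a `customer` element.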
Accordingly, there is a need for an improved method and system for querying structured documents stored in their native formats within a database system. The method and system should be integrated (or capable of being integrated) with an existing database system in order to use the existing resources of the database system. The present invention addresses such a need.