Databases, in their various forms, have a long history within computer science. Early databases were files customized for use with specific applications. The application was responsible for the organization of the data in the file, for searching the file, and for updating the file as needed.
Eventually, generalized database applications came into being. These applications included interfaces that allowed other applications to use the databases without having to manage the data directly. The programmer could define the structure of the database, add data, search the database, and perform other functions without having to be responsible for the implementation of the database itself.
As database applications have come into existence, their development paths have diverged. Where once there was only one type of database, there are now flat databases, relational databases, object-oriented databases, and other varieties. But all of these database models share a common problem: they store data in a format specific to the database model. There is no functionality to support storing and manipulating data in a generic format.
The inability of current database models to support generic data is especially problematic when databases are used to store eXtensible Markup Language (XML) documents. XML is a generalization of HyperText Markup Language (HTML), the format of documents used in surfing the World Wide Web on the Internet. XML documents, often defined using XML schemas or Document Type Definitions (DTDs), can include their own tag definitions, whose significance is determined by the application processing the document. (For more information about XML document structure, the reader is referred to the web site of the World Wide Web Consortium, at http:##www.w3.org; specifically, the reader is referred to http:##www.w3.org#XML. (In the Uniform Resource Locators (URLs) above and below, the forward slash marks (“/”) have been replaced with pound signs (“#”) to avoid document scanning problems.)) For example, FIGS. 1A-1B show two different XML documents. XML document 105 is a document storing a purchase order for a lawnmower; XML document 110 is a document storing a quick note. (XML document 105 is adapted from an example found at http:##www.w3.org#TR#xmlschema-0# (Copyright © 2001 World Wide Web Consortium, (Massachusetts Institute of Technology, Institut National de Recherche en Informatique et en Automatique, Keio University), All Rights Reserved.); XML document 110 is adapted from an example found at http:##www.w3schools.com#xml#note.xml.) Note that the overall structures of XML documents 105 and 110 are similar, but the content (specifically, the tags used) has no similarity.
The reason that XML documents 105 and 110 are difficult to store in current database models is rooted in the adaptability that makes XML documents useful. The database can store XML documents in one of two ways. First, the database can store XML documents in the database native format. Doing so requires disassembling the XML document into constituent pieces that can be stored in the database native format. This destroys the value of the XML document format and codifies the document's structure in the rigid Data Definition Language (DDL) of the database, which is often very difficult to change in an enterprise. Second, the database can store the XML document as a field in a record, retaining the format of the XML document but sacrificing the value the database can add to the organization of the data.
To help make this problem clearer, consider XML document 110 in FIG. 1B. XML document 110 describes a note. The note has four parts: the note's recipient, the note's sender, a heading for the note, and the body of text in the note. If the database stores these four elements separately in a table, then the database has broken the XML document into parts, and lost the significance of the XML document as a whole. On the other hand, if the database stores the XML document as a single object, then the database loses its normal functionality with respect to data. For example, the database cannot search for XML documents (such as XML document 110) sent by Mary: conventional database searching depends on the database storing data in the database native format.
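The trade-off described above can be illustrated with a minimal sketch, here using Python's standard sqlite3 and xml.etree.ElementTree modules. The note contents, the table names (note_parts, note_docs), and the column names are hypothetical and chosen only for illustration; they are not taken from FIG. 1B or from any particular database product.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two hypothetical notes; element names follow the four parts described above.
NOTES = [
    "<note><to>John</to><from>Mary</from>"
    "<heading>Reminder</heading><body>Call me.</body></note>",
    "<note><to>Mary</to><from>John</from>"
    "<heading>Reply</heading><body>Will do.</body></note>",
]

conn = sqlite3.connect(":memory:")

# Approach 1: disassemble each document into native columns.
# The parts become searchable, but the document as a whole is gone
# and its structure is now fixed in the table definition.
conn.execute(
    "CREATE TABLE note_parts (recipient TEXT, sender TEXT, heading TEXT, body TEXT)"
)
# Approach 2: keep each document intact as a single text field.
conn.execute("CREATE TABLE note_docs (doc TEXT)")

for xml_text in NOTES:
    root = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO note_parts VALUES (?, ?, ?, ?)",
        (
            root.findtext("to"),
            root.findtext("from"),
            root.findtext("heading"),
            root.findtext("body"),
        ),
    )
    conn.execute("INSERT INTO note_docs VALUES (?)", (xml_text,))

# With native columns, "notes sent by Mary" is an ordinary query.
sent_by_mary = conn.execute(
    "SELECT COUNT(*) FROM note_parts WHERE sender = ?", ("Mary",)
).fetchone()[0]

# With whole documents, the database sees only opaque text. A substring
# match cannot tell a sender from a recipient, so it also returns the
# note that was sent to Mary.
mentions_mary = conn.execute(
    "SELECT COUNT(*) FROM note_docs WHERE doc LIKE ?", ("%Mary%",)
).fetchone()[0]
```

In this sketch the first query finds exactly the one note sent by Mary, while the substring search over the intact documents matches both notes, because the database has no knowledge of the tags inside the stored field.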
Another complication is the concept of the attribute in an XML document. Attributes add functionality to XML documents, without changing the structure of the tags in the document. For example, in XML document 105 in FIG. 1A, the tag “purchaseOrder” includes the attribute “orderDate.” Because databases currently do not handle attributes when processing XML documents stored in the database as a field, the databases ignore potentially critical information.
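As a brief illustration of how an attribute carries information outside the tag structure, the following sketch parses a simplified purchase order with Python's standard xml.etree.ElementTree module. The document shown is a hypothetical reduction of XML document 105; the orderDate value and the child element are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified purchase order; the date is illustrative.
PO_XML = '<purchaseOrder orderDate="1999-10-20"><items/></purchaseOrder>'

root = ET.fromstring(PO_XML)

# Walking the tag structure alone never surfaces the order date.
tag_names = [element.tag for element in root.iter()]

# The date is only visible to a processor that reads attributes.
attributes = dict(root.attrib)
```

Here tag_names contains only "purchaseOrder" and "items", while the order date appears solely in the attribute mapping. A store that treats the document as an opaque field, or that inspects tags only, would not see it.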
As can be seen from the above description, current data stores do not enable the utilization of database functionality (such as indexing for fast searching) while retaining the flexibility of a generic document. The invention addresses these problems and others in the art.