The invention relates generally to the field of software technology and programming and more particularly to the representation and persistent storage of XML documents (or other programming objects that can be represented as XML documents) within a relational database.
Recent advances in software technology and the Internet are responsible for the proliferation of Extensible Markup Language (XML) as a standard for data representation. XML is a technology standard endorsed by the World Wide Web Consortium (W3C). Unlike HTML (the Internet standard for displaying information in a browser), which is concerned with the display and format of information, XML is concerned mostly with the structure of the data contained within a document (or other structured data object). For instance, a standard HTML web page contains instructions (or tags) that may instruct a browser to display a heading in bold font. An XML page, on the other hand, contains tags that tell a specialized browser (or other software) where the name of an author may be found within the contents of a document.
XML data is stored in XML documents. An XML document includes three main parts: a prolog, a body and an epilog. The prolog and epilog are considered optional parts of the document.
The prolog may include one or more processing instructions, a document type declaration, and one or more comments. The body of the document includes exactly one XML element known as the document element. In one of its simplest forms, an XML element includes a start tag, an end tag and some data therebetween. The epilog of the document may contain processing instructions and comments. Since the prolog and epilog are optional, the body represents the main content of the document.
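To make the three-part structure concrete, the following minimal Python sketch parses a small, made-up document with the standard library's xml.dom.minidom module and sorts its top-level nodes into prolog, body and epilog. The sample document, the stylesheet name book.xsl and the helper names split_document and describe are illustrative assumptions, not part of any particular system.

```python
from xml.dom import minidom

# Hypothetical sample: the prolog holds an XML declaration, a processing
# instruction and a comment; the epilog holds a single comment.
SAMPLE = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="book.xsl"?>
<!-- prolog comment -->
<book><author>Jane Doe</author></book>
<!-- epilog comment -->
"""


def describe(node):
    """Return a short (kind, name) label for a top-level DOM node."""
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE:
        return ("pi", node.target)
    if node.nodeType == node.COMMENT_NODE:
        return ("comment", node.data.strip())
    if node.nodeType == node.DOCUMENT_TYPE_NODE:
        return ("doctype", node.name)
    return ("other", node.nodeName)


def split_document(xml_text):
    """Split a document's top-level nodes into prolog, body and epilog.

    minidom does not expose the XML declaration as a node, so only the
    processing instructions, comments and document type declaration of
    the prolog are reported.
    """
    doc = minidom.parseString(xml_text)
    body = doc.documentElement  # the single document element
    prolog, epilog = [], []
    after_body = False
    for node in doc.childNodes:
        if node is body:
            after_body = True
        elif after_body:
            epilog.append(describe(node))
        else:
            prolog.append(describe(node))
    return prolog, body.tagName, epilog


print(split_document(SAMPLE))
```

Everything before the document element is treated as prolog and everything after it as epilog, mirroring the description above.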
The simple data contained between start and end tags that is not an element, processing instruction or comment is sometimes called a text node. A text node can be almost any string of characters, provided it does not contain characters that would be confused with XML markup. Embedding such markup characters in a text node without violating XML rules requires either escaping them (for example, writing < as &lt;) or enclosing the text in a CDATA section. CDATA sections are special text nodes that tell a parser to treat their contents as character data, ignoring any XML markup it may encounter within them.
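The difference between escaping and a CDATA section can be seen in a short Python sketch using xml.etree.ElementTree. The element names formula and note and the helper functions are made up for illustration. Both forms yield the same character data, markup inside a CDATA section is not turned into child elements, and an unescaped < in ordinary text is rejected by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names, chosen only for illustration.
ESCAPED = "<formula>a &lt; b &amp;&amp; c &gt; d</formula>"
IN_CDATA = "<formula><![CDATA[a < b && c > d]]></formula>"
MARKUP_IN_CDATA = "<note><![CDATA[<b>not an element</b>]]></note>"


def text_and_child_count(xml_text):
    """Parse a one-element document; return its text and number of child elements."""
    root = ET.fromstring(xml_text)
    return root.text, len(root)


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Escaping and CDATA produce the same character data.
print(text_and_child_count(ESCAPED))
print(text_and_child_count(IN_CDATA))
# Inside CDATA, "<b>" is plain text, so the element has no children.
print(text_and_child_count(MARKUP_IN_CDATA))
# A bare "<" in ordinary text violates XML rules.
print(is_well_formed("<formula>a < b</formula>"))
```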
XML is quickly becoming a standard way for businesses to exchange information electronically, largely because it is evolving into a behind-the-scenes data format for business-to-business data exchange. As a result of this business-to-business data exchange, programmers have extended the use of XML beyond the simple representation of document data. Programmers now use XML as a way to structure data records and complex programming objects.
One example of such general purpose programming usage of XML is in Object Oriented (OO) programming. In Object Oriented programming languages (such as Java, C++ and Smalltalk), programmers can represent data as complex objects including one or more attributes. For instance, a customer object may include a name, gender and date-of-birth. In some cases, objects are made up of smaller objects. An example might be when a customer object includes a shipping address object, which in turn includes a street, city, state and zip code. Such information might also be represented as an XML document.
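As a rough illustration of the customer example, the Python sketch below models the customer and its nested shipping address as dataclasses and serializes them with xml.etree.ElementTree, mapping each attribute to a child element and each nested object to a nested element. The class, field and element names and the sample values are assumptions made for the example; the mapping is one simple possibility, not a prescribed format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Address:
    street: str
    city: str
    state: str
    zip_code: str


@dataclass
class Customer:
    name: str
    gender: str
    date_of_birth: str
    shipping_address: Address  # a smaller object contained in a larger one


def to_element(tag, obj):
    """Map a dataclass instance to an element; nested objects become nested elements."""
    elem = ET.Element(tag)
    for field in fields(obj):
        value = getattr(obj, field.name)
        if is_dataclass(value):
            elem.append(to_element(field.name, value))
        else:
            ET.SubElement(elem, field.name).text = str(value)
    return elem


customer = Customer(
    "Jane Doe", "F", "1970-01-01",
    Address("1 Main St", "Springfield", "IL", "62701"),
)
print(ET.tostring(to_element("customer", customer), encoding="unicode"))
```

The object graph and the XML tree share the same shape, which is what makes XML a natural stand-in for such objects.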
While the above is helpful in an abstract sense, this alternative representation of the data object only really begins to benefit the programmer when one considers storing the information on a more permanent basis. In general, programming objects are thought of as existing in memory. Their usefulness is fully realized only when the information they represent is stored permanently on dynamically accessible magnetic (or, in some cases, optical) media, such as a computer hard drive, for later processing. This process of storing programming object information on a more permanent basis is often referred to as object persistence.
Conventional computer software uses a variety of techniques to accomplish object (as well as general data) persistence. One of the most common techniques is the use of relational database technology. In an abstract sense, relational databases store data in inter-related tables made up of rows and columns (much like a spreadsheet). However, representing objects in a relational database structure can become very complex. While the benefit of the relational data model lies in the ability of programmers to query the database for information about an attribute of an object, the more complicated the objects and their interrelationships become, the more complicated the relational data model becomes. This can increase the time it takes to write and test software.
Accordingly, it would be advantageous to provide techniques that would rely on a general data model for storage that does not change as the object model changes. It would be a further advantage to provide such techniques that employ XML.
One of the problems with persisting XML data in a relational database is that XML data is hierarchical in nature, and hierarchical information is inherently recursive. If this information is stored in a relational database, one might have to employ a recursive query algorithm to retrieve an entire document, and recursive querying on a database can be a resource-intensive process. Therefore, it would be advantageous to devise a general technique for representing object data as XML in a relational database that does not require recursive querying, while retaining the benefits of a general (non-changing) data model and preserving the structure of the individual document components.
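The recursion described above can be seen in a minimal Python and SQLite sketch. It assumes a hypothetical adjacency-list table, node, in which each row stores the id of its parent; the table, its columns and the sample data are illustrative and are not taken from the source. Retrieving a whole document from such a table calls for a recursive common table expression (WITH RECURSIVE), whose recursive step repeatedly joins the table against the rows found so far.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_sample_db():
    """Build an in-memory database holding two small documents as a node tree."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical adjacency-list schema: each row points at its parent row.
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER,"
        " tag TEXT NOT NULL, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO node VALUES (?, ?, ?, ?)",
        [
            (1, None, "customer", None),
            (2, 1, "name", "Jane Doe"),
            (3, 1, "shipping_address", None),
            (4, 3, "city", "Springfield"),
            (5, None, "customer", None),
            (6, 5, "name", "John Roe"),
        ],
    )
    return conn


def load_document(conn, root_id):
    """Fetch one document's nodes by repeatedly joining children onto rows found so far."""
    return conn.execute(
        """
        WITH RECURSIVE subtree(id, parent_id, tag, content, depth) AS (
            SELECT id, parent_id, tag, content, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.parent_id, n.tag, n.content, s.depth + 1
            FROM node AS n JOIN subtree AS s ON n.parent_id = s.id
        )
        SELECT id, parent_id, tag, content, depth FROM subtree
        ORDER BY depth, id
        """,
        (root_id,),
    ).fetchall()


def rebuild(rows):
    """Reassemble the element tree; ordering by depth puts parents before children."""
    elements, root = {}, None
    for node_id, parent_id, tag, content, depth in rows:
        elem = ET.Element(tag)
        elem.text = content
        elements[node_id] = elem
        if depth == 0:
            root = elem
        else:
            elements[parent_id].append(elem)
    return root


conn = make_sample_db()
print(ET.tostring(rebuild(load_document(conn, 1)), encoding="unicode"))
```

Because the depth of an XML document is not known in advance, a fixed number of ordinary joins cannot retrieve an arbitrary document stored this way, which is why a recursive query is needed with this kind of schema.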