The World Wide Web (WWW) involves a network of servers on the Internet, each of which is associated with one or more Hypertext Markup Language (HTML) pages. The HTML pages are transferred between servers and the clients that make requests of those servers using the Hypertext Transfer Protocol (HTTP). Resources available from servers on the Internet are located using a Uniform Resource Locator (URL). The standards and protocols of the WWW are promulgated by the World Wide Web Consortium (W3C) through its servers at www.w3c.org, and are used on many private networks in addition to their use on the Internet.
The HTML standard is one application of a more general markup language standard called the Standard Generalized Markup Language (SGML). Recently, a subset of SGML that is more powerful and flexible than HTML has been defined and has gained popularity for transferring information over the Internet and other networks. The new standard, developed and promoted by W3C, is called the eXtensible Markup Language (XML). XML provides a common syntax for expressing structure in data. Structured data refers to data that is tagged for its content, meaning, or use. XML expands on the tagging that is done in HTML, where the tags focus on the format of presentation. XML tags identify XML elements and attributes of XML elements. XML elements can be nested to form hierarchies of elements. As used herein, the term “XML construct” includes structures of the XML standard, including XML documents, XML elements, XML attributes of XML elements, and fragments of XML documents made up of several XML elements at the root level.
XML documents are designed to transfer hierarchical data between client and server processes distributed over a network of heterogeneous computing devices and operating systems. An XML document is transferred as a series of bits that represent a string of characters. The characters in the string indicate tags that mark a component of the XML document, such as an element or attribute. The characters that follow a tag indicate values for the tagged element or attribute, if any. Different character sets can be used in different XML documents.
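As an illustration of the nesting of elements and the character-string representation described above, the following is a minimal sketch in Python using the standard xml.etree.ElementTree module. The employee document, its element names, and the helper function walk are assumptions made for this example only and do not appear in any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical employee document; tag and attribute names are illustrative.
XML_TEXT = (
    '<employee id="101">'
    '<name>Ana Lee</name>'
    '<address><city>Oslo</city><country>Norway</country></address>'
    '</employee>'
)

def walk(element, depth=0):
    """Return (depth, tag, attributes, text) for an element and its descendants."""
    text = (element.text or "").strip()
    rows = [(depth, element.tag, dict(element.attrib), text)]
    for child in element:
        rows.extend(walk(child, depth + 1))
    return rows

root = ET.fromstring(XML_TEXT)
hierarchy = walk(root)
for depth, tag, attrs, text in hierarchy:
    # Indentation shows how elements nest to form a hierarchy.
    print("  " * depth + tag, attrs, text)

# The same document serialized with two character encodings
# produces two different series of bits.
utf8_bytes = ET.tostring(root, encoding="utf-8")
utf16_bytes = ET.tostring(root, encoding="utf-16")
print(len(utf8_bytes), len(utf16_bytes))
```

The tags carry the structure, and the characters between the tags carry the values. The choice of encoding changes the transmitted bytes without changing the hierarchy that a parser recovers.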
Relational databases predate, and developed independently of, the World Wide Web. Relational databases store data in various types of data containers that correspond to logical relationships within the data. As a consequence, relational databases support powerful search and update capabilities. Relational databases typically store data in tables of rows and columns where the values in all the columns of one row are related. For example, the values in one row of an employee table describe attributes of the same employee, such as her name, social security number, address, salary, telephone number and other information. Each attribute is stored in a different column. Some attributes, called collections, can have multiple entries. For example, the employee may be allowed to have multiple telephone numbers. Special structures are defined in some relational databases to store collections.
A relational database management system (DBMS) is a system that stores and retrieves data in a relational database. The relational DBMS processes requests to perform database functions such as creating and deleting tables, adding and deleting data in tables, and retrieving data from the tables in the database. A well-known standard language for expressing the database requests is the Structured Query Language (SQL).
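The database functions listed above can be sketched with SQL statements issued through Python's built-in sqlite3 module. SQLite is used here only because it ships with Python; the table names, column names, and sample values are hypothetical. The multiple telephone numbers are modeled with a separate child table, which is one common way to represent a collection and is not necessarily the special structure that a given DBMS defines for that purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Creating tables: one row per employee, one column per attribute.
cur.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)
# Assumption: a child table stands in for a collection of phone numbers.
cur.execute(
    "CREATE TABLE employee_phone ("
    "emp_id INTEGER REFERENCES employee(emp_id), phone TEXT)"
)

# Adding data.
cur.execute("INSERT INTO employee VALUES (?, ?, ?)", (101, "Ana Lee", 5200.0))
cur.executemany(
    "INSERT INTO employee_phone VALUES (?, ?)",
    [(101, "555-0100"), (101, "555-0101")],
)

# Retrieving data: all phone numbers for one employee.
cur.execute(
    "SELECT e.name, p.phone FROM employee e "
    "JOIN employee_phone p ON p.emp_id = e.emp_id "
    "WHERE e.emp_id = ? ORDER BY p.phone",
    (101,),
)
phones = cur.fetchall()
print(phones)

# Deleting data, then deleting a table.
cur.execute("DELETE FROM employee_phone WHERE phone = ?", ("555-0101",))
cur.execute("SELECT COUNT(*) FROM employee_phone")
remaining = cur.fetchone()[0]
cur.execute("DROP TABLE employee_phone")
conn.commit()
```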
Object-relational databases extend the power of relational databases. Object-relational databases allow the value in a column to be an object, which may include multiple other attributes. For example, the value in the address column may be an address object that itself has multiple attributes, such as a street address, a city, a state, a country, and a zip code or equivalent. An abstract data type (ADT), also called an object type, defines the attributes of an object in an object-relational database. SQL has been extended to allow the definition and use of objects and object types in object-relational databases. As used hereinafter, the term “object-relational database” refers to a subset of relational databases that support object-relational constructs; and an object-relational construct is one example of a relational construct.
Because of the popularity of XML as a data exchange format that supports hierarchical relationships among elements, and because of the power of relational DBMSs to update and retrieve data, there is a demand for generating XML data output from relational databases and storing XML data into relational databases. To support this demand, some DBMSs define relational database constructs and object-relational database constructs for storing data for XML documents. Often, different attributes or elements of one XML document are stored in different object-relational constructs, such as in different tables or different columns or different rows. The use of object-relational constructs allows the DBMS to support powerful data manipulation and retrieval operations that involve the data for the XML documents (which data is hereinafter called, simply, “XML data”).
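The storing of different elements of one XML document in different tables, and the generating of XML output from those tables, can be sketched as follows. This is an illustrative sketch only, using Python's sqlite3 and xml.etree.ElementTree modules in place of an object-relational DBMS; the functions shred and publish, the two tables, and the sample document are hypothetical and are not the constructs defined by any particular DBMS.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML data for two employees.
XML_TEXT = (
    '<employees>'
    '<employee id="101"><name>Ana Lee</name>'
    '<phone>555-0100</phone><phone>555-0101</phone></employee>'
    '<employee id="102"><name>Raj Patel</name>'
    '<phone>555-0200</phone></employee>'
    '</employees>'
)

def shred(conn, xml_text):
    """Store each element of the document in the table that matches it."""
    root = ET.fromstring(xml_text)
    for emp in root.findall("employee"):
        emp_id = int(emp.get("id"))
        conn.execute(
            "INSERT INTO employee VALUES (?, ?)", (emp_id, emp.findtext("name"))
        )
        # pos preserves the document order of repeated elements.
        for pos, phone in enumerate(emp.findall("phone")):
            conn.execute(
                "INSERT INTO employee_phone VALUES (?, ?, ?)",
                (emp_id, pos, phone.text),
            )

def publish(conn, emp_id):
    """Rebuild one employee element from the rows that hold its data."""
    row = conn.execute(
        "SELECT name FROM employee WHERE emp_id = ?", (emp_id,)
    ).fetchone()
    elem = ET.Element("employee", {"id": str(emp_id)})
    ET.SubElement(elem, "name").text = row[0]
    query = "SELECT phone FROM employee_phone WHERE emp_id = ? ORDER BY pos"
    for (phone,) in conn.execute(query, (emp_id,)):
        ET.SubElement(elem, "phone").text = phone
    return ET.tostring(elem, encoding="unicode")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employee_phone (emp_id INTEGER, pos INTEGER, phone TEXT)")
shred(conn, XML_TEXT)
fragment = publish(conn, 101)
print(fragment)
```

Once the data is in tables, ordinary queries can retrieve or update a single value without parsing the whole document, which is the capability that motivates storing XML data in this way.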
Sometimes XML data stored in a relational DBMS is to be transferred from one component of a system to another, such as from a DBMS server to an application program that acts as a client of the DBMS server, or between two DBMS servers in a distributed DBMS, or from one process in the DBMS to another process in the DBMS.
In one approach, the XML data is extracted from one or more object-relational constructs in the DBMS and converted to an XML document that may be transferred over a homogeneous or heterogeneous network from a sending component to a receiving component. This approach takes advantage of the XML standard for such transfers. However, the XML standard is quite verbose, requiring many characters to specify tags and often requiring repetitive use of the same tags. Also, an XML document is required to have a single root element with certain attributes that might not be necessary when only a few particular XML elements are to be transferred.
This first approach involves converting data from object-relational constructs to an XML document. This conversion can consume considerable computational resources at the sending component. In addition, transmitting the verbose XML document can consume considerable bandwidth on the communications channel. Furthermore, the receiving component can consume considerable resources to parse the XML document and extract the portion of the document that is to be used.
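The bandwidth cost of repeated tags can be illustrated with a small size comparison. The sketch below, written in Python, serializes the same fifty hypothetical employee rows once as an XML document and once in a tab-separated layout that names each column a single time. The tab-separated layout is an assumption chosen only for contrast; it is not a format defined by the XML standard or by any DBMS, and the actual savings depend on the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical rows: (employee id, name, salary).
rows = [(100 + i, f"Employee {i}", 5000 + i) for i in range(50)]

def to_xml(rows):
    """Serialize rows as one XML document; every row repeats every tag."""
    root = ET.Element("employees")
    for emp_id, name, salary in rows:
        emp = ET.SubElement(root, "employee", {"id": str(emp_id)})
        ET.SubElement(emp, "name").text = name
        ET.SubElement(emp, "salary").text = str(salary)
    return ET.tostring(root, encoding="utf-8")

def to_rows(rows):
    """Serialize rows in an assumed compact layout: column names appear once."""
    lines = ["emp_id\tname\tsalary"]
    lines += [f"{emp_id}\t{name}\t{salary}" for emp_id, name, salary in rows]
    return "\n".join(lines).encode("utf-8")

xml_bytes = to_xml(rows)
row_bytes = to_rows(rows)
print(len(xml_bytes), len(row_bytes))
```

In the XML form the opening and closing tags for each element are written once per row, whereas the compact form carries the column names once for the whole transfer.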
The serial format for transferring XML data that makes best use of the resources available to the components depends upon the components themselves and upon the shared and unshared resources available to them. It would be desirable if, in some circumstances, XML data could be transferred between components in a more compact format than the verbose, standard XML document.
For example, if the receiving component uses object-relational constructs to store the received XML data, additional computational resources are used at the receiving component to convert the XML document back to object-relational constructs. In such circumstances, it would be preferable if XML data that is already distributed among one or more object-relational constructs at the sending component were transferred in a format for object-relational constructs that can be used directly by the receiving component. The object-relational constructs are often less verbose, consuming less bandwidth during the transfer. Also, fewer computational resources are consumed on both the sending and receiving components, because the XML data is not converted back and forth to a verbose XML document.
In some DBMSs, memory or persistent storage space, or both, are shared among some components, such as two processes of the DBMS on the same host computer. Transferring XML data in object-relational constructs between such components could be accomplished by transferring a reference to the proper location in shared memory or on storage for the constructs holding the XML data to be transferred, without converting back and forth to a verbose XML document.
Based on the foregoing, there is a clear need for techniques to allow multiple serial formats for transferring XML data between components of a system that uses a DBMS to store XML data and to select a format that makes better use of the resources available to the components.
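The kind of selection called for above can be sketched in Python as a simple decision over what the sending and receiving components have in common. This is a hypothetical illustration under stated assumptions, not a description of any claimed technique: the names SerialFormat, Component, and choose_format, the component attributes, and the order of preference are all invented for this example.

```python
from dataclasses import dataclass
from enum import Enum

class SerialFormat(Enum):
    REFERENCE = "reference"                  # pointer into shared memory or storage
    OBJECT_RELATIONAL = "object-relational"  # constructs sent without conversion
    XML_DOCUMENT = "xml"                     # standard, verbose XML document

@dataclass(frozen=True)
class Component:
    """Assumed description of a component taking part in a transfer."""
    name: str
    shares_storage_with: frozenset = frozenset()
    uses_object_relational: bool = False

def choose_format(sender, receiver):
    """Pick the least costly format that both components can use."""
    if receiver.name in sender.shares_storage_with:
        return SerialFormat.REFERENCE
    if sender.uses_object_relational and receiver.uses_object_relational:
        return SerialFormat.OBJECT_RELATIONAL
    # Fall back to the standard format that any XML-aware client understands.
    return SerialFormat.XML_DOCUMENT

# Hypothetical components: two DBMS processes on one host, a remote DBMS
# server, and a client application that only understands XML documents.
proc_a = Component("dbms_proc_a", frozenset({"dbms_proc_b"}), True)
proc_b = Component("dbms_proc_b", frozenset({"dbms_proc_a"}), True)
remote_dbms = Component("remote_dbms", uses_object_relational=True)
client_app = Component("client_app")

for receiver in (proc_b, remote_dbms, client_app):
    print(receiver.name, choose_format(proc_a, receiver).value)
```

The order of the checks reflects the costs discussed above: a reference avoids copying the data at all, an object-relational format avoids conversion to and from XML, and the XML document remains the fallback for components that share neither storage nor object-relational constructs.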
The approaches described in this section could be pursued, but are not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated herein, the approaches described in this section are not to be considered prior art to the claims in this application merely due to the presence of these approaches in this background section.