The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.
In a multi-tier system that includes an outer tier comprising application servers and a database tier comprising database servers, data access performance and data availability are important factors that determine the response time to user queries and the overall system performance. In some multi-tier systems, a middle tier (also referred to hereinafter as “mid-tier”) is provided between any outer tiers and the database tier. The mid-tier comprises servers that cache data managed by the database servers. The application servers in the outer tier typically send queries for data to the servers in the mid-tier instead of the database servers in the database tier, thus achieving high-speed access to the requested data.
In one example, a multi-tier system may be used in a large enterprise data-processing environment to provide faster access to the enterprise database servers. The multi-tier system may include a database tier comprising one or more Relational Database Management System (RDBMS) servers, an outer tier comprising one or more application servers, and a mid-tier comprising one or more servers and one or more relational caches. The mid-tier servers store in the relational caches, as relational tables, portions of the most-often accessed relational tables that are managed by the one or more RDBMS servers in one or more databases. The mid-tier servers may also provide synchronization and/or transactional replication mechanisms to propagate to the cached data any changes that are made to the original data in the underlying one or more databases. When an application server in the outer tier issues a query for data, instead of sending the query directly to an RDBMS server in the database tier, the application server sends the query to a server in the mid-tier. If the data is not found in a relational cache, then the mid-tier server sends the query to an RDBMS server for processing. If the data is found in a relational cache, which will be the case for the most-often accessed data, the mid-tier server responds to the query with data from the cache, thus shielding the RDBMS servers from the load associated with processing the query. In this way, application servers in the outer tier of the multi-tier system may open a large number of concurrent connections and may issue a large number of concurrent queries against enterprise database servers in the database tier without substantially increasing the load on the database servers.
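By way of illustration only, the query routing described above may be sketched as follows. The sketch is a simplified model under stated assumptions, not an implementation of any particular caching product: the class name MidTierCache, the preload method, and the query_database callable are hypothetical names introduced here, and a query is reduced to a lookup key.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Row = Tuple[object, ...]


class MidTierCache:
    """Hypothetical stand-in for a mid-tier server with a relational cache."""

    def __init__(self, query_database: Callable[[Hashable], List[Row]]) -> None:
        # query_database is an assumed callable that forwards a query
        # to an RDBMS server in the database tier.
        self._query_database = query_database
        self._cache: Dict[Hashable, List[Row]] = {}
        self.hits = 0
        self.misses = 0

    def preload(self, key: Hashable, rows: List[Row]) -> None:
        """Store a portion of often-accessed data in the cache."""
        self._cache[key] = rows

    def query(self, key: Hashable) -> List[Row]:
        """Answer from the cache when possible, else forward to the database."""
        if key in self._cache:
            # Cache hit: the database server never sees this query.
            self.hits += 1
            return self._cache[key]
        # Cache miss: the query is sent on to the database tier.
        self.misses += 1
        return self._query_database(key)
```

In this sketch an application server would call query on the mid-tier object rather than contacting the database directly, so that only cache misses add load to the database tier. Synchronization of cached data with the underlying database is omitted.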
Caching of often-accessed data in a mid-tier data cache is typically used in multi-tier systems in which the database tier comprises relational database servers that support Structured Query Language (SQL) query access to data stored in relational tables. However, the existing relational data caching frameworks do not provide mechanisms for efficiently processing queries for Extensible Markup Language (XML) data that may be stored in XML-enabled relational databases or in other types of XML-enabled storage, even though XML is widely accepted as a standard for describing and providing structure for a body of data, such as, for example, a file, a data stream, or a data packet.
Further, some XML-enabled RDBMS and Object-Relational Database Systems (ORDBMS) provide a native built-in datatype (referred to herein as a native XML datatype or just XML datatype) which allows users to store XML data natively in a database via the use of XML datatype tables or XML datatype columns. For example, some RDBMS support a built-in XML datatype, XMLType, for storing XML data natively in the database system. The XMLType datatype is based on the SQL/XML standard defined in INCITS/ISO/IEC 9075-14:2003, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein, and which is referred to hereinafter as the “SQL/XML 2003” standard. In some RDBMS, the XMLType datatype may further extend the SQL/XML 2003 standard by providing support for representing instances of data defined in the XQuery Data Model. The XQuery Data Model defines the permissible values of expressions in the XML Query Language (XQuery), the XML Path Language (XPath), and Extensible Stylesheet Language Transformations (XSLT). A draft specification for the XQuery Data Model is described in “XQuery 1.0 and XPath 2.0 Data Model”, W3C Candidate Recommendation 3 Nov. 2005, located at “http://www.w3.org/TR/xpath-datamodel/”, the entire contents of which are hereby incorporated by reference for all purposes as if fully set forth herein.
However, the existing relational data caching frameworks do not provide mechanisms for efficiently processing queries for XML data that may be natively stored in XML-enabled relational databases as a built-in XML datatype. For example, in one approach a mid-tier relational cache may be used to cache data from one or more relational database tables that store XML data as a native XML datatype. According to this approach, a mid-tier server treats XML data of a native XML datatype as opaque data. The mid-tier server fetches, from a relational database into a mid-tier relational cache, the entire underlying relational tables that store the XML data. When a query requesting an instance of the XML data is received, the mid-tier server first materializes the XML data from the cached relational tables, and then runs any relevant SQL/XML, XQuery, XPath, or XSLT query processors over the materialized XML data.
The disadvantage of this approach is that it does not scale well. As the relational tables that store the XML data in the relational database grow larger, the relational cache in the mid-tier similarly grows larger. The reason for this is that in order to be able to materialize the XML data in response to a query, the mid-tier server needs to have in the mid-tier relational cache all of the relevant data from the underlying relational tables in the relational database. This, however, defeats the benefit of caching, since the mid-tier relational cache needs to store essentially the entire relational tables that store the XML data as an XML datatype in the underlying relational database.
In addition, when the size of the XML data that is materialized in the mid-tier based on the mid-tier relational cache is large, the process of materializing the XML data and running the SQL/XML, XQuery, XPath, and/or XSLT processors over the materialized XML data becomes very resource intensive and computationally expensive. Further, unlike a relational database server that stores the XML data as a native XML datatype, a mid-tier server cannot avail itself of any XML indexes that may have been created in the database over the XML data stored in the relational database tables.
For example, in order to evaluate a query including an XPath operator for requesting an instance of XML data from a particular location in an XML document that is stored as a native XML datatype in a relational database, the mid-tier server needs to first materialize the entire XML document from the tables in the mid-tier relational cache. Then, the mid-tier server needs to build a Document Object Model (DOM) tree of the XML document in order to be able to execute the XPath operator and to determine the exact location of the requested XML data instance. In contrast, a relational database server that stores the XML document as a native XML datatype may have built an XML index over the XML document and, in response to the same query, may use the index to locate and retrieve the requested XML data instance without building a DOM tree. Thus, the existing approach for processing queries for XML data against a mid-tier relational cache is not only resource-intensive and computationally expensive, but in some cases may even perform more slowly than if no data caching in the mid-tier were used at all.
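The contrast drawn above may be illustrated by the following minimal sketch, which assumes a toy path index and is not a description of any actual database XML index structure. The function names evaluate_via_dom, build_path_index, and evaluate_via_index, and the representation of cached rows as a list of text fragments, are hypothetical and introduced only for this illustration. The sketch uses the Python standard library module xml.etree.ElementTree, which builds an in-memory element tree (comparable in role to a DOM tree here) and supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Optional


def evaluate_via_dom(cached_fragments: List[str], path: str) -> List[Optional[str]]:
    """Mid-tier style: materialize the whole document, parse it, then search."""
    xml_text = "".join(cached_fragments)   # materialize the entire document
    root = ET.fromstring(xml_text)         # build the full in-memory tree
    return [element.text for element in root.findall(path)]


def build_path_index(xml_text: str) -> Dict[str, List[Optional[str]]]:
    """Database style (assumed toy index): map each element path to its values."""
    index: Dict[str, List[Optional[str]]] = defaultdict(list)

    def walk(element: ET.Element, prefix: str) -> None:
        for child in element:
            child_path = f"{prefix}/{child.tag}" if prefix else child.tag
            index[child_path].append(child.text)
            walk(child, child_path)

    walk(ET.fromstring(xml_text), "")
    return dict(index)


def evaluate_via_index(index: Dict[str, List[Optional[str]]], path: str) -> List[Optional[str]]:
    """Answer a path query with one lookup, without parsing the document again."""
    return index.get(path, [])
```

In this sketch the tree-based function repeats the materialization and parsing work for every query, whereas the index is built once and each subsequent query is a single dictionary lookup. The toy index handles only simple child paths relative to the document root.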
Based on the foregoing, techniques for efficiently processing queries for XML data by using a mid-tier data caching framework are clearly needed. Further, there is also a clear need for techniques for processing queries for XML data that may be stored in XML-enabled databases as a native XML datatype.