1. Technical Field
The embodiments herein generally relate to computer database systems, and, more particularly, to query processes performed on computerized databases.
2. Description of the Related Art
Generally, in the realm of computer database systems a materialized query table (MQT) is a table that is defined by a query that is also used to determine the data in the table. Materialized query tables can be used to improve the performance of queries run on databases. In a partitioned database environment, one may replicate materialized query tables. One may also use replicated materialized query tables to improve query performance. The Materialized Query Table Advisor (MQTA) is generally used to recommend and create materialized query tables (MQTs) for a standalone database in order to improve response times of queries. In this configuration, MQTs are generally placed on the same server on which the base tables are located. Generally, MQTs are created to reduce the query processing cost by replacing parts of the query with existing and matched MQTs. However, the MQTA does not generally apply to a distributed environment where MQTs need to be recommended and placed on one or more remote servers to offload the traffic to the backend (i.e., remote) database. The frontend (i.e., local) database typically deploys a federated database. The federated database is able to utilize the matched MQTs at the frontend database, rewrite the query, forward the rewritten query to the backend database, and merge the results from the backend database and the MQTs.
FIG. 1(A) shows the operational flow of a MQTA. The process takes (101) a workload (i.e., read and write queries to the database system) and simulated catalog (i.e., database statistics and system catalog) as the input. The process first performs a workload compression to remove those insignificant queries, which are inexpensive or infrequent. The process then performs (103) multi-query optimization to derive common parts in the workload and generates candidate MQTs. In the next step, (105), the MQTA (1) calculates the benefits of these candidate MQTs (in terms of resource time reduction based on simulated catalog); (2) calculates overhead (in terms of resource time) for refreshing MQTs by incorporating database updates; (3) estimates the size of MQTs; (4) calculates the utility of each MQT by dividing net benefit (i.e., benefit minus overhead) by the size of the MQT and its indices size; and (5) recommends in step (107) the MQTs whose utility values are higher than a given threshold.
In the system configurations described herein, the frontend database may be embodied, for example, as a WebSphere™ II (formerly called DB2 II) database, available from International Business Machines (IBM), NY, USA, which is a local server where federated queries are submitted for processing and where the backend database includes remote servers that can be any database management system. Although the WebSphere™ II database is used herein as an example, those skilled in the art would readily acknowledge that other comparable databases could be used in the context of the embodiments herein. In a typical deployment, the frontend database is closer to the users or applications than the backend databases. Thus, the frontend database is established as a cache server to store MQTs to answer the whole or part of federated queries submitted to the frontend database (i.e., local server and cache server). Hereinafter, the terms local server, frontend, frontend database, and cache server are used interchangeably. Moreover, the terms remote server, backend, and backend database are used interchangeably.
The frontend database typically deploys a federated database, such as IBM's WebSphere™ Information Integrator (II). The federated database is able to utilize the matched MQTs at the frontend database, rewrite the query, forward the rewritten query to the backend database, and merge the results from the backend database and the MQTs. In FIG. 1(B), a system configuration is shown in which a frontend database 110 (for example, WebSphere™ II) is used as a cache server for a single backend database 112. In FIG. 1(C), a more complex system configuration is shown having one frontend database 114 and two backend databases 116, 118. In this configuration, the frontend database (for example, WebSphere ™ II) is used as both a cache server and a federated database 114 for two backend databases 116, 118. The workload to the system could be federated queries that involve multiple databases.
To summarize, the frontend database is used for the following purposes: caching MQTs; providing query optimization by evaluating if MQTs can be used to improve query response time; rewriting queries if cached MQTs can only answer part of the query; rewriting federated queries using nicknames to queries using table names in the remote servers; merging query results from the remote servers and cached MQTs (if MQTs are used); and a simulated environment used by a MQTA and a Data Placement Advisor (DPA). In this context, data placement may refer to caching and replication of data.
One generally straightforward way to apply a MQTA for the distributed environment is to run the MQTA at the backend database and replicate MQTs recommended by the MQTA to the frontend database or databases. However, this approach does not generally consider the MQT dependency in a distributed environment. FIGS. 2(A) through 2(C) illustrate the MQT dependency in a standalone database. The MQTA calculates the benefits of MQT1 and MQT2 as 100 and 70, respectively, when they are presented separately. These benefits are calculated by compiling the related query in the workload twice with and without the MQT presented. One only has to compile these queries which have structural matches with the MQTs.
When both MQT1 and MQT2 are presented, the benefit is 170, the summation of the benefits by MQT1 and MQT2. In the standalone database environment, the benefits of MQTs are independent to each other. However, this dependency does not hold true in the distributed environment where a frontend database is deployed close to users and/or applications for offloading traffic to the backend database.
FIGS. 3(A) through 3(C) demonstrate that the benefit of MQT1 and MQT2 calculated in the backend database by MQTA are dependent when they are placed at the frontend database. In the example shown in FIG. 3(A), when MQT1 is placed at the frontend database, part of the query needs to be sent to the backend database. As a result, the benefit of MQT1 is less than 100. Similarly, it is shown in FIG. 3(B) that MQT2 yields a benefit of 70 when it is placed on the backend database. However, a federated database optimizer may not use it even though there is a MQT match with the same query when MQT2 is placed at the frontend database. The reason could be that using MQT2 may result in a big block of intermediate results returned from the backend database to the frontend database. Typically, an II optimizer estimates using MQT2 will make query processing more expensive than not using MQT2. As a result, MQT2 generally yields no benefit when it is placed at the frontend database. In FIG. 3(C), both MQT1 and MQT2 are presented to the query optimizer; the II will utilize both MQTs and yield a benefit of 200 (i.e., greater than 170) since utilizing MQTs in this case yields reduction in both processing latency and network latency.
To summarize, current MQTAs are designed to recommend and create MQTs on the same database server where the base tables are located. By creating appropriate MQTs, a MQTA reduces the computation latency; however, it does not typically reduce network latency nor support load distribution since all query processing tasks are still performed at the database management system (DBMS). Accordingly, there remains a need for a new approach to utilize a MQTA in order to improve query processes in computer database systems.