1. The Field of the Invention
The present invention relates to database systems and, more specifically, to ways of utilizing summary tables in a distributed database environment.
2. The Relevant Technology
A summary table, otherwise known as a materialized view, is a pre-computed and stored result of a query. A standard view is a derived relation defined in terms of base relations. A view defines a function mapped from a set of base tables to a derived table. This function is typically recomputed every time the view is referenced. A view becomes a materialized view when the tuples generated by the view are stored in a database. Index structures can be built on the materialized view. Consequently, database accesses to the materialized view can be much faster than recomputing the view. A materialized view or summary table is thus like a cache, essentially embodying a copy of selected data that can be accessed quickly.
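The distinction between a standard view and a summary table can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for illustration. The table and column names (sales, sales_by_region_v, sales_by_region_st) are hypothetical, and because SQLite has no native materialized views, the summary table is assumed to be emulated with an ordinary CREATE TABLE ... AS SELECT statement.

```python
import sqlite3


def build_demo():
    """Create a base table, a standard view, and a stored summary table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales (region TEXT, amount INTEGER);
        INSERT INTO sales VALUES ('east', 10), ('east', 15), ('west', 7);

        -- Standard view: only the definition is stored, so the query
        -- is recomputed each time the view is referenced.
        CREATE VIEW sales_by_region_v AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

        -- Summary table: the tuples generated by the same query are stored.
        CREATE TABLE sales_by_region_st AS
            SELECT region, SUM(amount) AS total FROM sales GROUP BY region;

        -- An index structure can be built on the stored result.
        CREATE INDEX idx_st_region ON sales_by_region_st (region);
        """
    )
    return conn


def totals(conn, relation):
    """Return (region, total) rows from the named view or summary table."""
    # The relation name is interpolated directly, so it must be trusted input.
    cursor = conn.execute(
        f"SELECT region, total FROM {relation} ORDER BY region"
    )
    return cursor.fetchall()
```

In this sketch both relations initially return the same rows. After a further insert into sales, the view reflects the new data while the stored copy does not until it is refreshed, which is the cache-like behavior described above.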
The data stored in a database is generally accessed using a query formatted in the SQL language native to most existing database systems. It is a primary objective in designing database systems to be able to service queries with the least cost, that is, in the lowest amount of time. One manner of decreasing query response time is with the use of summary tables. It is often the case that certain data can be accessed more quickly by accessing one or more summary tables in which copies of the data have been stored. Summary tables are generally quite small relative to the entire database, and scanning a summary table is much more efficient than scanning multiple relations of a database. Thus, one technique for speeding up query servicing is to maintain a plurality of summary tables and to selectively direct queries to the appropriate summary table for which the query can be most rapidly serviced. It would be advantageous to employ summary tables directed to the various heterogeneous database systems in query optimization.
As a further complication to the employment of summary tables in multiple database systems (MDBS), some database systems do not support summary tables. In order to provide more efficient query servicing, it is desirable to utilize summary tables in such an environment.
Additionally, the inventors consider it advantageous in some instances to be able to support the summary tables locally within the respective databases of an MDBS, rather than within a centralized database management program. The benefits of so doing include better performance for a class of queries that involve computation on large amounts of data but that yield relatively small result sets; the possibility of using the remote source's replication utilities for maintaining the currency of the summary table; and the enablement of the remote database's applications to take advantage of the summary table.
It would be further advantageous if a single central program could be employed to support and manage summary tables on a plurality of heterogeneous database systems, including where necessary, on databases that do not natively support summary tables.
Accordingly, a need exists for a distributed database system that capitalizes on the use of summary tables. To best utilize such a system, the capability should be provided to centrally generate, maintain, and query the summary tables. The system preferably makes provisions for communicating with database systems that do not support summary tables. Such a distributed database system and its method of use are disclosed herein.
The apparatus of the present invention has been developed in response to the present state of the art, and in particular, in response to the problems and needs in the art that have not yet been fully solved by currently available database management systems. Thus, it is an overall objective of the present invention to provide a distributed database apparatus, system, and article of manufacture that capitalizes on summary tables within a plurality of heterogeneous database systems.
To achieve the foregoing object, and in accordance with the invention as embodied and broadly described herein in the preferred embodiment, a distributed database system and method are provided. The distributed database system is preferably implemented with modules for execution by a processor. In one embodiment, the modules comprise a central program comprising a communication module configured to communicate with a plurality of databases and an identification module configured to identify as a summary table a relation containing therein summary data from the plurality of databases.
At least one of the databases may be remote to the central program, and the databases may be heterogeneous. Each database may comprise a database system having database storage and a database manager. In one embodiment, at least one of the database systems does not directly support summary tables.
In one embodiment, the identification module comprises a single catalog resident in the central program. A nickname corresponding to the relation containing therein summary data from one or more of the plurality of heterogeneous database systems may be stored in the catalog. A description of the relation containing therein summary data as a summary table may also be stored in the catalog in conjunction with the nickname.
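One way such a catalog might be organized is sketched below. The class and field names (CatalogEntry, Catalog, source, remote_name, summary_query) are hypothetical and are not taken from the disclosure. The sketch assumes only that each nickname is stored together with an optional description marking the relation as a summary table.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    nickname: str       # name the central program uses for the relation
    source: str         # database system that holds the relation
    remote_name: str    # the relation's name at that source
    # Description of the relation as a summary table (here, its defining
    # query). None means the nickname refers to an ordinary relation.
    summary_query: Optional[str] = None

    @property
    def is_summary_table(self) -> bool:
        return self.summary_query is not None


class Catalog:
    """A single catalog resident in the central program."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        if entry.nickname in self._entries:
            raise ValueError(f"nickname already registered: {entry.nickname}")
        self._entries[entry.nickname] = entry

    def lookup(self, nickname: str) -> CatalogEntry:
        return self._entries[nickname]

    def summary_tables(self) -> List[CatalogEntry]:
        """Return every entry that is described as a summary table."""
        return [e for e in self._entries.values() if e.is_summary_table]
```

Keeping the summary description alongside the nickname means the identification is held entirely in the central program, independent of whether the underlying database system has any notion of a summary table.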
The central program may also comprise a summary table creation module within the central program, the summary table creation module configured to initiate the generation of a relation containing therein summary data from one or more of the plurality of databases.
The summary table may be created within the central program or, alternatively, remotely within the database system. In embodiments where the database system does not support summary tables, the database system is unaware that a relation containing therein summary data is a summary table.
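Remote creation on a database system that does not support summary tables can be sketched as follows. An in-memory SQLite database stands in for the remote database system, and the function names, the dictionary-based catalog, and the orders table used in the usage note are all assumptions made for illustration. The remote system is asked only to create an ordinary table, while the knowledge that the table holds summary data is recorded solely in the central program.

```python
import sqlite3


def create_remote_summary(remote, catalog, nickname, remote_name, query):
    """Materialize `query` as a plain table at the remote source.

    The remote database sees only an ordinary CREATE TABLE statement.
    Only the central catalog records that the relation is a summary table.
    """
    # remote_name and query are interpolated, so they must be trusted input.
    remote.execute(f"CREATE TABLE {remote_name} AS {query}")
    remote.commit()
    catalog[nickname] = {"remote_name": remote_name, "summary_query": query}


def remote_object_type(remote, name):
    """Return how the remote database itself classifies the object."""
    row = remote.execute(
        "SELECT type FROM sqlite_master WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None
```

If an orders relation exists at the remote source, calling create_remote_summary(remote, catalog, "ORDERS_SUM", "orders_sum", "SELECT customer, SUM(total) AS total FROM orders GROUP BY customer") leaves the remote database reporting orders_sum as an ordinary table, while the central catalog alone identifies it as a summary table.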
The central program preferably comprises a query processing module configured to receive a user query and process the query for submission to one or more of the database systems. In one embodiment, the query processing module generates an optimized query plan, considering in so doing a cost model and the contents of one or more summary tables within one or more of the database systems.
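The plan selection described above can be illustrated with a deliberately simplified sketch. The cost model here (a per-row scan cost plus a flat overhead for remote access) and the test for whether a summary table can answer a query (simple column coverage) are assumptions chosen for brevity. They are not the cost model or matching logic of the disclosed system, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Relation:
    name: str
    columns: FrozenSet[str]
    row_count: int
    remote: bool


@dataclass(frozen=True)
class Plan:
    relation: str
    cost: float


SCAN_COST_PER_ROW = 1.0   # assumed unit cost of reading one row
REMOTE_OVERHEAD = 100.0   # assumed flat cost of contacting a remote source


def estimate_cost(rel: Relation) -> float:
    """Toy cost model: full scan plus overhead if the relation is remote."""
    overhead = REMOTE_OVERHEAD if rel.remote else 0.0
    return rel.row_count * SCAN_COST_PER_ROW + overhead


def choose_plan(needed: FrozenSet[str], base: Relation,
                summaries: List[Relation]) -> Plan:
    """Pick the cheapest relation able to answer the query.

    The base relation is always a candidate. A summary table is a
    candidate only if it contains every column the query needs.
    """
    candidates = [base] + [s for s in summaries if needed <= s.columns]
    best = min(candidates, key=estimate_cost)
    return Plan(best.name, estimate_cost(best))
```

Under these assumptions, a query needing only columns held in a small summary table is directed to that table, while a query needing a column the summary lacks falls back to the base relation.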
An attendant method of use of the distributed database system in one embodiment comprises communicating with a plurality of databases from a central location and identifying as a summary table a relation containing therein summary data from the plurality of heterogeneous database systems. The step of communicating with a plurality of databases from a central location may comprise communicating with at least one database remote to the central program.
In one embodiment, the step of identifying as a summary table a relation containing therein summary data from the plurality of databases further comprises placing information indicative of the relation containing therein summary data within a single catalog resident in the central program.
The step of placing information indicative of the relation containing therein summary data within a single catalog resident in the central program may comprise placing a nickname corresponding to the relation containing therein summary data from one or more of the plurality of databases in the catalog. Additionally, placing information indicative of the relation containing therein summary data within a single catalog resident in the central program may comprise placing a description of the relation containing therein summary data as a summary table in the catalog and linking the description with the nickname.
The step of communicating with a plurality of databases from a central location may comprise communicating with a plurality of database management systems, one of the database management systems local to each of the heterogeneous database systems, at least one of the database management systems not directly supporting summary tables.
The method may further comprise communicating with a plurality of database systems, in which at least one of the database management systems does not directly support summary tables. This may comprise creating a relation containing therein summary data within the database management system without informing the database management system that the relation containing therein summary data is a summary table.
The method may further comprise initiating the generation of a relation containing therein summary data from one or more of the plurality of heterogeneous database systems. The relation may be generated within the central program or within the one or more of the plurality of heterogeneous database systems.
The method may further comprise receiving a user query and in response, considering the usage of one or more summary tables within one or more of the heterogeneous database systems in generating an optimized query plan based upon the user query.
In yet another aspect of the invention, an article of manufacture comprises a program storage medium readable by a processor and embodying one or more instructions executable by the processor to perform the above-described method of use of a distributed database system.
These and other objects, features, and advantages of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.