1. The Field of the Invention
The present invention relates to querying databases. More specifically, the present invention relates to manners of optimizing queries in single or multiple database systems in which partially or fully replicated data exist.
2. The Relevent Technology
Two scenarios frequently occur in modem database usage. In both scenarios, multiple tables or xe2x80x9crelationsxe2x80x9d within a single database or within different databases may exist and be available to an entity or organization. The tables in these particular instances contain partially or fully replicated information. That is, the information or xe2x80x9cdataxe2x80x9d in the different tables may be equivalent. Additionally, the tables exhibit heterogeneous formatting, or xe2x80x9cschemas.xe2x80x9d That is, the information within the tables may be organized into different combinations of relations, rows, and columns, possibly with different relation names, row names, and columns. Such tables are referred to herein as being xe2x80x9crestructuring viewsxe2x80x9d of each other.
In a first scenario, the tables are naturally occurring. That is, the different tables exist for independent purposes. For instance, separate departments of an organization may individually maintain their own databases or tables within a common database, but may populate the tables or database with information from a common source. Such tables may be available to over a local network. Additionally, different databases that exhibit replicated information and heterogeneous schemas may exist at remote locations within organizations or may be maintained by different organizations and be commonly available over large scale networks such as the Internet.
In a second scenario, the tables are replicated for research purposes. A first table or xe2x80x9cbase tablexe2x80x9d is generally a naturally occurring database. The other tables are generally replicated from the first table as subsets of the first table. The replications often take the form of views. A View is a mechanism employed by the SQL language of which most databases are constructed that acts as a filter, showing only a portion of the data in the table to the user. Views, as abbreviated forms of the tables, can be searched more quickly than the full table or set of tables. A view can be created every time it is referenced, or it can be xe2x80x9cmaterializedxe2x80x9d and exist in a permanent or semi-permanent form.
Generally, when databases are replicated, as in our second scenario, they maintained through the use of materialized views. One such multiple database system (MDBS) in which materialized views are used for research and complex querying is known as a data warehouse. Various tools for managing such data warehouses exist, one example of which is IBM""s DataJoiner(copyright) product.
It is a primary objective in designing database systems to expedite query servicing by optimizing the query system. The use of materialized views is one manner in which the art has approached query optimization. It is often the case that certain materialized views can be more efficiently accessed for certain types of queries while others are more efficient for other types of queries. Thus, one technique for speeding up query servicing is to maintain a plurality of materialized views and to selectively direct queries to the appropriate materialized view for which the query can be most rapidly serviced.
A further development in the art of MDBS management is the addition of certain management tools to the SQL language. One such tool is SchemaSQL. SchemaSQL is a proposed extension to the SQL language that promotes efficient manipulation and classification of materialized views. For instance, SchemaSQL provides xe2x80x9cview definitions,xe2x80x9d which allow one materialized view to be mapped to another.
Conventional management and querying of views presumes that the views exhibit a common schema. Nevertheless, as discussed above, many naturally occurring multiple database systems include databases having heterogeneous schemas. It would be advantageous to employ the replicated tables in query optimization. Additionally, it has been predicted by the inventors that tables with replicated data and heterogeneous schemas could be used to further improve query optimization in data warehousing applications.
Accordingly, a need exists for a query optimization system that is compatible with and which capitalizes on the presence of databases that are restructuring views of each other. Such a query optimization system, to be most advantageous, should be easily implemented with existing technology and noninvasive to the MDBS on which it is intended to operate. Such a query optimization system and its method of use are disclosed herein.
The apparatus of the present invention has been developed in response to the present state of the art, and in particular, in response to the problems and needs in the art that have not yet been fully solved by currently available database management systems. Thus, it is an overall objective of the present invention to provide a query optimization system and method for a database management system that capitalizes on the presence of multiple tables that are restructuring views of each other.
To achieve the foregoing object, and in accordance with the invention as embodied and broadly described herein in the preferred embodiment, a query optimization system and method are provided. The query optimization system preferably is implemented with modules for execution by a processor. In one embodiment, the modules comprise a schema mapping module executable on the processor to express a schema mapping between a plurality of database tables with heterogeneous schemas and at least partially replicated information. The modules preferably also include a middleware module adapted to employ a middleware schema for use as a reference in expressing the schema mapping.
The query optimization system is preferably compatible with existing technology that optimizes queries by posing them against materialized views. Additionally, the query optimization system of the present invention may be adapted for use on a multiple database system (MDBS) comprising databases that are restructuring views of each other. That is, the databases preferably contain data that is partially or fully replicated among the databases, but exhibit heterogeneous schemas.
In one embodiment, the middleware module comprises a canonical schema module that constructs a virtual canonical schema. The virtual canonical schema is used to express a schema mapping between different restructuring views. The schema mapping may, in one embodiment, comprise an SQL view definition.
A plurality of operators may also be employed by the schema mapping module in expressing the schema mapping. In one embodiment, each operator represents a particular type of manipulation for transforming base tables into restructuring views and for expressing the transformations. Preferably, the plurality of operators are adapted for noninvasive use with existing databases. For instance, it is preferable that the operators perform operations written in the SchemaSQL language or a similar SQL compatible language or syntax.
In one embodiment, the operators comprise a fold operator, an unfold operator, a split operator, and a unite operator.
The query optimization system may also comprise a schema restructuring module executable on a processor to restructure a database relation into a restructuring view of the database relation in conjunction with the schema mapping module and the middleware module.
A query processing module may also be included and may serve as a query translation module. The query processing module is preferably executable on the processor to communicate with the schema mapping module and translate a received query executable on one of the plurality of heterogeneous database tables to a substantially equivalent query executable on another of the plurality of heterogeneous database tables. In one embodiment, the query processing module comprises a restructuring view to canonical query conversion module (or merely canonical query processing module) executable on the processor to translate the received query into a canonical schema query adapted as a query on a canonical table.
The query processing module preferably comprises a canonical query to restructuring view conversion module (or merely restructuring view translation module) executable on the processor to translate a canonical schema query into a query on one or more of the restructuring views.
The query processing module may also comprise a canonical map table generation module executable on a processor to generate a canonical map table. In one embodiment, the canonical map table comprises a portion of the schema mapping and is used in the query conversion operations as well as by a query optimization module. Under the present invention, a second map table, a restructuring views map table, is preferably employed, and accordingly, the query processing module may comprise a restructuring view map table generation module executable on the processor to generate the restructuring view map table.
The query optimization system may also comprise a query optimization module executable on a processor to receive a plurality of substantially equivalent queries generated by the query processing module together with the canonical map table and the restructuring views map table. In response, the query optimization module preferably consults and utilizes the plurality of substantially equivalent queries to generate an optimized query plan, executable at a least cost on one or more of the base table and restructuring views.
In one embodiment, the query processing module is adapted to provide the query optimization module with at least two of a base table query, a materialized view query, and a restructuring view query. Under this embodiment, the query optimization module is adapted to consider each of these queries in generating an optimized query plan executable on the plurality of heterogeneous database tables at a least cost.
An attendant method of use of the query optimization system is one embodiment comprises expressing a schema mapping between a plurality of databases containing at least partially replicated information and referencing a middleware schema in exressing the schema mapping. As described above, the middleware schema preferably comprises a virtual middleware table. In one embodiment, the virtual middleware table comprises a canonical table.
As also described above, the plurality of database tables may comprise restructuring views of each other, and as such, may be heterogeneous database tables exhibiting partially or fully replicated data.
In expressing the schema mapping, the plurality of operators may be employed, each operator representing a particular type of transformation between heterogeneous schemas. It is preferred that the plurality of operators are adapted for noninvasive use with existing databases. As described, the plurality of operators preferably includes a fold operator, an unfold operator, a split operator, and a unite operator. The operators may be employed within an SQL view definition expressing the schema mapping. The schema mapping and the operators may be employed in restructuring a database relation into a restructuring view of the database relation.
The method of the present invention may further involve automatically translating a query executable on one of the plurality of databases to equivalent queries on others of the plurality of databases and automatically selecting from among the equivalent queries a query corresponding to a selected criterion.
Other optional steps may comprise translating a received query executable on one of the plurality of heterogeneous database tables to a substantially equivalent query executable on another of the plurality of heterogeneous database tables using the schema mapping. In so doing, the received query may also be translated into a query on a base table. In additional steps, the query on the base table may be converted to a query on the canonical schema and the canonical schema query may be translated into a query on one or more of the heterogeneous database tables.
The method may also comprise receiving a plurality of substantially equivalent queries generated by the query processing module and in response generating an optimized query plan executable on the plurality of heterogeneous database tables at a least cost. In conducting the conversions, a canonical map table may be generated and may be accompanied by a restructuring view map table.
Once the plurality of alternate queries are generated, the method may involve generating with the use of the substantially equivalent queries an optimized query plan executable on the plurality of heterogeneous database tables at a least cost. In so doing, queries on a base table, on a materialized view, and on a restructuring view may be considered in the generation of the optimized query plan.
These and other objects, features, and advantages of the present invention will become more fully apparent from the following description and appended claims, or may be learned by the practice of the invention as set forth hereinafter.