1. Field of the Invention
The present invention relates to a method for the processing of queries to a database stored on a computer readable medium, said computer readable medium comprising a representation of relations between sections in tables in the database, said method comprising identification of tables which in combination with other tables contain a specified type of data.
More specifically, the invention relates to optimization of queries to a database. Optimization of queries typically comprises generation of a series of so-called semantically identical solutions, each of which gives rise to the same reply from a database. Of the semantically identical solutions (also called solution candidates) the query or queries are selected which are optimal in a given respect, e.g. optimal in the respect that the query can be answered most quickly.
2. Description of the Related Art
Generally, two main groups of database applications exist. A first group is denoted On-Line Transaction Processing or OLTP applications. The aim of these applications is to handle different aspects of transactions, that is: input, output, storage, and processing of individual transactions. A database supporting an OLTP application is typically tailored to provide a good performance in respect of providing fast input/output response times. Typically, this is obtained by a database that is strongly normalised, that is redundancies in the database are removed. OLTP applications are for instance used in accounting systems, inventory and production management systems for solving operational problems.
A second group is denoted On-Line Analytical Processing (OLAP) applications. The aim of these applications is to provide a response to a query only. A database supporting an OLAP application is typically tailored to provide a good performance in respect of providing fast response times to different types of requests involving multiple transactions. Thus, since requests typically involve different types of more or less time consuming calculations e.g. summations, differences, counts, averages, etc. on large data amounts, it is convenient to store pre-calculated data which can subsequently be retrieved for providing such a fast response to a request. Therefore such a database is special in that it comprises data that are redundant to a high degree. In practise this means that the same information can be requested in different ways, e.g. involving a different set of tables in a database. OLAP applications are for instance used in management information systems providing a fast overview of large amounts of data e.g. across different OLTP databases including external data sources e.g. on the Internet.
OLAP databases are often ordered in a so-called star or snow flake scheme wherein a central primary table denoted a Fact Table is related to a number of secondary tables denoted Dimension Tables. One Fact Table is related to many Dimension Tables. A Fact Table contains the data on which calculations are based on. Data in a Fact Table contain the most detailed information. Dimension Tables contain data upon which it is desired to group calculations. Data in a table resulting from calculations based upon other tables are denoted aggregated data.
Generally, it should be noted that requested data stored in one or more tables and/or databases can be distributed on different physical systems for instance distributed on systems connected via a local area network (LAN), a wide area network (WAN), and/or the Internet. Moreover, it is a general desire to avoid having to change in applications using data in underlying database(s) when and if the physical structure of database is changed, thus providing a certain degree of independence between applications and underlying databases.
Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on the Principles of Database Systems, xe2x80x98An Overview of Query Optimization in Relational Systemsxe2x80x99 by Chaudhuri, S.; US, New York, Jun. 1, 1998, pages 23-43, ISBN: 0-89791-996-3 discloses a method of processing and optimising a query to a relational database. The disclosed method involves identifying tables which contain data specified in the query and generating a plurality of queries, involving selection of primary tables from the identified tables and incrementally selection of tables from the identified tables that is related to the primary table.
U.S. Pat. No. 5,822,747 discloses a system and a method for the determination of an optimal plan for executing an SQL query. On the basis of a query, in the form of an expression, a plurality of plans is generated from which an optimal plan is selected. The plans are generated by the use of a set of rules. The rules comprise implementation rules and transformation rules. Implementation rules are used for obtaining plans, and transformation rules are used for obtaining equivalent plans. An optimal plan is selected from the equivalent plans as a query to a relational database.
This solution, however, is inexpedient, since basically a query has to be specified which indicates the tables from which data are to be obtained.
U.S. Pat. No. 5,600,829 discloses an OLTP database system that provides some type of independence between a query and a physical structure of each database table by captioning each database table with a partial query reflecting the contents of that table. Relevant tables for a particular query may be identified by piecing together the partial queries until the user query is fulfilled. Moreover, the database system may be integrated with an optimiser.
However, this system requires that the database supports the mentioned sub-queries which clearly limits the so-called database independence. This system is basically an OLTP system which does not take into account that data in an OLAP system are stored in a redundant way. Thus, the use of sub-queries will result in an exponential number of sub-queries to the number of tables. This in turn will make it a complicated process to fulfill the user query in an efficient way.
In accordance with the present invention these problems are solved when the method mentioned initially additionally comprises generating a query for at least some of the identified tables by selecting one primary table from the identified tables and by selecting a set of tables, each table in the set being related to the primary table via N relations, and repeating this, N being varied for each repetition, until the set of tables contains the specified data, and selecting an optimal one of the generated queries.
These problems are solved when the method mentioned initially additionally comprises generating a query for at least some of the identified tables by selecting one primary table from the identified tables and by selecting a set of tables, each table in the set being related to the primary table via N relations, and repeating this, N being varied for each repetition, until the set of tables contains the specified data, and selecting an optimal one of the generated queries.
Hereby solution candidates in a number linear to the number of tables are generated. By selecting a primary table as a type of origin or centre point for investigating solution candidates only relevant solutions candidates are found. Thereby, system performance is not ruined by irrelevant solution candidates.
An optimal query may e.g. be a query which gives rise to the shortest response time when data are to be obtained from the relation base, and/or a query which gives rise to the smallest load of a system which manages control and administration of the relational database.
Expedient embodiments of the method include selecting queries from the plurality of queries by selecting the queries in which the maximum number of elements of relations to be used in order to reach the primary table from an arbitrary table in the set of tables containing the specified data, is as small as possible for the plurality of queries.
The method may further include giving each of the queries selected from the plurality of queries a neural value which, for a given query, is updated on the basis of a measurement of a response time for the given query, and selecting one query from the selected queries on the basis of a probability of selecting the query given by the neural values.
According to a further embodiment, the method includes giving each section involved in a query a weight value which weights the neural values on the basis of the influence which the section has on the response time for queries.
In each of the foregoing embodiments, the primary table can be a fact table and the set of tables can be a set of dimension tables.
The invention also relates to an apparatus for the processing of queries to a database.
Moreover, the invention relates to a computer readable medium.