1. Field of the Invention
The present invention relates to apparatus for creating summaries of a database, and in particular to apparatus for creating summary tables used to increase the speed of access to information contained within a database.
2. Description of the Prior Art
The majority of databases currently available consist of a data store in which the data is divided up into a number of tables. Joins are created between the tables to allow data stored in different tables to be linked.
Database are usually provided in a normalized form so that there is no redundancy in the data, thereby ensuring that the database uses the minimum amount of storage space necessary. This is achieved by arranging the tables such that data is not repeated in multiple tables.
However, this form of storage has implications regarding searching for data in the database. Generally, searching is achieved by submitting a query in a structured query language such as SQL. The database receives the SQL query and determines from the query what information is required. This information is searched for within the tables and output as a response. The time taken from receiving the query to outputting a result as known as the query execution time.
As will be appreciated by a person skilled in the art, the size of the tables vastly effects how long the searches take. Accordingly, if a large amount of data is present in the table, it may take a significant amount of time to locate the data in the table and generate an output.
In order to reduce the query execution time for regularly run queries, it is known to provide a series of summary tables. These summary tables effectively summarize data already contained within the tables by maintaining a separate copy of the data in a summary table. The summary tables, which are generally smaller than the tables of the database itself are designed such that the data required to answer regularly run queries is more readily available. Thus for example, if one particular query is run daily it is possible to create a summary table including only the data required to answer that query and no other data. When this query is next received, the query is rewritten so that the data is obtained directly from the summary table instead of the usual detail tables. This ensures that the query is answered in as short a time as possible.
However, the summary tables can currently only be generated manually. Accordingly, this requires a system administrator to monitor usage of the database, for example by monitoring the queries submitted to the database and then determining which summary tables would provide the greatest improvement to the database operating efficiency. In particular, there will typically only be a limited amount of space in which summary tables can be provided and as these provide redundant data, it is preferable to only provide summary tables if these will result in a noticeable improvement in the query execution time.
However, it is extremely difficult for a system administrator to accurately judge which summary tables would provide the optimum increase in the query execution time for the entire database. Furthermore, the system administrators have to construct and maintain the tables themselves, which can require monitoring of the database usage on a daily basis. This is time consuming and an extremely difficult procedure.
In accordance with the present invention, we provide apparatus for creating summaries of a database, the database being responsive to a query to provide data, the apparatus comprising a processor coupled to a database, the processor being adapted to:
a. monitor the usage of the database; and,
b. determine one or more summaries in accordance with the usage.
Accordingly, the present invention provides apparatus for automatically determining summaries that should be used by the database. The apparatus achieves this by monitoring the usage of the database and then using this information to determine one or more summaries which may be used. This allows summaries to be created without requiring a systems administrator having to make predictions about the usage of the database. As a result, any subsequent queries which require data which is contained in one of the summaries will have a query execution time which is far lower than when the summary is not present.
Typically the system monitors the usage of the database by monitoring the number of queries made and the length of time taken for the database to generate a response to each query. However, alternatively, the usage may be determined on an access cost basis with different forms of query having a different associated cost.
The one or more summaries are preferably determined by causing the processor to propose a number of summaries, determine the improvement achieved by using the proposed summaries and, select one or more of the summaries in accordance with the improvement achieved using the summary. Thus, this ensures that summaries which result in the most benefit are used.
The proposed summaries typically correspond to data provided by the database in response to a received query. Such summaries are perfect match summaries. However, alternatively imperfect match summaries may be used in certain circumstances. The use of imperfect match summaries allows responses to be generated solely from a summary table for non-identical queries. Such imperfect match summaries are typically created by combining or generalizing existing summaries.
The processor also usually proposes summaries in accordance with a number of defined proposal parameters, such as an indication of the nature of the queries submitted to the database. Thus, whether perfect or imperfect match summaries are used can depend on the nature os queries submitted to the database. Alternative factors, such as the amount of space available for summary storage may also be taken into account. This allows certain queries to be more influential in the generation of summary tables, so that, for example, it can be ensured that summaries are created for certain queries.
However, alternative methods may be used. Thus, for example, the system may not have any space limitations in which case, it will generally simply propose a summary and then implement it as the use of any summaries will result in an improvement in the overall query execution time of the database. Alternatively, the system administrator may desire to use certain summaries and these can therefore be designated as predetermined summaries which the apparatus is not able to alter.
The processor is typically adapted to determine the improvements achieved in accordance with the length of the time taken for the database to generate a response to each query with a summary present, the length of time taken for the database to generate a response to each query with no summary present and, the number of times the given query is submitted to the database in a given time period. This advantageously allows the apparatus to predict the total query execution time improvement that will be obtained for the database over a given time period, assuming that typical queries are submitted. Thus, for example, the apparatus can work out the total reduction in the query execution time that would be achieved for a given query for an entire day, given the number of times the query would typically be submitted in one day. This value can then be summed for a range of queries to determine an overall query execution time reduction. However, any suitable method of judging improvement, such as determining a reduction in the number of operations performed by the processor, or a reduction in the cost access requirements, may alternatively be used.
One possibility is for the length of time taken for the database to respond to a query with a summary present to be estimated in accordance with the size of the summary and a predetermined algorithm. This allows the processor to use the size of the proposed summary to estimate the query execution time for a query which only accesses the given summary. This allows the processor to determine the improvement in the query execution time that is obtained even if the summary is never produced. Of course, if the summary is already present and is currently in use, then the estimate is not required.
The processor is typically coupled to a store and may also be further adapted to generate selected one(s) of the one or more summaries, using data from the database, the summaries being stored in the store. In this case the system can be configured to automatically construct the summaries, without human intervention, allowing the system can therefore be left operating in the absence of a system administrator. Alternatively however the selection may be performed by a system administrator allowing specific summaries to be chosen and allowing the administrator to monitor the summaries used.
Typically the processor is adapted to select the summaries in accordance with a number of selection parameters, such as the size of the store. This allows the system administrator to specify the total size that the summaries should be limited to by limiting the size of the store. As will be appreciated by a person skilled in the art, the larger the number of summaries that are produced, the greater the improvement in the query execution time of the database. However, the more summaries are produced, the more redundancy there is in the database.
However alternative selection parameters may be used allowing a system administrator to specify summaries for creation and/or deletion and these can be set to ensure that they are not altered by the processor in accordance with the set selection parameters. Again, this allows certain queries to be more influential in the generation of summary tables, so that, for example, it can be ensured that summaries are created for certain queries. Additionally, some summary tables, once defined, may be fixed so that they are unalterable, whereas others may be alterable only after certain periods of time, or the like.
The processor is usually adapted to repeatedly modify the summaries in accordance with the usage at a predetermined time interval. This allows the system to ensure that the summaries are constantly updated so that redundant summaries are not maintained and to ensure the best query execution time is obtained even if the usage of the database has changed significantly over time. As this process is automatic and does not require any substantial input from a system user or system administrator the predetermined time interval may be as little as a few hours. Typically however the predetermined time interval will be either 24 hours or 1 week allowing the process to be carried out when the database is not heavily used.
It will be appreciated that as well as creating new summaries by repeating steps (a) and (b) to determine one or more view summaries and then generating selected ones of the one or more new summaries, it is also possible to delete summaries, or alternatively modify currently existing summaries to provide improved summaries.