1. Field of the Invention
The present invention relates to specifying indexes in a database in response to an analysis of received enquiries.
2. Description of the Related Art
Data processing environments are known in which executable instructions are arranged to generate a data set derived from data contained within a database in response to a data enquiry. Data may be accessed directly from data tables, where it is necessary to search the entries within a table in order to obtain the information required. As an alternative, indexes may be used, thereby avoiding the need for a time-consuming search, but increasing the amount of processing required whenever data is modified. Indexes also require additional data storage capacity.
Designing index structures for large and heavily used databases is presently an extremely difficult task and highly susceptible to the introduction of errors. This problem exists because the technical demands and constraints are such that it is not possible for a database administrator to simultaneously perceive the indexing requirements for hundreds or thousands of different structured query language (SQL) statements, that run against the database on a day-to-day basis. These requirements must then be converted into a preferred set of indexes defined over the whole database. A poorly specified set of indexes will result in SQL statements that consume far too much of the processor facility, that run for far longer than is theoretically necessary and result in a data processing environment that is heavily overloaded.
For a long time, there has been a requirement for procedures that globally specify index structures defined over a given database design, for a typical SQL workload, which may be referred to as a target workload. However, this technical problem has persisted given the inherent difficulties of realising a technical solution.
An earlier proposal of the present inventor is described in international patent publication number WO 97/22939, describing a procedure for specifying an index set by analysing a sample of SQL statements applied to the database. In this proposal, indexes are identified that could assist in the execution of the analysed statements and levels of improved operation are evaluated for each of the indexes. The evaluated levels are processed to specify an index set for inclusion in the database and storage is allocated, within an index being selected, on a prioritised basis.
A problem with the proposal identified in said earlier publication is that the size of the set of candidate indexes that must be evaluated to arrive at a preferred index set for the target workload becomes relatively large. In addition, the cost of developing a set of preferred indexes for each table using a genetic algorithm is also large. In combination, these problems result in a significant processor overhead requirement. Thus, the elapsed time taken to arrive at a preferred index set is prohibitively long, because there are so many candidate indexes. Furthermore, because of the genetic algorithm used in the earlier proposal, at least several hundred generations are required in order to arrive at a suitable set of preferred indexes for a particular table.
The present invention provides a method of identifying preferred indexes for a database, wherein the database has received queries from users. A sample of user queries are analysed to identify candidate indexes. Index permutations are identified by processing candidate indexes in combinations identified from the sample of user queries. A preferred index set is then identified by comparing the performance of the index permutations.
Preferably, preferred indexes are identified for a database in a way such that earlier permutations of chosen indexes progressively guide and restrict the search path of the process. This in turn reduces processing time by many orders of magnitude when compared to known systems, without compromising the set of indexes identified.
The current invention uses techniques that dramatically reduce the search space for candidate indexes that need to be evaluated. In doing so, it also builds a set of preferred indexes for each table as part of its candidate index evaluation processing. This means that devising a set of preferred indexes for each table can complete many orders of magnitude faster than previously known systems. The current invention also by its nature builds a set of preferred indexes for each table as processing progresses, thus avoiding the expensive time and resource consuming need for known systems, such as the use of genetic algorithms, to devise a preferred index set for each table.
The invention also provides for apparatus for identifying preferred indexes, exploiting the aforesaid method and for a computer readable medium having computer readable instructions executable by a computer so as to exploit the aforesaid method.