This invention relates generally to the field of computers, and in particular to the use of indices for database queries with comparisons on a function.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawing hereto: Copyright (copyright) 2000, Microsoft Corporation, All Rights Reserved.
Relational data bases are collections of rows or tuples of data. Each row may have one or more columns containing information, such as numbers, names, addresses, etc. For example, a column might contain the names of people that pay taxes, with the address, social security number and other information contained in other columns. All the information in a row is related to the same person. A query can be written, requesting information from the database. One such query might be related to age. The query could be related to finding the average income for all people over a certain age.
Indices are commonly used to find rows satisfying a given condition. In the above example, an index might be based on age. Such an index would provide a sorted list of rows in ascending age. The index would allow a database to execute the query more quickly by only looking at a group of rows where age is greater than a certain age. It would allow the database to skip the rows that do not meet the age criteria.
The use of indices is a very powerful query execution technique, which can dramatically improve execution time and effort. The index provides some lookup structure that allows direct access to a row, given a specific column value, instead of having to examine each and every row of a given table, referred to as a scan of the entire table. An index on a column  less than col greater than  can be use to find rows that satisfy conditions of the form  less than col greater than   less than cmp greater than   less than expr greater than . For example, an index on column T.a can be used to retrieve rows satisfying conditions T.a=5, or T.a greater than 10.
The same index can be used to execute table joins, when the condition is of the form T.a=R.b. In this case, for each row of R, a specific value for T.b is determined, and all rows of T having such value are found through the index.
When a comparison is not made directly on a column, but on some computation, indices on columns are in general not applicable, and commercial products are typically unable to speed up execution by means of the index. For example, an index on column T.a is usually not exploited to locate rows satisfying a condition T.a*T.a greater than 25. Note that values greater than 5.0, and also lower than xe2x88x925.0 satisfy this condition. In general, for comparisons of the form f( less than col greater than )  less than cmp greater than   less than expr greater than , (a function using the value of the column, and comparing the result with another expression) the index on  less than col greater than  cannot be used to locate qualifying rows.
Type conversion is one common example of a function where indices are normally not used. Converting from floating point to integer is one example of type conversion. xe2x80x98CONVERT(column, type)xe2x80x99 is used to denote type conversion in the rest of the document. There are different ways to end up with queries that contain CONVERT( less than col greater than )  less than cmp greater than   less than expr greater than . The condition could be written directly in such form by a user, or an implicit convert could be introduced by the system when operating on comparable types, such as integer and float. Another way to introduce this kind of comparisons is through inference from other conditions.
An index is used for a query that contains an original condition having a comparison based on a function over a column. An implied condition over the column is first identified and applied to restrict the rows to fetch from the index. Finally, the query is executed over the restricted set of rows using the original condition.
The implied condition identifies at least one bound for inequalities, and an upper bound and a lower bound for equalities and non-linear functions. The bound or bounds are calculated to be inclusive of all potential rows meeting the original condition, and to include as few rows not meeting the original condition as possible.
Implied conditions are fully determined at execution time once parameter values are known. A dynamic interval provides a flag indicating what bound to apply, lower bound, upper bound, or else special intervals such as all-rows-qualify, no-rows-qualify, or non-nulls-qualify. If an exception is encountered in the computation of implied bounding conditions, it is always safe to use the dynamic interval all-rows-quality, to consider all the rows.
In a further aspect of the invention implied conditions are identified for multiple functions. The bounds are calculated based on the implicit conditions, and are based on parameters contained in the query.