In the latter half of the twentieth century, there began a phenomenon known as the information revolution. While the information revolution is a historical development broader in scope than any one event or machine, no single device has come to represent the information revolution more than the digital electronic computer. The development of computer systems has surely been a revolution. Each year, computer systems grow faster, store more data, and provide more applications to their users.
A modern computer system typically comprises hardware in the form of one or more central processing units (CPU) for processing instructions, memory for storing instructions and other data, and other supporting hardware necessary to transfer information, communicate with the external world, and so forth. From the standpoint of the computer's hardware, most systems operate in fundamentally the same manner. Processors are capable of performing a limited set of very simple operations, such as arithmetic, logical comparisons, and movement of data from one location to another. But each operation is performed very quickly. Programs which direct a computer to perform massive numbers of these simple operations give the illusion that the computer is doing something sophisticated. What is perceived by the user as a new or improved capability of a computer system is made possible by performing essentially the same set of very simple operations, but doing it much faster. Therefore continuing improvements to computer systems require that these systems be made ever faster.
The overall speed at which a computer system performs day-to-day tasks (also called “throughput”) can be increased by making various improvements to the computer's hardware design, which in one way or another increase the average number of simple operations performed per unit of time. The overall speed of the system can also be increased by making algorithmic improvements to the system design, and particularly, to the design of software executing on the system. Unlike most hardware improvements, many algorithmic improvements to software increase the throughput not by increasing the average number of operations executed per unit time, but by reducing the total number of operations which must be executed to perform a given task.
Complex systems may be used to support a variety of applications, but one common use is the maintenance of large databases, from which information may be obtained. Large databases usually support some form of database query for obtaining information which is extracted from selected database fields and records. Such queries can consume significant system resources, particularly processor resources, and the speed at which queries are performed can have a substantial influence on the overall system throughput.
Conceptually, a database may be viewed as one or more tables of information, each table having a large number of entries (analogous to row of a table), each entry having multiple respective data fields (analogous to columns of the table). The function of a database query is to find all rows, for which the data in the columns of the row matches some set of parameters defined by the query. A query may be as simple as matching a single column field to a specified value, but is often far more complex, involving multiple field values and logical conditions.
To support queries, a database typically includes one or more indexes for some of the database fields. An index is a sorting of the records in one of the database tables according to the value of a corresponding field. For example, if the database table contains records about people, one of the fields may contain a birthdate, and a corresponding index contains a sorting of the records by birthdate. If a query requests the records of all persons born before a particular date, the sorted index is used to find the responsive records, without the need to examine each and every record to determine whether there is a match. A well-designed database typically contains a respective index for each field having an ordered value which is likely to be used in queries.
Execution of a query involves retrieving and examining records in the database according to some search strategy. For any given logical query, not all search strategies are equal. Various factors may affect the choice of optimum search strategy. In particular, where a logical AND of multiple conditions is specified, the sequential order in which the conditions are evaluated can make a significant difference in the time required to execute the query. The reason for this difference is that the first evaluated condition is evaluated with respect to all the records in a database table, but a later evaluated condition need only be evaluated with respect to the subset of records for which the first condition was true. Similarly, for a query involving a multiple conditions conjoined by a logical OR, a later evaluated condition need only be evaluated with respect to the subset of records for which an earlier condition was false.
To support database queries, large databases typically include a query engine which executes the queries according to some automatically selected search strategy, using the known characteristics of the database and other factors. Some large database applications further have query optimizers which construct search strategies, and save the query and its corresponding search strategy for reuse.
A query engine or optimizer can use the indexes and other known characteristics of the database provided by the database designer. However, many large databases further support the use of user-defined functions in the database queries. As used herein, a user-defined function includes any of various functions or procedures which may be embedded in a query to provide additional capability beyond mere comparison of values from database fields. One form of user-defined function is computer programming code, written by the user in any of various general programming languages, which returns a value in response to one or more input parameters from a database record. Such a user-defined function may accept one or more input parameters (often database field values), which are passed with the call to the function. Although the external interface is via the passed parameter(s) and the returned value(s), a user defined function behaves internally as any other computer programming code. It may have arbitrary complexity, including loops, branches, calls to other functions and procedures, etc. Another form of user-defined function is a stored procedure, which is a special procedure built through Structured Query Language (SQL), a well-known set of database query syntax supported by many database applications. A user defined function might also be a sub-query, i.e., a query (set of query conditions) within a query, which is usually re-used for different queries, and provides the capability to define queries of added complexity. A user-defined function may have arbitrary complexity. The database designer does not know in advance what the user-defined functions will be, and since these are under control of the user, it is difficult or impossible to construct indexes for user-defined function values.
Because most user-defined functions involve complex evaluations which take longer to execute than straightforward matching of query parameters to database field values, conventional query engines and optimizers typically evaluate user-defined functions last. I.e., any conditions other than user-defined functions are evaluated first in order to reduce the subset of records which must be evaluated using the user-defined function.
If a query contains multiple user-defined functions, a conventional engine or optimizer will typically arbitrarily choose one of the user-defined functions to evaluate first. E.g., the engine or optimizer may evaluate the first function specified in the search string. Unless the user understands the workings of the database sufficiently well to correctly specify the search order of user-defined functions (which relatively few users do), such an arbitrary choice is likely to be sub-optimal. If intelligent choices could be made when selecting the order of evaluation of multiple user-defined functions, the execution performance of such queries could be improved. A need therefore exists, not necessarily recognized, for an improved database query engine or optimizer which can automatically make intelligent choices in ordering the evaluation of user-defined functions.