1. Field of the Invention
The present invention generally relates to query execution management and, more particularly, to managing execution of queries including user-defined functions.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems which can be organized in multiple different ways. An overall database organization is typically referred to as a schema for the database, such as a hierarchical or relational schema. A database schema is often compactly expressed using table names and names of columns in tables. Database schemas frequently take the form of a “star”, where there is one large “mother” table and many small “detail” tables. For instance, a simple database schema including a large mother table “Name/Address” and a small detail table “City” could be expressed as:
    Name/Address (LastName, FirstName, M.I., PostalCode, . . . )
    City (CityName, PostalCode)
By way of example, the Name/Address table stores names, addresses and additional information for individuals in a multiplicity of columns “LastName”, “FirstName” etc. Each row of the Name/Address table is associated with a specific individual. The City table stores city names and postal codes in two columns “CityName” and “PostalCode”, respectively. Each row of the City table links a particular city name to a specific postal code. The PostalCode columns in the City and Name/Address tables are configured to join both tables.
Regardless of the particular database schema, a requesting entity (e.g., an application, the operating system or a user) requests access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL) in the case of a relational database. Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denotes a set of commands for retrieving data from a stored database.
Queries take the form of a command language that lets programmers and programs select, insert, update and locate data in a database, among other operations. For instance, SQL supports four types of query operations, namely SELECT, INSERT, UPDATE and DELETE. A SELECT operation retrieves data from a database, an INSERT operation adds new data to a database, an UPDATE operation modifies data in a database and a DELETE operation removes data from a database. Furthermore, customized operations can be created as extensions to SQL using user-defined functions (UDFs). A UDF is a mechanism which allows users to create user-specific or application-specific operations for use with SQL. By way of example, UDFs can be created to perform calculations, both simple and complex. However, the complexity of UDFs may differ greatly from case to case. For instance, in some simple cases UDFs can be provided to convert a character field to uppercase and trim it. In more complex cases, UDFs can be provided to convert a field to uppercase and trim it based on the field's value and subsequently concatenate the field with another value. In still more complex examples, UDFs may convert values using complex mathematical expressions.
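By way of illustration only, the simple case of a UDF that trims a character field and converts it to uppercase can be sketched in Python using the standard sqlite3 module. SQLite is used here merely as a convenient stand-in for a relational database; the function name UPPER_TRIM, the table name NameAddress and the sample row are assumptions made for this sketch and are not taken from any particular product.

```python
import sqlite3

def upper_trim(value):
    """Hypothetical UDF: trim surrounding whitespace and convert to uppercase."""
    return None if value is None else value.strip().upper()

conn = sqlite3.connect(":memory:")
# Register the Python function as a one-argument SQL function named UPPER_TRIM.
conn.create_function("UPPER_TRIM", 1, upper_trim)

# Illustrative table and row (assumed sample data).
conn.execute("CREATE TABLE NameAddress (LastName TEXT, PostalCode TEXT)")
conn.execute("INSERT INTO NameAddress VALUES ('  smith ', '45246')")

# The UDF is invoked from SQL like a built-in function.
row = conn.execute("SELECT UPPER_TRIM(LastName) FROM NameAddress").fetchone()
udf_result = row[0]
print(udf_result)
```

Once registered, such a function can appear anywhere a built-in SQL function could, including in the condition of a WHERE clause.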
Queries typically involve data selections based on attributes of detail tables followed by retrieval of information for searched data records, i.e., rows from a corresponding mother table. By way of example, using the City and Name/Address tables, a query can be issued to determine all individuals living in a particular city. To this end, all rows in the City table would be scanned to find those rows having the corresponding city name in the CityName column, and then the postal codes in those rows would be retrieved from the PostalCode column. Subsequently, the Name/Address table would be scanned to locate all rows having one of the retrieved postal codes in the PostalCode column. The located rows contain the searched information related to the individuals residing in the particular city.
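The two-step lookup described above can be sketched as follows, again using Python's sqlite3 module as a stand-in database. The table name NameAddress, the helper residents_of, the city names and the individuals are hypothetical sample data introduced only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE City (CityName TEXT, PostalCode TEXT);
CREATE TABLE NameAddress (LastName TEXT, FirstName TEXT, PostalCode TEXT);
INSERT INTO City VALUES
  ('Springfield', '45246'), ('Springfield', '45202'), ('Riverton', '45402');
INSERT INTO NameAddress VALUES
  ('Smith', 'Ann', '45246'), ('Jones', 'Bob', '45402'), ('Lee', 'Cara', '45202');
""")

def residents_of(city_name):
    """Return (LastName, FirstName) rows for individuals living in city_name."""
    # Step 1: scan the detail table for the postal codes of the city.
    codes = [r[0] for r in conn.execute(
        "SELECT PostalCode FROM City WHERE CityName = ?", (city_name,))]
    if not codes:
        return []
    # Step 2: locate the mother-table rows having one of those postal codes.
    placeholders = ",".join("?" * len(codes))
    sql = ("SELECT LastName, FirstName FROM NameAddress "
           f"WHERE PostalCode IN ({placeholders}) ORDER BY LastName")
    return conn.execute(sql, codes).fetchall()

print(residents_of("Springfield"))
```

In practice the same result would commonly be obtained with a single SQL statement joining both tables on the PostalCode columns; the two explicit steps are kept here only to mirror the description.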
Queries and, consequently, query workload can consume significant system resources, particularly processor resources. The system resource consumption of a query against one or more databases depends on the complexity of the query and the searched database(s). A typical way of reducing system resource consumption when looking up information in tables of a database is to use indexes. One type of index is a bitmap index, which indicates whether a specific value exists for each row in a particular column. One bit represents each row. Accordingly, there may be an index into the Name/Address table identifying all rows in the PostalCode column that have a particular postal code value. For instance, in the bitmap index for the PostalCode column, the nth bit equals 1 if the nth row of the Name/Address table contains a value of “45246”, or 0 if that row holds a value other than “45246”. Typically there are multiple bitmap indexes for each column, one for each of several values that may appear in the column (e.g., one index for the value “45246”, another index for the value “45202”, and so on). Another type of index is an encoded vector index (EVI), disclosed, for example, in U.S. Pat. No. 5,706,495, issued Jan. 6, 1998 to Chadha et al., entitled ENCODED-VECTOR INDICES FOR DECISION SUPPORT AND WAREHOUSING, which is incorporated herein by reference. An EVI serves a similar purpose as a bitmap index, but only one index is necessary to account for all values occurring in the column (whether they are “45246”, “45202”, or any other). Accordingly, in an EVI on the PostalCode column, the nth position of the EVI contains a bit code that can be decoded using a lookup table to produce the value, e.g., “45246”, which is the postal code in the nth row of the table. Thus, whereas a separate bitmap index is required to map each particular key value in a database field, only one EVI is required to represent the same information. Therefore, an EVI saves computer memory by including all possible key values for a given field in one database index.
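The difference between the two index types can be illustrated with a simplified Python sketch. It is not the structure disclosed in the referenced patent; the function names, the in-memory list representation and the sample column values are assumptions chosen only to show one bitmap per value versus a single vector of codes with a lookup table.

```python
# Assumed sample PostalCode column; the list position corresponds to the row number.
postal_codes = ["45246", "45202", "45246", "45219"]

def build_bitmap_indexes(column):
    """Build one bitmap per distinct value: bit n is 1 if row n holds that value."""
    return {value: [1 if cell == value else 0 for cell in column]
            for value in set(column)}

def build_evi(column):
    """Build a single vector of codes plus a lookup table that decodes them."""
    lookup = sorted(set(column))                      # code -> value
    code_of = {value: code for code, value in enumerate(lookup)}
    vector = [code_of[cell] for cell in column]       # position n holds row n's code
    return vector, lookup

bitmaps = build_bitmap_indexes(postal_codes)
vector, lookup = build_evi(postal_codes)

# Rows holding "45246", found through each kind of index.
rows_via_bitmap = [n for n, bit in enumerate(bitmaps["45246"]) if bit]
rows_via_evi = [n for n, code in enumerate(vector) if lookup[code] == "45246"]
print(rows_via_bitmap, rows_via_evi)
```

In this sketch three separate bitmaps are needed for the three distinct values, whereas the single vector together with its lookup table represents the same information.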
However, one shortcoming of the prior art is that indexes generally cannot be used if a query condition of a corresponding query includes a UDF. More specifically, assume a SQL query having a WHERE clause, wherein the query condition is a predicate of the WHERE clause. Assume further that the query condition includes a UDF having an output which varies in response to the UDF's input. Because the required index, in the case of bitmap indexes, or the respective bit code, in the case of an EVI, may change with each new input value of the UDF, such indexes typically cannot be used to evaluate the query condition.
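An analogous effect can be observed, purely as an illustration, in SQLite through Python's sqlite3 module. SQLite uses B-tree indexes rather than bitmap indexes or EVIs, so the sketch below only shows the general behavior of a query planner when an indexed column is wrapped in a UDF. The function NORMALIZE_CODE, the index name idx_postal and the helper plan are hypothetical names introduced for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical UDF whose output depends on its input value.
conn.create_function("NORMALIZE_CODE", 1, lambda v: v.strip())
conn.execute("CREATE TABLE NameAddress (LastName TEXT, PostalCode TEXT)")
conn.execute("CREATE INDEX idx_postal ON NameAddress (PostalCode)")

def plan(sql):
    """Return the planner's description of how it would execute sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)

# Query condition on the bare column: the planner can use the index.
plain_plan = plan(
    "SELECT LastName FROM NameAddress WHERE PostalCode = '45246'")
# Query condition wrapping the column in a UDF: the index on the column
# does not apply, so the table is scanned row by row.
udf_plan = plan(
    "SELECT LastName FROM NameAddress WHERE NORMALIZE_CODE(PostalCode) = '45246'")

print(plain_plan)
print(udf_plan)
```

The exact wording of the plan text varies between SQLite versions; the point of the sketch is only that the index is named in the first plan and absent from the second.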
Therefore, there is a need for effective query execution management in a data processing system that efficiently manages execution of queries having query conditions which include user-defined functions.