1. Field of the Invention
The present invention generally relates to query execution management and, more particularly, to managing execution of queries against one or more databases in a data processing system.
2. Description of the Related Art
Databases are computerized information storage and retrieval systems which can be organized in multiple different ways. An overall database organization is typically referred to as a schema for the database, such as a hierarchical or relational schema. A database schema is often compactly expressed using table names and names of columns in tables. Database schemas frequently take the form of a “star”, where there is one large “mother” table and many small “detail” tables. For instance, a simple database schema including a large mother table “Name/Address” and a small detail table “City” could be expressed as:                Name/Address (LastName, FirstName, M.I., PostalCode, . . . ) City (CityName, PostalCode)By way of example, the Name/Address table stores names, addresses and additional information for individuals in a multiplicity of columns “LastName”, “FirstName” etc. Each row of the Name/Address table is associated with a specific individual. The City table stores city names and postal codes in two columns “CityName” and “PostalCode”, respectively. Each row of the City table links a particular city name to a specific postal code. The PostalCode columns in the City and Name/Address tables are configured to join both tables.        
Regardless of the particular database schema, a requesting entity (e.g., an application or the operating system) demands access to a specified database by issuing a database access request. Such requests may include, for instance, simple catalog lookup requests or transactions and combinations of transactions that operate to read, change and add specified records in the database. These requests are made using high-level query languages such as the Structured Query Language (SQL) in the case of a relational database. Illustratively, SQL is used to make interactive queries for getting information from and updating a database such as International Business Machines' (IBM) DB2, Microsoft's SQL Server, and database products from Oracle, Sybase, and Computer Associates. The term “query” denominates a set of commands for retrieving data from a stored database. Queries take the form of a command language that lets programmers and programs select, insert, update, find out the location of data in a database, and so forth.
Queries typically involve data selections based on attributes of detail tables followed by retrieval of information for searched data records, i.e., rows from a corresponding mother table. By way of example, using the City and Name/Address tables, a query can be issued to determine all individuals living in a particular city. To this end, all rows in the City table would be scanned to find those rows having the corresponding city name in the CityName column, and then the postal codes in those rows would be retrieved from the PostalCode column. Subsequently, the Name/Address table would be scanned to locate all rows having one of the retrieved postal codes in the PostalCode column. The located rows contain the searched information related to the individuals residing in the particular city.
Queries and, consequently, query workload can consume significant system resources, particularly processor resources. The system resources consumption of a query against one or more databases depends on the complexity of the query and the searched database(s). A typical way of reducing system resource consumption for looking up information in tables of a database consists in using indexes. One type of index is a bitmap index, which indicates whether a specific value exists for each row in a particular column. One bit represents each row. Accordingly, there may be an index into the Name/Address table identifying all rows in the PostalCode column that have a particular postal code value. For instance, in the bitmap index for the PostalCode column, the nth bit equals 1 if the nth row of the Name/Address table contains a value of “45246”, or 0 if that row holds a value other than “45246”. Typically there are multiple bitmap indexes for each column, one for each of several values that may appear in the column (e.g., one index for the value “45246”, another index for the value “45202”, and so on). Another type of index is an encoded vector index (EVI), disclosed, for example, in U.S. Pat. No. 5,706,495, issued Jan. 6, 1998 to Chadha et al., entitled ENCODED-VECTOR INDICES FOR DECISION SUPPORT AND WAREHOUSING, which is incorporated herein by reference. An EVI serves a similar purpose as a bitmap index, but only one index is necessary to account for all values occurring in the column (whether they are “45246”, “45202”, or any other). Accordingly, in an EVI on the PostalCode column, the nth position of the EVI contains a bit code that can be decoded using a lookup table to produce the value “45246”, which is the postal code in the nth row of the table. Thus, whereas a separate bitmap index is required to map each particular key value in a database field, only one EVI is required to represent the same information. Therefore, an EVI saves computer memory by including all possible key values for a given field in one database index.
One difficulty when dealing with indexes is that both a bitmap index and an EVI index only information relating to a single column of the table. Thus, these indexes do not reflect relations between values in multiple columns. Such relations can be addressed by combining different indexes associated with different columns. For instance, assume a query which is to be executed on a table “ILLNESSES” having a multiplicity of columns including an “illness”, “patient_age” and “patient_city” column. Using this query, a researcher may try to retrieve all rows from the ILLNESSES table that contain information relating to individuals having the illness “diabetes” and which are younger than “35” years or living in “Rochester”. Assume further that the ILLNESSES table has 200 million rows and that each of the columns “illness”, “patient-age” and “patient_city” has less than 25 distinct key values. Accordingly, for each column an EVI having an index size of approximately 200 MB can be provided. While processing of the query using the EVIs significantly decreases system resources consumption compared to a full scanning of the 200 million rows of the ILLNESSES table, processing speed and time remain a critical processing factor because of the large size of the EVIs.
Current database technology allows for deriving dynamic bitmap indexes from EVIs for improving processing speed and time when processing a query. Using dynamic bitmap indexing for rapid data retrieval, bitmaps can be built on the fly from existing EVIs to identify exactly which rows in a table a user searches. Thus, a dynamic bitmap index is a temporary data structure that is built from a permanent index, such as an EVI. The dynamic bitmap index contains one bit for each row in the table for a specified selection criterion. If the index entry matches the selection criterion, the bit is set. The system can thus scan the dynamic bitmap index and retrieve directly only those records from the table where the bit is set. In the above example, a dynamic bitmap index is created for each column of the ILLNESSES table. For instance, a dynamic bitmap index is generated according to the above described query using an EVI associated with the “patient_age” column and includes a bit set to “1” for each row of the ILLNESSES table that relates to an individual who is younger than 35 years. The generated dynamic bitmap indexes for all columns can be combined according to Boolean AND and/or OR operators to identify the requested rows in the ILLNESSES table. The Boolean operators correspond to the Boolean AND and/or OR operators which logically combine the columns “illness”, “patient_age” and “patient_city” in the above described query (e.g., patient_age<35 OR patient_city=Rochester). As a result, a single final dynamic bitmap index can be obtained which identifies all relevant rows in the ILLNESSES table, thereby avoiding a complete scan of all rows in the table.
One shortcoming of the prior art is the manner in which execution of essentially similar queries is managed. For example, in a data mining environment queries are frequently executed that differ from each other only with respect to one or more of their respective query conditions. For instance, the above query may differ from another query only in that individuals younger than 35 years are searched instead of 55 year old individuals. When the other query is executed, the above described processing needs to be performed for the other query to determine another final dynamic bitmap index. As the EVIs associated with the columns of the ILLNESSES table are rather large, significant processing resources and time would, thus, also be required for processing the other query.
Therefore, there is a need for an effective query execution management in a data processing system for efficiently managing execution of similar queries.