A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
A Computer Program Listing Appendix, containing one (1) total file on compact disc, is included with this application.
1. Field of the Invention
The present invention relates generally to data processing environments and, more particularly, to system and methods for improved optimization and execution of queries involving functional expressions against database columns having enumerated storage.
2. Description of the Background Art
Computers are very powerful tools for storing and providing access to vast amounts of information. Computer databases are a common mechanism for storing information on computer systems while providing easy access to users. A typical database is an organized collection of related information stored as xe2x80x9crecordsxe2x80x9d having xe2x80x9cfieldsxe2x80x9d of information. As an example, a database of employees may have a record for each employee where each record contains fields designating specifics about the employee, such as name, home address, salary, and the like.
One purpose of a database system is to answer decision support queries and support transactions. A query may be defined as a logical expression over the data and the data relationships set forth in the database, and results in identification of a subset of the database. For example, a typical query might be a request, in SQL syntax, for data values corresponding to all customers having account balances above required limits. During query processing, a database system typically utilizes one or more indexes to answer queries. Indexes are organized structures associated with the data to speed up access to particular data values (i.e., answer values). Indexes are usually stored in the database and are accessible to a database administrator as well as end users. The basic operation of database systems, including the syntax of SQL (Structured Query Language), is well documented in the technical, trade, and patent literature; see, e.g., Date, C., An Introduction to Database Systems, Volume I and II, Addison Wesley, 1990; the disclosure of which is hereby incorporated by reference.
xe2x80x9cData warehousexe2x80x9d systems represent a type of database system optimized for supporting management decision making by tracking and processing large amounts of aggregate database informationxe2x80x94that is, the data warehouse. Data warehouses contain a wide variety of data that present a coherent picture of business conditions at a single point in time. Development of a data warehouse includes development of systems to extract data from operating systems plus installation of a warehouse database system that provides managers flexible access to the data. A well-known example of a data warehouse system is Sybase(copyright) Adaptive Server(copyright) IQ (ASIQ), available from Sybase, Inc. of Dublin, Calif.
A data warehousing system typically handles large pools of historical information representing different portions of a business. These different portions may, for example, represent a number of different geographical areas, such as information relating to sales to customers located in Texas, Georgia, Massachusetts, New York, and Maine. The information may also represent different periods of time, such as sales in particular calendar months or quarters. A number of different products and/or business units may also be involved. Database systems frequently store very large quantities of information and for this reason database administrators continuously look for ways to more efficiently store and retrieve information in database systems.
One well-known technique for more efficiently storing information in database systems is utilizing enumerated storage. Enumerated storage utilizes a dictionary look-up style scheme in which the dictionary (or look-up table) contains a list of distinct values, with each distinct value associated with a particular offset which serves as an address for locating that value. Instead of redundantly storing the same information, such as the name of a particular state (e.g., New York) multiple times in a column of the database table, the look-up table contains values that will be used as the contents of the column. The column in the main database table may then store only the offset. However, since there is a relationship between the look-up table and the main table, the value associated with offset in the look-up table is also associated with the record in the main table. For example, a database column T.STATE_NAME with enumerated storage may have a look-up table such as the following:
In a situation where the number of distinct values in a particular column of a database is modest (e.g., 50 states) compared to the number of rows in the table, this enumerated storage scheme is considerably more space efficient than repeatedly storing the same raw data values in the database column, and thereby requires fewer input/output (I/O) operations to read or write.
One problem in handling many database queries is the evaluation of functional expression on columns. Considerable effort is spent in database query engines to identify the point in a query plan where a functional expression should be evaluated in order to minimize the estimated number of evaluations of the expression. If a predicate (or condition) includes complex or simple expressions over column(s) that cannot be answered from an index, then formulating an answer to the predicate involves the expensive (in terms of system performance) task of scanning all of the raw data values of the column, row by row, in order to respond to the query. For instance, a column may contain state names. A user could use an expression such as the following to obtain all state names beginning with the letter xe2x80x9cSxe2x80x9d:
SELECT T.CUSTOMER FROM T
WHERE T.STATE_NAME LIKE xe2x80x98S%xe2x80x99
Typically, a traditional B-Tree index would enable a user or administrator to easily select all states beginning with a particular letter, because the first letter is the leading portion of the string. However, if a user wanted to find the second letter of a state name, this could not be handled using the index as this string is in the middle of the data. As a result there is no traversal path in the index that will lead to the answer. Thus, a table scan (or column scan) would typically be required to handle this type of query expression in current database systems.
Consider, for example, the following Structured Query Language (SQL) statement:
SELECT T.CUSTOMER FROM T
WHERE SUBSTRING(T.STATE_NAME, 2, 1)=xe2x80x98exe2x80x99
This type of expression could not be resolved from a typical B-Tree index. It also is problematic to resolve against a column with enumerated storage in current database systems, as it requires a column scan of the raw data in order to evaluate the query. If one has a large column with billions of rows, this is an expensive task to undertake, as it could easily require the evaluation of the expression billions of times.
If a database administrator knows in advance that a particular query using a particular functional expression will be used, one possible way to address this problem is by using the concept of xe2x80x9cmaterialized views.xe2x80x9d This approach involves executing a particular query in advance and storing the results in an index or data structure. Materialized views may be useful when an administrator knows in advance that a particular query expression containing specific functional expressions will be used. When a query optimizer subsequently receives that particular query (e.g., a specific query expression containing the exact same set of substitution arguments), the optimizer accesses the pre-computed results to generate a response. However, this materialized view approach is only useful for a pre-defined query, where a specific query expression is known in advance and is not varied. This may be useful for things like a standard monthly or quarterly report. However, it is not useful for data mining or ad-hoc analysis. Data mining or ad-hoc analysis involves following hunches, taking different views of information, and exploring business data differently in order to gain a better understanding about a business. A data miner often wants to look at the same information several different ways to see what happens when certain variables are changed. The pre-planned, canned approach of materialized views is not useful for this type of ad-hoc analysis. In addition, the materialized views approach requires additional resources to pre-compute and store the result of the pre-planned query, and to maintain it as the contents of the tables involved change.
An improved method for handling database queries including functional expressions against columns having enumerated storage is required. This improved method should be useful for ad-hoc queries and not just for pre-defined queries where a specific query expression is known in advance and is not varied. The method should also enable more efficient calculation and projection of results of a database query, thereby accelerating processing of the query and return of results. The present invention satisfies these and other needs.
The following definitions are offered for purposes of illustration, not limitation, in order to assist with understanding the discussion that follows.
Functional Expression: In the context of relational databases, a functional expression is an expression whose domains fall into one of the domains a user can specify when creating a column in a table, and whose result depends only on the arguments it receives. The American National Standards Institute (ANSI) SQL-92 standard defines many such functional expressions that are grouped into two categories: value functions, including CHARACTER_LENGTH, POSITION, TRIM, SUBSTRING, LOWER, UPPER, and CAST, and value expressions, including +, xe2x88x92, *, and ∥ (the string concatenation operator).
Look-up table: A look-up table is an internal data structure within a database that contains information regarding the values in an associated column with enumerated storage.
Predicate: In the context of relational databases a predicate is a truth-valued function, possibly negated, whose domain is a set of attributes and/or scalar values and whose range is {true, false, unknown}.
SQL: SQL stands for Structured Query Language, which has become the standard for relational database access, see e.g., Melton, J. (ed.), American National Standard ANSI/ISO/IEC 9075-2: 1999, Information Systemsxe2x80x94Database Languagexe2x80x94SQL Part2: Foundation, the disclosure of which is hereby incorporated by reference. For additional information regarding SQL in database systems, see e.g., Date, C., An Introduction to Database Systems, Volume I and II, Addison Wesley, 1990, the disclosure of which is hereby incorporated by reference.
The present invention provides an improved method for accelerating execution of database queries including a predicate containing functional expressions against columns having enumerated storage. The method commences with the receipt of a query including a predicate having at least one functional expression referencing at least one database column containing offsets to values in enumerated storage. Upon receipt of a query including this type of predicate, a look-up table is created for storing results of evaluation of the predicate against the values in enumerated storage. Each functional expression of the predicate is evaluated against the values in enumerated storage and the results of evaluation are stored in the look-up table. Theses results of evaluation stored in the look-up table may then be accessed through use of the offsets to values in enumerated storage.
The improved method of the present invention may also be utilized for projecting expressions against database columns having enumerated storage. In this situation, the method commences when a query containing an expression requiring projection of values from at least one database column containing offsets to values in enumerated storage is received. A look-up table is created to contain the results of evaluation of at least one expression of the query against the values in enumerated storage. One or more query expressions are then evaluated against the values in enumerated storage and the results of evaluation are stored in the look-up table. The result of evaluation may then be retrieved from the look-up table using the offsets to values in enumerated storage.