Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. Many databases are relational databases, which organize information into formally-defined tables consisting of rows (i.e., records) and columns (i.e., fields), and which are typically accessed using a standardized language such as Structured Query Language (SQL). Database management systems (DBMS's) are the computer programs that are used to access the information stored in the databases, and to process searches, or queries, against the databases.
In general, a database query references at least one database table in a database and includes at least one predicate. The predicates may include an expression that references a field of the database table, multiple key values in that field, and an operator (e.g., in, etc.). The query may also include a clause that indicates that a subset of an answer set (e.g., a result table) should be returned instead of the entire answer set. To execute the query, many DBMS's perform query optimization, in which multiple execution plans or access plans for satisfying the database query are examined by a query optimizer to determine the most efficient way to execute the query.
One type of optimization that may be utilized in an execution plan includes the building of a bit map. A bit map is usually built during running or execution of the query, and indicates whether a specific value exists for each row in a particular column. One bit represents each row. For example, in a bit map for the value “Byron” in the column “City” of a table, the nth bit of the bit map equals 1 if the nth row in the table contains “Byron” (e.g., Query: Select Name from Table where City=‘Byron’) or 0 if that row holds a value other than “Byron”. Because the bit map represents the physical sequence of the rows in the table, the bit map is typically utilized to sequentially scan the table and to skip all the rows in the table that do not contain the desired value, also referred to as skip sequential processing. As such, bit maps typically improve query performance.
A bit map is usually built from an index, which itself is built over the table. An index is usually built over one or more fields of the database table, such as field “City”, and in many cases include sufficient information about which particular records in a database table likely match key values in a predicate without having to retrieve and scan all of the individual records of the database table. Without an index, a DBMS assumes an even distribution in the table of the desired key values and performs a full table scan of the database table, blindly searching through every row in the database table until the target data is located. Depending upon where the data resides in the database table, such a table scan can be a lengthy and inefficient process.
One type of index that may be utilized to build a bit map is an encoded vector index (“EVI”). An EVI is a data structure that is made up of two primary components: a symbol table and a vector table. The symbol table contains the distinct key values in the rows of a table covered, as well as statistical information about each key. The statistical information typically includes a numeric byte code identifying the key, the first and last rows of the table where the key is found (i.e., the relative record number (RRN)), and the number of times the key appears in the table (i.e., count). The vector table contains a list of byte codes indicating which key is contained in each row, and as the byte codes are generally in the same ordinal position as the rows in the table, the vector table corresponds to the actual rows in the table.
The byte codes of the vector table of the EVI are often utilized to dynamically build a bit map for a given key value. In particular, a single bit map is often built over the entire vector table to account for all the values occurring in the column (whether they be “Byron”, “Rochester”, or any other). So in an EVI on the “City” column, the nth position of a bit map created using that EVI contains a bit that identifies whether the value of “City” in the nth row of the table is or is not the given key value.
Although bit maps typically improve query performance, many resources are required to build bit maps and to use the bit maps. As such, bit maps are generally not built when the database query includes a clause that indicates that a subset of an answer set that satisfies the database query is to be returned, such as a fetch first clause or an optimize clause. Bit maps are not built because when a subset is to be returned, the resources required to populate a full bit map from an index typically exceed the resources that would be used to randomly access the records of the table directly through the index.
In general, fetch first clauses, optimize clauses, and similar clauses (e.g., FIRSTIO) are commonly included in a database query when the rows to be returned will be displayed on a screen or window. As users typically want to get the first screen back quickly, at least a couple of rows of the answer set may be quickly displayed to a user by including such a clause in the query. Moreover, the user may be provided with results much sooner than if the user had to wait for every row to be returned. Without these clauses, the database engine will typically retrieve all the rows of the answer set.
Specifically, a fetch first clause (e.g., FETCH FIRST ROW ONLY, FETCH integer ROWS, etc.) usually sets a maximum integer of rows that will be retrieved, regardless of the quantity of rows in the answer set that satisfy the database query. The integer is usually a positive non-zero number. The database engine will cease processing the query once it has retrieved the first integer of rows, and an attempt to fetch beyond the maximum integer of rows is handled as if the data ended.
The optimize clause (e.g., OPTIMIZE FOR integer ROW, Optimize for N Rows, etc.) indicates that the query does not intend to retrieve more than integer of rows from the answer set. As such, the optimizer will optimize the query based on the specified integer of rows. The clause does not change the answer set or the order in which the rows are fetched. Any number of rows can be fetched, but performance can possibly degrade after the specified number of fetches.
As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information in a database, and the number of users wishing to access the database increases, the amount of computing resources required to manage such a database increases as well. Thus, new ways to build bit maps, which often require many system resources to build, are needed in order to continue to provide significant improvements in query performance; otherwise, database users will be hampered in their ability to maximize intelligent information retrieval.