Databases are used to store information for an innumerable number of applications, including various commercial, industrial, technical, scientific and educational applications. As the reliance on information increases, both the volume of information stored in most databases, as well as the number of users wishing to access that information, likewise increases. Moreover, as the volume of information in a database, and the number of users wishing to access the database, increases, the amount of computing resources required to manage such a database increases as well.
Database management systems (DBMS's), which are the computer programs that are used to access the information stored in databases, therefore often require tremendous resources to handle the heavy workloads placed on such systems. As such, significant resources have been devoted to increasing the performance of database management systems with respect to processing searches, or queries, to databases.
Improvements to both computer hardware and software have improved the capacities of conventional database management systems. For example, in the hardware realm, increases in microprocessor performance, coupled with improved memory management systems, have improved the number of queries that a particular microprocessor can perform in a given unit of time. Furthermore, the use of multiple microprocessors and/or multiple networked computers has further increased the capacities of many database management systems.
From a software standpoint, the use of relational databases, which organize information into formally-defined tables consisting of rows and columns, and which are typically accessed using a standardized language such as Structured Query Language (SQL), has substantially improved processing efficiency, as well as substantially simplified the creation, organization, and extension of information within a database. Furthermore, significant development efforts have been directed toward query “optimization”, whereby the execution of particular searches, or queries, is optimized in an automated manner to minimize the amount of resources required to execute each query.
Through the incorporation of various hardware and software improvements, many high performance database management systems are able to handle hundreds or even thousands of queries each second, even on databases containing millions or billions of records. However, further increases in information volume and workload are inevitable, so continued advancements in database management systems are still required.
One area that has been a fertile area for academic and corporate research is that of improving the designs of the “query optimizers” utilized in many conventional database management systems. The primary task of a query optimizer is to choose the most efficient way to execute each database query, or request, passed to the database management system by a user. The output of an optimization process is typically referred to as an “execution plan,” “access plan,” or just “plan” and is frequently depicted as a tree graph. Such a plan typically incorporates (often in a proprietary form unique to each optimizer/DBMS) low-level information telling the database engine that ultimately handles a query precisely what steps to take (and in what order) to execute the query. Also typically associated with each generated plan is an optimizer's estimate of how long it will take to run the query using that plan.
An optimizer's job is often necessary and difficult because of the enormous number (i.e., “countably infinite” number) of possible query forms that can be generated in a database management system, e.g., due to factors such as the use of SQL queries with any number of relational tables made up of countless data columns of various types, the theoretically infinite number of methods of accessing the actual data records from each table referenced (e.g., using an index, a hash table, etc.), the possible combinations of those methods of access among all the tables referenced, etc. An optimizer is often permitted to rewrite a query (or portion of it) into any equivalent form, and since for any given query there are typically many equivalent forms, an optimizer has a countably infinite universe of extremely diverse possible solutions (plans) to consider. On the other hand, an optimizer is often required to use minimal system resources given the desirability for high throughput. As such, an optimizer often has only a limited amount of time to pare the search space of possible execution plans down to an optimal plan for a particular query.
Even with optimized plans and other improvements, query performance during execution can be adversely impacted by a number of factors. One of the greatest contributors to poor performance is input/output (I/O) operations to retrieve records from mass storage. Retrieving records from memory for further analysis as part of the query is a time consuming process and retrieving them from storage devices such as disk drives and other non-volatile memory has an even greater impact on performance.
One recent development in increasing performance of executing database queries has been the introduction of a bitmap. A bitmap consists of a vector of 1-bit elements in which each element corresponds to a row of a table. Bitmaps are particularly useful in searching table columns over which an index has been built for a particular selection criteria. In operation, the database engine, while executing a query, typically scans the index and then builds the bitmap by setting each bitmap element to either a “1” or a “0” depending on whether the value in the corresponding row of that column satisfies the selection criteria. A “1” typically indicates that the selection criteria is satisfied while a “0” indicates otherwise. Alternatively, the meaning associated with these values could be reversed without complicating the use of a bitmap. The bitmap operates the same, regardless of the specific values used, if one value is considered to be “active” and another state is considered to be “inactive”.
Thus, by using a bitmap, the database engine can not only forego testing records having “0” entries to see if they satisfy selection criteria, but can entirely avoid retrieving these records at all. As a result, I/O activity, one of the leading factors in slowing query performance, can be significantly reduced.
One drawback to the use of bitmaps, however, is that during execution of the query, records can not be retrieved and the result set returned until the bitmap is completely built. Thus, the use of bitmaps can introduce a start-up delay before a query begins returning records to a user.
For example, it has been found that for a table of around one million records, the building of a bitmap from an index can introduce a delay of around 10 seconds or more. Because of this delay period, once a user submits a query, no feedback is initially provided to the user that the query is returning results or even running properly. This situation causes a perception that the query is slow or causes confusion about whether the database engine is even running. Accordingly, there is an unmet need in the prior art for utilization of bitmaps during query execution in a way which avoids the start-up delay of building the bitmap, thereby providing search results to a user a soon as possible.