A data base has been defined as a collection of data that can be concurrently shared and used by multiple applications. Data bases have evolved from simple file systems to massive collections of data serving a community of users and numerous distinct applications.
The data disposed within a data base can be organized as a plurality of records. Each record typically includes data values for one or more common categories of information. For example, each of a plurality of records may include information (i.e. data values) for the following categories: a person's name, address, age, gender, telephone number, account numbers and credit limits.
One important data base tool is the data base management system (DBMS). A DBMS is a data processor which aids in the storage, manipulation, reporting, management and control of the data base. Since the 1970s, DBMSs have become widely used and are becoming the main technology for general purpose data base management.
One purpose of a DBMS is to answer decision support queries and support transactions. A query may be defined as a logical expression over the data and the data relationships set forth in the data base, and results in identification of a subset of the data base. For example, a typical query for the above-noted data base might be a request for data values corresponding to all customers having account balances above required limits. A transaction includes several query and altering operations over data and is used to define application events or operations.
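The notion of a query as a logical expression that identifies a subset of the data base can be illustrated with a minimal Python sketch. The `Customer` record, its field names, the sample values and the `run_query` helper are all hypothetical and chosen only for illustration; a real DBMS would evaluate such an expression through its own query language and access paths.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Hypothetical record layout; the field names are assumptions.
    name: str
    balance: float
    credit_limit: float


def run_query(records, predicate):
    """Return the subset of records for which the logical expression holds."""
    return [record for record in records if predicate(record)]


# Illustrative data only.
customers = [
    Customer("Ann", 1200.0, 1000.0),
    Customer("Bob", 300.0, 500.0),
    Customer("Cy", 2500.0, 2000.0),
]

# Query: all customers whose account balance is above their limit.
over_limit = run_query(customers, lambda c: c.balance > c.credit_limit)
```

Without an index, a query of this kind examines every record, which is the cost that the indexes discussed in this section are intended to reduce.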
A DBMS typically utilizes one or more indexes to answer queries. Indexes are organized structures, created by a data base administrator, associated with the data to speed up access to particular data values (i.e. answer values). Indexes are usually stored in the data base and are accessible to a data base administrator as well as end users.
One indexing approach is based on a structure known as the B-tree. A B-tree index is a multi-level, tree-structured index in which all leaf entries (i.e. data values) in the structure are equidistant from the root of the tree. As a result, the B-tree index provides uniform and predictable performance for retrieval operations. A B-tree index includes a root page, zero or more intermediate pages and a set of leaf pages. The leaf level includes an entry for each unique value of the indexed data, providing the indexed value and an indication (typically a row identifier) for each data base record that contains the value. Each level above the leaf level contains an index entry for every page of the level below. Thus, the B-tree structure provides relatively fast, direct access to the leaf pages and hence, the indexed data.
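The page layout described above can be sketched in Python as follows. This is a deliberately simplified, static two-level structure (one root page over sorted leaf pages) rather than a full B-tree: it omits intermediate pages, insertion and page splitting. The names `PAGE_SIZE`, `build_index` and `lookup`, and the sample column, are assumptions made for illustration.

```python
from bisect import bisect_right
from collections import defaultdict

PAGE_SIZE = 2  # hypothetical number of entries per leaf page


def build_index(column):
    """Build a root page and leaf pages for one column of values."""
    rows_by_value = defaultdict(list)
    for row_id, value in enumerate(column):
        rows_by_value[value].append(row_id)
    # One leaf entry per unique value, carrying the row identifiers.
    entries = sorted(rows_by_value.items())
    leaves = [entries[i:i + PAGE_SIZE]
              for i in range(0, len(entries), PAGE_SIZE)]
    # The root holds one entry (the lowest key) for every leaf page.
    root = [page[0][0] for page in leaves]
    return root, leaves


def lookup(root, leaves, value):
    """Return the row identifiers of the records containing value."""
    page_no = bisect_right(root, value) - 1
    if page_no < 0:
        return []
    for key, row_ids in leaves[page_no]:
        if key == value:
            return row_ids
    return []


ages = [34, 51, 34, 27, 62, 51, 45]  # illustrative column of ages
root, leaves = build_index(ages)
rows_with_51 = lookup(root, leaves, 51)
```

Every lookup in this sketch visits exactly one root page and one leaf page, which mirrors the uniform retrieval cost attributed to the B-tree, where all leaf entries are equidistant from the root.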
Another indexing approach, which is a refinement of B-tree indexing, is known as keyword indexing. In this approach, a modified B-tree is formed for the unique values of a group of data values. More specifically, the B-tree has only the unique values at the leaf level, with a bit map associated with each unique value. For example, consider a block of data having fifty thousand records (i.e. distinct rows of data) which indicate, among other things, gender for each of fifty thousand people. In this situation, there are three unique values: male, female and undefined. Thus, three bit maps would be generated, one each for male, female and undefined. Each bit map would have fifty thousand bits, with ONE bits at locations corresponding to those people with that gender and ZERO bits at locations corresponding to those people having another gender.
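The bit maps of the gender example can be sketched in Python using arbitrary-precision integers as bit maps, with bit i standing for record i. The sketch covers only the bit maps themselves, not the modified B-tree over the unique values; the helper names `build_bitmaps` and `matching_rows` and the six-record sample are assumptions for illustration.

```python
def build_bitmaps(column):
    """Return one bit map (a Python int) per unique value in column."""
    bitmaps = {}
    for row_id, value in enumerate(column):
        # Set a ONE bit at the location of each record holding this value.
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


def matching_rows(bitmap):
    """List the record positions whose bit is ONE."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Illustrative column; six records instead of fifty thousand.
genders = ["male", "female", "undefined", "female", "male", "female"]
bitmaps = build_bitmaps(genders)

female_rows = matching_rows(bitmaps["female"])
```

Because each record has exactly one gender value, the three bit maps are mutually exclusive, and together they set one bit per record.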
Existing indexing approaches, such as those described above, are not without problems. For example, the B-tree indexing approach typically requires a substantial period of time for creating the indexes. Once created, existing B-tree indexes occupy a large amount of memory (e.g. 250% of the space allotted for the data). Additionally, B-tree indexes are not always fast enough for decision support queries on large-scale DBMSs.