1. Field of the Invention
The present invention relates generally to database management systems and more particularly to the use of bitmap values in a database management system to represent subsets of a set of objects.
2. Description of Related Art
The following Description of related art will begin with a description of the built-in indexing systems provided by modern database management systems and will then describe how modern database management systems further permit users to define their own kinds of indexing systems for use by the database management system. It finally describes limitations of these techniques that are solved by the persistent, user-accessible bitmap values described in U.S. Ser. No. 10/810,756.
Indexes Generally
Any large collection of information is made much more useful by the inclusion of an index. For example, if a history of the American Revolutionary War has an index and a reader of the history is interested in General Henry Knox's role in forcing the British to evacuate Boston on Mar. 17, 1776, the reader need only look up “Knox, Henry” in the index, where the reader will find a list of the pages in the history book on which General Knox is mentioned. Without the index, the reader would have to scan large portions of the book to find what he or she was looking for. In the terms used in the following discussion, the history of the American Revolution defines a set of information; often a reader of the history is interested only in a portion of that set of information; the portion is termed a subset of the information. Thus, what the history's index does is specify the locations of subsets of the information and thereby speed up the user's access to the indexed subsets.
Built-In Indexing Systems
Database management systems exist to manage and provide access to large collections of information, and as one would expect, database management systems have indexes. In relational database management systems, the collections of information are organized as tables. A database table has a number of columns and may have a number of rows. Each row has a field corresponding to each column. The value of the field corresponding to a column has the kind of value required for the column. For example, the following simple table Employees has four columns and two rows:
RowidNameGenderJob_Title1SmithFManager2JonesMLaborer
Each row represents an employee. The column Rowid is a built-in column whose fields contain a database-system assigned row identifier for each row in the table. The row identifier uniquely identifies the row in the database management system. The fields corresponding to the column Name contains the name of the employee represented by the row. The fields corresponding to Gender contain the employee's biological gender; the fields corresponding to Job_Title, finally, contain the employee's job title. To obtain information from the table, the user provides a query to the database management system which describes the rows from which information is to be fetched and the information to be fetched from those rows. For example, the query                SELECT Name FROM Employees WHERE Job_Title=Managerselects the row from Employees whose job title field has the value Manager, that is, the row whose rowid is 1, and returns the value of the Name field from that row, that is, Smith. In terms of the introductory discussion, the query specifies a subset of the set of information contained in Employees and returns the specified subset.        
While making an index for Employees is hardly worthwhile, one can be made. Indeed, if Employees were a 10,000 row table, making an index would definitely be worth while. For example, an index by the values of the Name column on Employees would look like this: Jones, 2; Smith, 1. The index has the names in alphabetical order and each name is followed by the rowid for the row the name occurs in. Modern database management systems such as the Oracle 9i® database management system produced by Oracle Corporation, Redwood City, Calif. contain a built-in indexing facility which permits users to specify indexes like the name index in the above example. A specification for such an index in the database management system looks like this:                CREATE INDEX employee_name_index ON Employees(Name)In response to the specification, the database management system creates the index, updates the index when the table to which it belongs changes, and uses the index to speed up queries on the table. For example, given the above index and the query        SELECT Job Title FROM Employees WHERE Name=Smiththe database management system would use the index to determine that the row for Smith was row 1, instead of reading down the table until it found the row whose Name field had the value Smith. The use of the index in the query is thus exactly analogous to its use by the human reader of the history book.        
One of the kinds of indexes produced by modern database management systems is a bitmap index. A bitmap is a sequence of bits which has been mapped to a set of objects. Each bit corresponds to one of the objects in the set. A bitmap value is a bitmap in which bits have been set to specify a subset of the set of objects. When an object belongs to the subset, the bit corresponding to the object in the bitmap value is set. In the bitmap indexes used in modern database management systems, the bitmap has been mapped to the set of rowids representing the rows in a table. For example, there are two rows in the table Employees, so the set of rowids has two members and the bitmap has two bits. In our example, the first bit of the bitmap is mapped to rowid 1 and the second to rowid 2. Because each bit in the bitmap is mapped to a rowid in the table, bitmap values can be used as indexes into the table. For instance, a bitmap value that represents a set of rowids in the table Employees can be used to indicate all rows in the table that have the value M in the Gender field. In such a bitmap value, the bit representing a given rowid of the table has the value 1 when M is present in the row's Gender field and otherwise 0. The example bitmap value for the value M in the Gender field in Employees is 0,1. The value M is termed the key of the bitmap value. To locate the row with the value M, the database management system consults the bitmap value for that key and determines from the fact that the 1 is the second bit in the bitmap value that the row with the value is the row having rowid 2.
The Oracle 9i database management system permits the user to specify that a bitmap index be created for a column of a table. The database management system responds to such a specification by making a bitmap index that includes a bitmap value for each possible value of the fields of the column. For example, fields in the Gender column may have only two values: M and F. Thus, for this column, the database management system would build two bitmap values, one for the key M and the other for the key F. A specification that would cause the Oracle 9i database management system to build such and index is the following:                CREATE BITMAP INDEX Gender_index ON Employees(Gender)The bitmap value for the M key is 0,1, as set forth above; the bitmap value for the F key is 1,0. The bitmap value for the M key thus specifies the subset of the rows of the table Employees that contains the row having rowid 2, while the bitmap value for the F key specifies the subset that contains the row having rowid 1.        
The advantages of the bitmap index are that it takes up very little space and that logical operations such as AND, OR, and NOT in queries can be performed very rapidly on fields with bitmap indexes by performing the logical operations on the bitmap values. For example, the query                SELECT Name FROM Employees WHERE Gender=‘M’ OR Gender=‘F’would OR the bitmap value for Gender whose key is M, namely 1,0, with the bitmap value for Gender whose key is F, namely 0,1, to produce the bitmap value 1,1, which specifies that every row in Employees is to be selected. The built-in indexing systems provided by the Oracle 9i database management system are described in detail beginning at the section Indexes on page 10-28 of Oracle 9i Database Concepts, Release 2, published by Oracle Corporation in 2002 and available from Oracle Corporation as part number A96524-01. The description is incorporated into the present patent application by reference.User-Defined Indexing Systems        
In the database management systems for which the built-in indexing systems were designed, the values contained in the fields of the database management system's tables had to belong to one of a small number of built-in data types. The built-in data types typically included character data types for names and words, decimal data types for decimal numbers, integer data types for whole numbers, and data types of system values used in the database management system's metadata, that is, the data which defines the tables. In the example table Employees, the rowids are such system data. The data in the other fields has character data types. More recently, database management systems have included arrangements which permit the user to define his or her own data types and use values having those data types in fields in the database management system's tables. The user-defined data types are employed in a domain in which the user is interested. For example, a user interested in photographs might define data types suitable for that domain. An example would be a data type Photograph. At a minimum, the definition of Photograph specifies how values of type Photograph are represented in the database management system. The definition can also specify operations that may be performed on values of the type. For instance, if the domain requires that photographs be compared, the definition may specify a Like operation that compares two photographs and returns a result that is a measure of similarity. Finally, the definition can specify an indexing system for values of type Photograph. To do this, the user must specify how the index is defined, how it is maintained, and how it is read. The mechanisms used for defining user-defined types and operations on them in the Oracle 9i database management system are described in Oracle 9i Data Cartridge Developer's Guide, Release 2 (9.2), Part No. A96595-01, available from Oracle Corporation. The discussion of user-defined indexing is contained in Chapter 7, Building Domain Indexes. The entire reference is hereby incorporated into the present application by reference.
Limitations of Built-In and User-Defined Indexing Systems
The built-in indexing systems are easy to use and efficient, but the number of kinds of indexes is limited. For example, the Oracle 9i database management system currently provides the following built-in indexing schemes:                B-tree indexes        B-tree cluster indexes        Hash cluster indexes        Reverse key indexes        Bitmap indexes        Bitmap Join Indexes        
Further, there are certain kinds of built-in data types that cannot be indexed using these indexing schemes and the built-in indexing schemes have only limited use with user-defined types. A final disadvantage of the built-in indexing schemes is that the user has very little control over the manner in which the database management system constructs the indexes and no access to the internal indexing data.
For example, with bitmap indexes, the system always constructs a bitmap for every possible value of the field whose values are being used as keys. The system further requires that the values used as keys are mutually exclusive and that values in the column for which the index is being made belong to system-defined types. There are, however, many situations in which the user is interested in indexes only for certain key values or indexes for overlapping ranges of values or in indexes for user-defined types and the built-in bitmap indexes are simply not useful in those situations. Moreover, the primitive operations that the database management system employs to make and manipulate bitmap values are never accessible to the user. For example, the user cannot make a bitmap value that represents the rowids returned by a user-defined query.
A user can of course make a user-defined indexing system that uses any indexing scheme that is useful for the domain of the data being indexed. The drawback of such a user-defined indexing scheme is that the user must make the considerable effort required to design and program the indexing scheme. One of the things that increases the effort required to design and program such index schemes is the unavailability of indexing primitives to the programmer.
As noted above, bitmap values are used in indexing schemes because they are compact representations of sets of objects. Use of bitmap values in other situations which require compact representations of sets of objects is not possible because programmers using the database management system have access neither to the bitmap values themselves nor to primitive operations for them.
The MySQL open source relational database system has a SET built-in data type which is used to represent sets of up to 64 user-defined objects. A value of the SET type represents a subset of a particular user-defined set of objects, and the subset is represented by a bitmap value which is mapped onto the particular user-defined set of objects and has a bit set for each object that belongs to the subset. MySQL provides primitive operations for values of the SET type, but the limitation of the number of objects in the set to 64 renders MySQL set types useless for applications which require bitmap values that are capable of representing subsets of large sets of objects. One large class of such applications is of course bitmap indexes.
U.S. Ser. No. 10/810,756 describes techniques which give programmers access to bitmap values specifying subsets of large sets of values and to primitive operations for the bitmap values. These techniques greatly increase the number of situations in which bitmap indexes can be employed, permit their use with objects having user-defined classes, and permit the use of bitmap values generally to represent subsets of large sets of objects. The expanded use of bitmap values which has resulted from the use of the techniques of U.S. Ser. No. 10/810,756 has however led to a need to efficiently locate table rows whose bitmap values indicate the presence or absence of an object in the set of objects represented by the bitmap values. It is an object of the invention disclosed herein to provide an indexing technique which permits efficient location of such table rows.