The present invention relates generally to retrieval of records stored in a database, and in particular to an index that facilitates efficient access to specific records.
A relational database is a digital database whose organization is based on the relational model of data. This model organizes data into one or more tables, or relations, of rows and columns, with a unique key for each row. Generally, each entity type described in a database has its own table, the rows representing instances of that type of entity, or objects, and the columns representing values, or properties, attributed to each instance. Because each row in a table has its own unique key, a row in one table can be linked to a row in another table by storing the unique key of the row to which it should be linked. Data relationships of arbitrary complexity can be represented using this set of concepts. The various software systems used to maintain relational databases are known as relational database management systems (RDBMSs). Virtually all relational database systems use SQL (Structured Query Language) as the language for querying and maintaining the database.
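As an illustration of linking rows through stored keys, the following minimal sketch uses Python's built-in sqlite3 module. The table names, column names, and sample rows are hypothetical and chosen only for this example; they do not come from any particular system.

```python
import sqlite3

# In-memory database; all table and column names are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE departments (
    id   INTEGER PRIMARY KEY,   -- unique key for each department row
    name TEXT NOT NULL
);
CREATE TABLE employees (
    id            INTEGER PRIMARY KEY,  -- unique key for each employee row
    name          TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES departments (id)  -- stored link
);
INSERT INTO departments (id, name) VALUES (1, 'Research'), (2, 'Sales');
INSERT INTO employees (id, name, department_id) VALUES
    (10, 'Ada', 1), (11, 'Grace', 1), (12, 'Linus', 2);
""")

# Follow the stored key from each employee row to its department row.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees AS e
    JOIN departments AS d ON d.id = e.department_id
    ORDER BY e.id
""").fetchall()

for employee, department in rows:
    print(employee, "works in", department)
```

Each employee row carries only the key of its department, and the join resolves that key to the linked row at query time.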
A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data in a database table without having to perform a full table scan, which searches every row in the table. Indexes associated with one or more columns of a database table can provide a basis for both rapid random lookups and efficient access of ordered rows.
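The effect of an index on a lookup can be observed with a small sketch, again assuming Python's built-in sqlite3 module. The table, the index name, and the generated data are hypothetical, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

# Hypothetical table populated with generated rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", f"city{i % 50}") for i in range(1000)],
)


def query_plan(sql):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


lookup = "SELECT id FROM customers WHERE email = 'user500@example.com'"

plan_before = query_plan(lookup)  # no index yet: every row is examined
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
plan_after = query_plan(lookup)   # the planner can now search the index

print("before:", plan_before)
print("after: ", plan_after)
```

Before the index exists the plan reports a scan of the table; afterwards it reports a search that uses idx_customers_email. The cost is that every later insert or update to the email column must also update the index.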
A Bloom filter is a memory-efficient, probabilistic data structure that supports approximate membership queries on a set. When testing whether an object is a member of a set represented by a Bloom filter, a query returns either “definitely not in set” or “may be in set,” with a small probability of false positives. A Bloom filter is typically implemented as a bit vector, or array, into which the values representing set elements are hashed: each element is mapped by one or more hash functions to bit positions in the vector, and those bits are set. In general, a Bloom filter may be considered when space is at a premium and the effect of false positives can be mitigated. Due to their efficiency, compact representation, and flexibility in allowing a trade-off between memory requirements and false positive probability, Bloom filters are popular for representing diverse sets of data. For example, they are used in databases, distributed systems, web caching, and other network applications, where systems need to share information about what resources they have. A typical example is using a Bloom filter to reduce expensive disk or network lookups for non-existent objects. If the Bloom filter indicates that the object is not present, then the expensive lookup may be avoided; otherwise, the lookup is performed, but it will still fail whenever the filter's answer was a false positive.
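A minimal sketch of such a filter is given below in Python. The class name, the default sizes, and the use of seeded SHA-256 digests as the hash functions are assumptions made for illustration; practical implementations usually choose the number of bits and hash functions from the expected set size and the target false positive rate.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter; parameters and hashing scheme are assumptions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit vector

    def _positions(self, item):
        # Derive num_hashes bit positions by hashing the item with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit position that the item hashes to.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely not in set"; True means "may be in set".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
for key in ("alice", "bob"):
    bf.add(key)

print(bf.might_contain("alice"))  # True: an added item is never reported absent
print(bf.might_contain("carol"))  # usually False; True would be a false positive
```

A caller guarding an expensive lookup would consult might_contain first and skip the lookup only when it returns False.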
A Bloom filter index organizes a collection of Bloom filters. Searching a Bloom filter index for a target Bloom filter typically involves comparing indexed Bloom filters with the target Bloom filter to find matches. A standard, but inefficient, technique for locating a target Bloom filter in a collection of Bloom filters is to linearly search a list of all the Bloom filters in the collection for ones that match the target, which requires a number of comparisons proportional to the size of the collection.
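The linear search can be sketched as follows in Python, with each Bloom filter represented as an integer bit vector. The text does not define what constitutes a match; this sketch assumes that an indexed filter matches when every bit set in the target is also set in that filter. The function names and sample bit patterns are hypothetical.

```python
def matches(candidate_bits, target_bits):
    # Assumed match rule: the candidate has every bit that the target has.
    # Testing for exact equality instead would be a one-line change.
    return (candidate_bits & target_bits) == target_bits


def linear_search(filters, target_bits):
    """Compare the target against every indexed filter, one by one."""
    return [name for name, bits in filters if matches(bits, target_bits)]


# Hypothetical collection of (label, bit vector) pairs.
indexed_filters = [
    ("A", 0b10110),
    ("B", 0b00110),
    ("C", 0b11000),
]

found = linear_search(indexed_filters, 0b00110)
print(found)  # ['A', 'B']
```

Every query touches every filter in the collection, so the work grows linearly with the number of indexed filters regardless of how few of them match.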