This invention relates to the field of computerized information search and retrieval systems and, more particularly, to a method and apparatus for comparing database search results.
Information is increasingly being represented as digital bits of data and stored within electronic databases. These databases often include extremely large numbers of records containing data fields reflecting a wide variety of objects. Some databases, for example, contain the full text of judicial opinions issued by every court in the United States for the past one hundred and fifty years. Other databases may be filled with data fields containing particularized information about vast numbers of individuals (e.g., names, addresses, telephone numbers, etc.). As more information is stored in these databases, these data compilations grow ever larger.
Among the many advantages associated with electronic storage is the fact that any given database can be searched for the purpose of retrieving individual data records (e.g., documents) that may be of particular interest to the user. One of the ways to perform this search is to simply determine which data records, if any, contain a certain keyword. This determination is accomplished by comparing the keyword with each record in the database and assessing whether the keyword is present or absent. In addition, database users can search for data records that contain a variety of keyword combinations (e.g., “cats” and “dogs”, etc.). This operation, known as a Boolean search, uses the conjunctions “AND”, “OR”, and “NOT” (among others) to join keywords in an effort to more precisely define and/or simplify the database search. For example, if a user joins the keywords “cats” and “dogs” with the conjunction “AND” and inputs the query “cats AND dogs”, only those records that contain both the term “cats” and the term “dogs” will be retrieved.
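By way of illustration only, the record-by-record keyword comparison and Boolean search described above might be sketched as follows. The function names (`tokens`, `search_and`, `search_or`, `search_not`) and the sample records are hypothetical and are introduced here solely for the example; they are not part of any particular system.

```python
import re

def tokens(text):
    """Return the set of lowercase words found in one record."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def search_and(records, *keywords):
    """Indices of records that contain every keyword."""
    wanted = {k.lower() for k in keywords}
    return [i for i, text in enumerate(records) if wanted <= tokens(text)]

def search_or(records, *keywords):
    """Indices of records that contain at least one keyword."""
    wanted = {k.lower() for k in keywords}
    return [i for i, text in enumerate(records) if wanted & tokens(text)]

def search_not(records, keyword):
    """Indices of records that do not contain the keyword."""
    return [i for i, text in enumerate(records)
            if keyword.lower() not in tokens(text)]

# Hypothetical sample records, used only for this illustration.
records = ["cats and dogs play", "cats sleep", "dogs bark", "birds sing"]

cats_and_dogs = search_and(records, "cats", "dogs")
cats_or_dogs = search_or(records, "cats", "dogs")
not_cats = search_not(records, "cats")
```

In this sketch every query re-reads every record and builds lists of matching records, which is the kind of memory and computing cost that a more compact representation of search results is intended to avoid.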
The problem with this Boolean search, however, is that a computer typically uses substantial memory space and computing time to perform logical combinations of the sets of documents corresponding to the keyword search results. It is therefore desirable to create a system that performs logical combinations on set elements in a manner that is efficient in both space and computation time.
It is an object of the present invention to analyze data records in a database.
It is a further object of the present invention to analyze data records in a database by efficiently representing the results of element tests against the database.
It is another object of the present invention to analyze data records in a database by efficiently combining the results of element tests against the database.
It is still a further object of the present invention to analyze data records in a database by efficiently representing the results of keyword tests against the database.
It is still a further object of the present invention to analyze data records in a database by efficiently combining the results of keyword tests against the database.
It is still a further object of the present invention to analyze data records in a database by efficiently representing the results of field type tests against the database.
It is still a further object of the present invention to analyze data records in a database by efficiently combining the results of field type tests against the database.
The present invention provides a method and apparatus for analyzing a database. This analysis is achieved by representing the subdocument lists of an inverted database with encoded bit strings. The encoded bit strings are a space-efficient way of storing the correspondence between terms in the database and their occurrence in subdocuments. Logical combinations of these bit strings are then obtained by identifying the intersection, union, and/or inversion of a plurality of the bit strings. Since keywords for a database search can be identified by selecting the terms of the inverted database, the logical combinations of bit strings represent search results over the database. This technique for generating a search result is computationally efficient because computers combine bit strings very efficiently. The search elements of the present invention are not limited to keywords. The search elements could also involve types of fields (e.g., date or integer fields) or other extracted entities. These and other aspects and advantages of the present invention will become better understood with reference to the following description, drawings, and appended claims.
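As a minimal sketch of the general idea summarized above, and not of the encoding actually employed by the invention, the subdocument list for each term can be held as a bit string in which bit i is set when subdocument i contains the term. The sketch assumes plain, uncompressed Python integers as the bit strings and whitespace-separated terms; the names `build_bit_index` and `matching_positions` and the sample subdocuments are hypothetical.

```python
from collections import defaultdict

def build_bit_index(subdocuments):
    """Map each term to an integer whose bit i is 1 if subdocument i has the term."""
    index = defaultdict(int)
    for position, text in enumerate(subdocuments):
        for term in set(text.lower().split()):
            index[term] |= 1 << position
    return dict(index)

def matching_positions(bits):
    """Expand a bit string back into the list of subdocument positions it marks."""
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical sample subdocuments, used only for this illustration.
subdocuments = ["cats and dogs play", "cats sleep", "dogs bark", "birds sing"]
index = build_bit_index(subdocuments)

# Mask with one bit per subdocument, needed so inversion stays within the database.
all_bits = (1 << len(subdocuments)) - 1

cats = index.get("cats", 0)
dogs = index.get("dogs", 0)

both = cats & dogs            # intersection: "cats AND dogs"
either = cats | dogs          # union: "cats OR dogs"
not_cats = ~cats & all_bits   # inversion: "NOT cats"
```

Here each logical combination is a single bitwise operation on two integers, regardless of how many subdocuments match, and `matching_positions` is called only when the final result must be turned back into a list of subdocuments.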