1. Field of the Invention
This invention relates generally to the field of information retrieval systems and methods, and more particularly, to methods for combining multiple queries through an abstract programming interface.
2. Background of the Invention
Information retrieval systems typically include a database of records, a processor for executing searches on the records, and specifically adapted application software, such as a database management system, for accepting search queries, managing the processor, and handling the search results. In general, the database can include information such as text documents, financial records, medical files, personnel records, technical documentation, graphical data, or various combinations of such items. In order to effectively search and retrieve desired items, the search application typically supports a limited number of query models, or search operations, specifically designed to operate on the underlying data types in the database. For example, a typical document database, such as a database of news publications, may be organized with each news article as a record, with fields for publication date, author, title, industry category, and body text. A simple search application may then support full text searching for all text fields, individual field searching, such as searching by the date or author fields, and various boolean search operations, such as conjunction, disjunction, and negation. A more sophisticated search application may also support proximity based searching, allowing a user to locate word tuples having specified proximities. This would allow a user, for example, to locate in such a news database all articles having the word "Clinton" within 25 words of "foreign policy." Thus, proximity searching is a specific form of boolean conjunction.
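The relationship between proximity searching and boolean conjunction described above can be illustrated with a minimal sketch. The function names (`tokenize`, `phrase_positions`, `contains_both`, `within_proximity`) are hypothetical and are not drawn from any particular information retrieval system. The sketch further assumes a simple lowercase word tokenizer and measures distance between the starting word positions of the two terms; an actual system may define proximity differently.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase alphanumeric words (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_positions(tokens: List[str], phrase: List[str]) -> List[int]:
    """Return every index at which the word sequence `phrase` starts in `tokens`."""
    n = len(phrase)
    if n == 0:
        return []
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def contains_both(text: str, term_a: str, term_b: str) -> bool:
    """Plain boolean conjunction: both terms occur somewhere in the text."""
    tokens = tokenize(text)
    return bool(phrase_positions(tokens, tokenize(term_a))) and bool(
        phrase_positions(tokens, tokenize(term_b))
    )


def within_proximity(text: str, term_a: str, term_b: str, max_distance: int) -> bool:
    """Conjunction with an added constraint: some occurrence of term_a starts
    within `max_distance` words of some occurrence of term_b."""
    tokens = tokenize(text)
    positions_a = phrase_positions(tokens, tokenize(term_a))
    positions_b = phrase_positions(tokens, tokenize(term_b))
    return any(abs(i - j) <= max_distance for i in positions_a for j in positions_b)


article = "Clinton outlined a new foreign policy today"
print(contains_both(article, "Clinton", "foreign policy"))
print(within_proximity(article, "Clinton", "foreign policy", 25))
```

Under these assumptions, every record that satisfies `within_proximity` also satisfies `contains_both`, but not the reverse, which is the sense in which proximity searching narrows a conjunction.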
One limitation of existing information retrieval systems is the difficulty in combining different query models in a single search. For example, a simple system may allow a user to perform either full text searching or field based searching, but not both in a single query. A user interested in retrieving documents having certain keywords would be unable to simultaneously constrain the documents to those of a given date, author, or the like. More robust systems may offer only a limited mixing of field based searching and full text searching, but do not support full integration of non-field based queries. This limitation stems primarily from the query architecture used by the software vendor, with the various different query models having incompatible operations or algorithms.
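By way of illustration only, the kind of combined query that such systems are unable to express can be sketched as follows. The `Article` record and the helpers `full_text`, `field_equals`, `all_of`, and `search` are hypothetical names introduced for this example; they assume that every query model can be reduced to a per-record predicate, which is a simplification and not a description of any vendor's architecture.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Article:
    """A hypothetical news record with the fields named in the background."""
    author: str
    published: date
    title: str
    body: str


# Each query model is represented as a predicate over a single record.
Predicate = Callable[[Article], bool]


def full_text(keyword: str) -> Predicate:
    """Full text model: case-insensitive substring match over title and body."""
    needle = keyword.lower()
    return lambda a: needle in a.title.lower() or needle in a.body.lower()


def field_equals(field_name: str, value: object) -> Predicate:
    """Field model: exact match on one named field."""
    return lambda a: getattr(a, field_name) == value


def all_of(*predicates: Predicate) -> Predicate:
    """Boolean conjunction across predicates drawn from different query models."""
    return lambda a: all(p(a) for p in predicates)


def search(records: Iterable[Article], predicate: Predicate) -> List[Article]:
    """Return the records that satisfy the combined predicate."""
    return [a for a in records if predicate(a)]


articles = [
    Article("Smith", date(1996, 1, 5), "Trade talks", "Foreign policy dominated the talks."),
    Article("Jones", date(1996, 1, 5), "Budget", "Foreign policy was not discussed."),
    Article("Smith", date(1996, 2, 1), "Markets", "Stocks rose."),
]

# Keyword and author constraints applied together in a single query.
hits = search(articles, all_of(full_text("foreign policy"), field_equals("author", "Smith")))
print([a.title for a in hits])
```

Because both query models share one calling convention in this sketch, they compose freely; the incompatibility described above arises when the models do not share such a common form.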
The difficulty in combining multiple query models generally results from the non-extensible query architecture that underlies the information retrieval system. In conventional information retrieval systems, the various query models that are supported, such as text based searching, boolean searching, and the like, are normally implemented with implementation specific code, designed for the specific data types and operations available in the information retrieval system. The software vendor does not provide any capability for an extensible architecture for the search operations or data types available in the system. This is generally because of the implementation specific storage and performance constraints that the vendor has designed into the information retrieval system. Thus, an applications programmer utilizing the information retrieval system will typically be constrained to using search operations that the vendor has provided, and will be unable to add new search operations for new query models or data types. Continuing the prior example of the database of news publications where the software vendor has provided only full text, field, and boolean query operations, an applications programmer would typically be unable to add search operations that retrieved documents based on statistical information, such as the number of references to a given word or set of words, or the frequency of specific references in arbitrary subsets of the database.
Another limitation of existing information retrieval systems is that the search algorithms are designed to execute over a substantial portion of the database, returning their results in memory intensive arrays or similar structures. Typically, the software vendor defines the computational design of each search operation, preventing the applications developer from designing more efficient algorithms for implementing a given search operation. This results in limitations on the performance an information retrieval system can deliver.
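The cost of returning results in arrays can be made concrete with a short sketch contrasting an eager search, which scans every record and materializes all matches, with a lazy search that produces matches only as they are requested. The functions `search_eager` and `search_lazy` are hypothetical and assume records held as plain strings; they are intended only to illustrate the trade-off and do not represent any particular system's design.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def search_eager(records: Iterable[str], keyword: str) -> List[str]:
    """Scan every record and return all matches in one in-memory list."""
    needle = keyword.lower()
    return [r for r in records if needle in r.lower()]


def search_lazy(records: Iterable[str], keyword: str) -> Iterator[str]:
    """Yield matches one at a time; records are read only as results are consumed."""
    needle = keyword.lower()
    for r in records:
        if needle in r.lower():
            yield r


records = ["Foreign policy update", "Sports", "Policy review", "Weather"]

# The eager form builds the complete result list before returning.
all_hits = search_eager(records, "policy")

# The lazy form stops reading records once the caller has what it needs.
first_hit = list(islice(search_lazy(records, "policy"), 1))
print(all_hits, first_hit)
```

When the caller needs only the first few matches, the lazy form avoids both the full scan and the intermediate array, at the cost of not knowing the total number of matches in advance.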
Accordingly, it is desirable to provide a query architecture for an information retrieval system that is extensible and allows for the efficient integration of new query models designed for an open variety of data types and formats. This would allow the information retrieval system to support any arbitrary combination of query models, and further allow the applications programmer to add new query models and data types to the system as needed or desirable. A desirable query architecture should place minimal constraints on the necessary storage and performance needed to perform search operations. This would allow the applications programmer to design the information retrieval system to perform efficiently on a variety of operating platforms.