Information retrieval systems rely on specialized data structures and algorithms developed for a specific task: ranked retrieval of documents. These systems are increasingly being called upon to incorporate more complex processing into query evaluation. Some extensions, such as query expansion, may be handled using existing information retrieval systems. Other extensions, such as static scoring, may be incorporated by making changes to the underlying system. However, an increasingly prominent set of desired extensions does not fit naturally within the traditional search and retrieval systems used to query a collection of documents, and these extensions are typically addressed by post-processing standard result lists. Although functional, such implementations of these extensions on top of traditional search and retrieval systems have unfortunately resulted in something of a kludge.
For example, consider a person on a business trip who enters the query “deep dish pizza in Palo Alto” through a web browser interface to a search engine. A typical search engine may find certain pages that contain an exact or partial match to the string “deep dish pizza.” The search engine may also find that some documents in the system have been labeled as restaurants, and that some of those have also been labeled more specifically as pizza restaurants. Furthermore, the person may be a recognized user who is a member of a social network in which people indicate the web sites of organizations or establishments they endorse.
One strategy for using an existing information retrieval system to process this query would be to use an inverted text index to obtain documents relevant to “deep dish pizza,” and then to post-process those documents using the social network and geographical data. However, text matching may not represent the most selective access path, especially if relaxed matching semantics are employed. Moreover, other metadata, such as proximity information, may not offer efficient random access. In an extreme case, the search term may be very broad while the metadata constraint is highly selective. The approach of first scanning the results of the search query and then post-processing them by making calls to a separate metadata engine could result in millions of accesses to process a relatively straightforward query. Hence, this strategy may not always perform well.
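To make the access-cost argument above concrete, the following Python sketch compares two evaluation orders for a toy conjunctive query combining a broad text term with a selective metadata label. It is a minimal illustration under stated assumptions: the in-memory dictionaries, the function names text_first and metadata_first, the label "pizza_restaurant", and the document counts are all hypothetical and are not part of the framework described here. Each loop iteration is counted as one access.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy structures (illustrative assumptions only):
#   text_index:  term  -> set of doc ids containing the term (inverted text index)
#   label_index: label -> set of doc ids carrying that metadata label
#   doc_labels:  doc id -> set of labels (what a separate metadata engine would return)

def text_first(text_index: Dict[str, Set[int]], term: str,
               doc_labels: Dict[int, Set[str]], label: str) -> Tuple[List[int], int]:
    """Scan the text postings, then post-process each hit with a metadata lookup."""
    accesses = 0
    hits = []
    for doc in text_index.get(term, set()):
        accesses += 1  # one call to the metadata engine per text match
        if label in doc_labels.get(doc, set()):
            hits.append(doc)
    return sorted(hits), accesses

def metadata_first(label_index: Dict[str, Set[int]], label: str,
                   text_index: Dict[str, Set[int]], term: str) -> Tuple[List[int], int]:
    """Start from the more selective metadata list and probe the text postings."""
    accesses = 0
    postings = text_index.get(term, set())
    hits = []
    for doc in label_index.get(label, set()):
        accesses += 1  # one membership probe into the text postings per labeled doc
        if doc in postings:
            hits.append(doc)
    return sorted(hits), accesses

# Extreme case: a very broad term and a highly selective label.
N = 1_000_000
text_index = {"pizza": set(range(N))}           # broad: matches every document
labeled = {3, 42, 77_777, 500_000, 999_999}      # selective: only five documents
label_index = {"pizza_restaurant": labeled}
doc_labels = {d: {"pizza_restaurant"} for d in labeled}

a, cost_a = text_first(text_index, "pizza", doc_labels, "pizza_restaurant")
b, cost_b = metadata_first(label_index, "pizza_restaurant", text_index, "pizza")
print(a == b, cost_a, cost_b)  # True 1000000 5
```

Both orders return the same five documents, but the text-first order performs one metadata access per text match, while the metadata-first order touches only the labeled documents. This toy model ignores scoring, relaxed matching, and index layout; it only illustrates why the most selective access path can matter so much.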
What is needed is a novel framework that more comprehensively extends the traditional information retrieval framework to naturally accommodate the growing number of desired extensions for information retrieval. Such a system and method should consider a broader space of evaluation strategies by allowing generalization along one or more dimensions, while still performing well and ensuring sufficient result cardinality.