The World Wide Web has dramatically changed the requirements placed on information retrieval engines such as Oracle Text from Oracle Corporation. Recent research shows that web users rarely look beyond the first two pages of a candidate hitlist, a total of about twenty hits. Furthermore, users expect subsecond response times, regardless of the promised accuracy of the results. With expectations like these, response time is of paramount importance. At the same time, because typical web users are not trained in information retrieval, search applications must accept a very forgiving syntax (such as free-text queries) and still deliver a reasonable hitlist.
One solution provided by Oracle Text is the ABOUT operator, which accepts short free-text queries and finds relevant documents using the knowledge-based linguistic retrieval system of Oracle Text. Internally, the ABOUT operator uses the ACCUMULATE (ACCUM) operator to rank queries that contain multiple non-stopwords (stopwords are common words such as ‘is’, ‘am’, ‘are’, and ‘when’). The response time and relevance ranking of an ABOUT query therefore depend on the effectiveness of the ACCUM operator. Two problems arise: slow response times for queries involving even a few non-stopwords, and unpredictable, non-intuitive relevance rankings for queries involving more than one non-stopword. Both problems are attributable to the scoring semantics of the ACCUM operator.
In other prior information retrieval and ranking systems, even when a user is interested in only the few most relevant documents, the ranking system must retrieve every candidate document identified by the search and compute its exact relevance score. A single non-restrictive term in a query, that is, a term that matches a large number of documents, forces the system to compute exact relevance scores for an extremely large number of documents. This is required because such systems cannot identify the most relevant documents until the scores of all the documents have been computed. The underlying reason is that there is no necessary relation between the final score range of a document and either the number of child terms of the query that it matches or the total weight of those matched terms.
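To illustrate why the lack of a relation between score range and the number of matched terms forces exhaustive scoring, the following is a minimal sketch under stated assumptions. The term weights, the documents, and the sum-of-weight-times-frequency scoring function are all hypothetical and are not the scoring used by Oracle Text or the ACCUM operator. Under this scheme, a document matching only one query term can outscore a document matching both, so no candidate can be skipped before its score is known.

```python
import heapq

# Hypothetical per-term weights for a two-term query (illustrative only,
# not Oracle Text's actual weights).
TERM_WEIGHTS = {"oracle": 1.0, "text": 1.0}

# Hypothetical candidate documents, mapping each term to its occurrence count.
DOCS = {
    "d1": {"oracle": 1, "text": 1},  # matches both query terms
    "d2": {"oracle": 9},             # matches only one term, but frequently
    "d3": {"text": 3},               # matches only one term
}


def naive_score(term_counts):
    """Illustrative score: sum of weight * frequency over matched query terms."""
    return sum(TERM_WEIGHTS[t] * n for t, n in term_counts.items() if t in TERM_WEIGHTS)


def top_k_exhaustive(docs, k):
    # The score is not bounded by how many query terms a document matches,
    # so every candidate must be scored before the top k can be selected.
    scored = ((naive_score(counts), doc_id) for doc_id, counts in docs.items())
    return heapq.nlargest(k, scored)


print(top_k_exhaustive(DOCS, 2))
```

Here d1, the only document that matches every query term, falls out of the top two, which mirrors the non-intuitive rankings described above.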
The present invention provides a new and useful method and system that optimizes the response time and relevance ranking of search queries and overcomes the above problems and others.