As both the Web and users' expectations have matured, information needs have shifted from short, simple keyword-based queries to complex and detailed search tasks. Sometimes these complex needs can be expressed by simple queries such as “cancer treatments.” In such cases, the articulation of the need is simple, whereas the relevant information satisfying the need is detailed and complex, and may require interaction between the searcher and the system. At other times, the information need is itself complex and can only be expressed as long (perhaps multiple) questions that include various details and constraints. Today's search engines excel at satisfying short, keyword-based queries, but they are still far from perfect in supporting long, question-type search tasks. Often, if a user submits a lengthy multiple-sentence query to a search engine, that search engine will not return any results at all, because the search engine only returns documents that contain all of the words in the query. Users have therefore become accustomed to submitting short queries to search engines, even though those users would actually prefer to provide more detailed questions that would shed more light on the kind of information they truly seek.
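The all-words matching behavior described above can be illustrated with a minimal sketch. The function name `conjunctive_match`, the whitespace tokenization, and the three sample documents are hypothetical and chosen only for illustration; they do not describe how any particular search engine is implemented.

```python
def conjunctive_match(query, documents):
    """Return the documents that contain every word of the query.

    Assumption: words are split on whitespace and compared case-insensitively.
    """
    terms = set(query.lower().split())
    return [doc for doc in documents if terms <= set(doc.lower().split())]


# Hypothetical toy collection.
documents = [
    "cancer treatments overview and options",
    "new treatments for skin cancer",
    "gardening tips for spring",
]

short_query = "cancer treatments"
long_query = "what cancer treatments are suitable for an elderly patient with diabetes"

short_results = conjunctive_match(short_query, documents)
long_results = conjunctive_match(long_query, documents)
```

Under these assumptions, the short query matches the first two documents, while the long query matches none, because no single document contains all of its words.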
Question-answering portals such as Wiki Answers, Baidu Knows and Yahoo! Answers have emerged as a medium for posting questions and relying on a network of users to supply answers. Despite their popularity, these services are not always effective in terms of finding an answer or obtaining relevant information relating to the posed question or information need. The quality of answers varies significantly, and many questions go unanswered or receive unsatisfactory answers. Many times, questions submitted by users to these types of services would be sufficiently answered only by experts in the fields to which the questions pertain, and such experts often do not interact with these types of services. All too often, questions submitted by users to these types of services end up being answered incompletely or incorrectly by amateurs who lack any real qualifications to answer those questions. Even if such amateurs are able to provide a correct answer to a question, their unsophisticated answers often will not contain much, if any, information beyond what the question asker already knew.
Furthermore, these portals usually recommend that askers search the portal's database for similar questions before posting a new question, in the hope of reducing redundancy and increasing answer availability. However, these portals lack a reliable search mechanism that can match two similar detailed questions with high accuracy. Instead, these portals allow only keyword-based, context-unaware search, which favors recall over precision in retrieving similar questions. It is often difficult for users to express complex information needs in just a few keywords.
There have been other attempts to address complex information needs. One approach formulates templated queries that could represent complex information needs. The benefit of templates is that they provide expressive power and ease of representation. The disadvantage is that only the needs that can be conveyed through the available templates can be satisfied. This is a serious limitation compared to the flexible, completely unrestricted domain of complex needs that people have.
Current search engines are well tailored for retrieving relevant results for queries that are 2 to 3 words long on average. Yet, people have increasingly been using the Web for more complex needs. One Internet monitoring company recently announced that, based on a large sample of Internet users, the average length of search queries increased significantly during a single year. There is a growing and continuing trend among users of using the Web to seek relevant information on questions, tasks and needs that cannot be articulated well in 2 to 4 words. Longer queries allow users to express their needs more easily by providing far richer context.
However, existing retrieval methods do not perform well on such queries. Users tend to click lower in a result list for longer queries than for shorter ones. The average click position is even further down the list (that is, at a numerically higher rank) for questions than for other types of long queries, including queries with Boolean operators and composite queries. A click further down the list corresponds to a smaller reciprocal rank. Assuming a direct relation between the reciprocal rank metric and retrieval effectiveness, this indicates that retrieval systems perform poorly in satisfying long, complex information needs.
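The link between click position and the reciprocal rank metric can be made concrete with a short sketch. The click positions below are hypothetical values invented for illustration, not measurements, and the helper name `mean_reciprocal_rank` is likewise an assumption.

```python
def mean_reciprocal_rank(click_positions):
    """Average of 1/position over a set of queries.

    Each position is the 1-based rank of the clicked result for one query.
    """
    if not click_positions:
        raise ValueError("at least one click position is required")
    return sum(1.0 / p for p in click_positions) / len(click_positions)


# Hypothetical click positions (1 = top result).
short_query_clicks = [1, 1, 2, 1]
long_query_clicks = [3, 5, 2, 4]

short_mrr = mean_reciprocal_rank(short_query_clicks)
long_mrr = mean_reciprocal_rank(long_query_clicks)
```

With these invented values, the clicks for long queries fall further down the list, so their mean reciprocal rank is lower than that of the short queries. If reciprocal rank tracks retrieval effectiveness, as the passage assumes, a lower value suggests poorer retrieval for long queries.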
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.