Many search engine services, such as Google and Live Search, allow users to search for information that is accessible via the Internet. These search engine services allow users to search for display pages, such as web pages, that may be of interest to them. After a user submits a search request (i.e., a query) that includes search terms, the search engine service identifies web pages that may be related to those search terms. To quickly identify related web pages, the search engine services may maintain a mapping of keywords to web pages. This mapping may be generated by “crawling” the web (i.e., the World Wide Web) to identify the keywords of each web page. To crawl the web, a search engine service may use a list of root web pages to identify all web pages that are accessible through those root web pages. The keywords of any particular web page can be identified using various well-known information retrieval techniques, such as identifying the words of a headline, the words supplied in the metadata of the web page, the words that are highlighted, and so on. The search engine service may generate a relevance score to indicate how relevant the information of the web page may be to the search request based on the closeness of each match, web page importance or popularity (e.g., Google's PageRank), and so on. The search engine service then displays to the user links to those web pages in an order that is based on a ranking that may be determined by their relevance, popularity, or some other measure.
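The keyword-to-web-page mapping described above can be illustrated with a minimal sketch. This is not the method of any particular search engine service: the function names (`extract_keywords`, `build_index`, `search`), the example.com URLs, and the page texts are all hypothetical, and the keyword extraction is assumed to be a simple lowercase word split rather than the headline, metadata, or highlighting techniques mentioned above. Ranking is assumed to be a plain count of matched search terms.

```python
import re
from collections import defaultdict


def extract_keywords(text):
    """Assumed keyword extraction: the set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(pages):
    """Map each keyword to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for keyword in extract_keywords(text):
            index[keyword].add(url)
    return index


def search(index, query):
    """Return URLs ordered by how many search terms they match (ties by URL)."""
    hits = defaultdict(int)
    for term in extract_keywords(query):
        for url in index.get(term, ()):
            hits[url] += 1
    return sorted(hits, key=lambda url: (-hits[url], url))


# Hypothetical crawled pages (URL -> page text), for illustration only.
pages = {
    "https://example.com/clubs": "Best night clubs in Berlin",
    "https://example.com/hotels": "Cheap hotels in Berlin and Hamburg",
    "https://example.com/trains": "Train schedule from Hamburg to Munich",
}

index = build_index(pages)
results = search(index, "berlin clubs")
print(results)
```

Because the mapping is built ahead of time, a query only touches the entries for its own search terms rather than scanning every page.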
Some online services, such as Yahoo! Answers and Live QnA, have created large collections of questions and their corresponding answers. These Q&A services may provide traditional frequently asked question (“FAQ”) services or may provide community-based services in which members of the community contribute both questions and answers to those questions. These Q&A services provide a mechanism that allows users to search for previously generated answers to previously posed questions. These Q&A services typically input a queried question from a user, identify questions of the collection that relate to the queried question (i.e., a question search), and return the answers to the identified questions as the answer to the queried question.
Such Q&A services typically treat the questions as plain text. The Q&A services may use various techniques including a vector space model and a language model when performing a question search. Table 1 illustrates example results of a question search for a queried question.
TABLE 1
Queried Question:
Q1: Any cool clubs in Berlin or Hamburg?
Expected Question:
Q2: What are the best/most fun clubs in Berlin?
Not Expected Question:
Q3: Any nice hotels in Berlin or Hamburg?
Q4: How long does it take to get to Hamburg from Berlin?
Q5: Cheap hotels in Berlin?

Such Q&A services may identify questions Q2, Q3, Q4, and Q5 as being related to queried question Q1. The Q&A services typically cannot determine, however, which identified question is most related to the queried question. In this example, question Q2 is most closely related to queried question Q1. The Q&A services nevertheless provide a ranking of the relatedness of the identified questions to the queried question. Such a ranking may represent the queried question and each identified question as a feature vector of keywords. The relatedness of an identified question to the queried question is based on the closeness of their feature vectors. The closeness of the feature vectors may be determined using, for example, a cosine similarity metric.
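The feature-vector ranking described above can be sketched as follows, using the questions of Table 1. This is only an illustration under stated assumptions: each feature vector is assumed to be a raw count of lowercase words, with no stop-word removal or term weighting, and the helper names (`keyword_vector`, `cosine_similarity`, `rank_questions`) are hypothetical.

```python
import math
import re
from collections import Counter


def keyword_vector(text):
    """Assumed feature vector: counts of lowercase alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_questions(queried, candidates):
    """Return (score, question) pairs, most similar to the queried question first."""
    query_vector = keyword_vector(queried)
    scored = [(cosine_similarity(query_vector, keyword_vector(c)), c)
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


queried = "Any cool clubs in Berlin or Hamburg?"               # Q1
candidates = [
    "What are the best/most fun clubs in Berlin?",              # Q2
    "Any nice hotels in Berlin or Hamburg?",                    # Q3
    "How long does it take to get to Hamburg from Berlin?",     # Q4
    "Cheap hotels in Berlin?",                                  # Q5
]

ranking = rank_questions(queried, candidates)
for score, question in ranking:
    print(f"{score:.3f}  {question}")
```

Under these assumptions, question Q3 shares five of its seven words with queried question Q1 and so scores higher than the expected question Q2, which shares only three. Treating the questions as plain text in this way does not by itself identify the most closely related question.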
The Q&A services typically display the identified questions to a user in rank order. A difficulty with such displaying of the identified questions is that many of the highest ranking questions may be very similar in both syntax and semantics. For example, the identified questions for the example of Table 1 may also include the additional questions of Table 2.
TABLE 2
Q6: Fun clubs in Berlin or Hamburg?
Q7: What's a good restaurant in Hamburg or Berlin?

Because questions Q2 and Q6 have several words in common with queried question Q1, a Q&A service may rank those questions high. Depending on the size of the collection of questions, there may be many questions similar to questions Q2 and Q6. If all these similar questions are ranked high, then the first page of the search results may list only such similar questions. If the user is actually interested in hotels that have health clubs, then the user may need to scan several pages before finding a listing for a hotel or a hotel with a health club that is of interest.