Information needs of users in the digital era can be fulfilled by keyword-based search engines. Such search engines have become the universal catalogs for world-wide resources. Unlike the old library catalogs that are mostly searchable by fixed fields (e.g., by authors, titles, and keywords predefined by authors), modern Web search engines provide a flexible, easy way to express search terms. However, the search results are typically long lists of hits that contain many irrelevant links. See Radev, D. R., et al., "WebInEssence: A Personalized Web-Based Multi-Document Summarization and Recommendation System," in: NAACL Workshop on Automatic Summarization, Pittsburgh, Pa. (2001).
Past research has concentrated either on refining the search keywords or on sifting and filtering the search results, to improve the precision of the returned hit lists. Search engines face an additional complication when a search term is a homonym (a keyword with multiple meanings or multiple references) and the user is not aware that there are several concepts for this term. She might not be aware of this homonymy at all, or it might escape her attention at the moment of performing the Web search. For example, when looking for information about former President George W. Bush she might momentarily forget about President George H. W. Bush, the father of President George W. Bush. She would then get results about both of them, which is not what she desired.
When using a search engine to satisfy an information need about a homonymous concept, a user is faced with two kinds of problems. First, she might get an overwhelming number of responses about one homonym, especially if this meaning is more popular, while the less popular homonym that she is actually interested in is buried in a snippet on a much later page of hits returned by the search engine. This is the case with lopsided preferences in meanings. For instance, the “Michael Jackson” who is a singer is much more popular than the basketball player of the same name. Hence many more search results contain references to the singer. In this situation, the user is at least aware that the results she is getting are not about the basketball player that she has been looking for. When formulating the initial query, it escaped her attention that there are two concepts for her search term and that more information might be available on the Web about the homonym that she is not interested in. At this point, she needs either to wade through pages of reported hits for the wrong Michael Jackson or to append terms to her query that will exclude the unwanted homonym and re-execute the search. This constitutes a kind of feedback loop between the user and the search engine.
The situation is even worse if the user is completely unaware of the fact that the search term is a homonym with two (or more) references, and all results that appear on the first few pages of hits refer to the “wrong” reference. For example, a user located in the New York area who types “Penn Station” into Google® will see many references to Penn Station in New York City (NYC) and some references to Penn Station in Newark. These two Penn Stations are separated by a 20-minute train ride. Unbeknownst to her, there is also a Penn Station in Philadelphia, Pa. However, a reference to the latter does not appear on the first page of search results.
In a previous ontology-supported Web search system, the user was presented with a number of choices of additional search terms for her input. She could mark such terms as positive, i.e., they should be included in the Web search results, by clicking on associated check boxes. One problem with this approach was that users did not want to be bothered by many questions. A more benign approach to eliciting additional information from a user can be seen in the use of suggested completions. While a user types in the first (few) word(s) of her search, the search engine displays up to ten suggested search completions, which will possibly describe the search that the user had in mind. These completions are presumably based on the observed frequencies of many searches of other search engine users. While the user continues to type, the suggested completions change rapidly and are often limited to fewer than ten.
Another weakness of the aforementioned Web search system was that it did not make use of the information that may be inferred by a form of closed-world assumption from the terms that the user did not select with a check mark. According to the documentation of major search engines, the use of negative search words, marked with a minus sign before the word(s), constitutes a particularly powerful tool for discriminating between different results.
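The combination of the closed-world assumption with negative search words can be illustrated by a minimal sketch. It assumes that the system has offered the user a list of candidate terms with check boxes; every term she checked is appended as a positive term, and every term she left unchecked is treated as unwanted and appended with a leading minus sign. The function name `build_query` and the example terms are hypothetical and are not taken from any existing system.

```python
def build_query(base, candidates, selected):
    """Build a query string from a base term and candidate refinement terms.

    Hypothetical helper: checked terms become positive search words,
    unchecked terms become negative search words (closed-world assumption).
    """
    parts = [base]
    for term in candidates:
        # Quote multi-word terms so the minus sign applies to the whole phrase.
        token = f'"{term}"' if " " in term else term
        if term in selected:
            parts.append(token)        # checked: include as a positive term
        else:
            parts.append(f"-{token}")  # unchecked: exclude with a minus sign
    return " ".join(parts)


# Illustrative terms only: the user checked "basketball" and nothing else.
query = build_query(
    '"Michael Jackson"',
    ["basketball", "singer", "pop music"],
    {"basketball"},
)
print(query)  # "Michael Jackson" basketball -singer -"pop music"
```

Treating every unchecked term as negative is a deliberately aggressive reading of the closed-world assumption; a more cautious variant might negate only those terms known to belong to a competing homonym.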
Current popular search engines do not reflect distinctions between different concepts that are expressed by the same word or the same multi-word term (homonyms). Suggested completions also do not appear to be optimized for discrimination between homonyms. These suggested completions are disorganized from a conceptual point of view.