Searching large collections of text-based information items, such as archives of publications or commercial listings such as telephone yellow pages, for entries related to a particular subject can be time-consuming and frustrating. Accordingly, such large collections have been incorporated into databases, and various computerized searching strategies (denoted herein "search engines") have been developed to assist a user in searching for desired information items. However, the search strategy provided often does not easily satisfy a user's need to locate the desired information items quickly and simply. For example, a large number of such search engines are based on character string matching algorithms and typically require the user to specify a potentially complex Boolean expression of character strings. Such a search engine matches these character strings against corresponding character strings in stored information items and then evaluates the Boolean expression to determine whether a particular information item is to be provided to the user. Note that such matching search engines (denoted herein "literal search engines") may provide a reasonably concise searching capability for sophisticated users to retrieve a desired information item, or alternatively to determine that such an information item does not exist. However, it is desirable to utilize such search engines with relatively simple expressions and still provide the user with the accuracy obtained from more sophisticated search expressions.
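By way of illustration only, the matching and evaluation performed by a literal search engine might be sketched as follows. The class names `Term`, `And`, `Or`, and `Not`, the functions `matches` and `literal_search`, and the sample listings are all hypothetical and are not taken from any particular search engine; case-insensitive substring matching is likewise an assumption.

```python
from dataclasses import dataclass
from typing import List, Union


# Hypothetical expression tree for a Boolean query over character strings.
@dataclass(frozen=True)
class Term:
    text: str


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Not:
    operand: "Expr"


Expr = Union[Term, And, Or, Not]


def matches(expr: Expr, item: str) -> bool:
    """Evaluate the Boolean expression against one stored information item."""
    if isinstance(expr, Term):
        # Assumption: a term matches if it occurs anywhere in the item,
        # ignoring letter case.
        return expr.text.lower() in item.lower()
    if isinstance(expr, And):
        return matches(expr.left, item) and matches(expr.right, item)
    if isinstance(expr, Or):
        return matches(expr.left, item) or matches(expr.right, item)
    if isinstance(expr, Not):
        return not matches(expr.operand, item)
    raise TypeError(f"unsupported expression: {expr!r}")


def literal_search(expr: Expr, items: List[str]) -> List[str]:
    """Return only the items for which the expression evaluates to true."""
    return [item for item in items if matches(expr, item)]


# Hypothetical stored information items.
listings = [
    "green wool sweaters on sale",
    "emerald cashmere pullovers",
    "green cotton shirts",
]

# "green AND (wool OR cashmere)"
query = And(Term("green"), Or(Term("wool"), Term("cashmere")))
hits = literal_search(query, listings)
```

In this sketch only the first listing is returned: the second contains none of the literal string "green", which illustrates both the narrow focus of such an engine and the burden placed on the user to anticipate the exact strings stored.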
Alternatively, text-based search engines have also been developed that attempt to search information items based on the content or semantics of the information items. These search engines have a much more difficult task to perform, in that it has been non-trivial to computationally determine the semantics of information items effectively for a wide range of users. Accordingly, various techniques have been employed in such search engines, including various statistical techniques for relating the content of one stored information item to the content of one or more other stored information items. An important characteristic of such content-based search engines (hereinafter also referred to as semantic similarity search engines) is that relating the information items to one another allows for retrieval of information items that are related to the terms in a user request even though the retrieved information items may contain none of the user input terms. For example, a user requesting information regarding "green wool sweaters" might obtain an information item related to "emerald cashmere pullovers." Thus, as this example illustrates, a user may be presented with unexpected or undesired information items as well as more relevant information items. However, this type of search engine offers a significant advantage in that the expressions input by a user are typically simple lists of words and/or phrases. Thus, these search engines have been of particular benefit to unsophisticated users.
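As a minimal sketch of how a statistical technique might relate a request to an information item sharing none of its terms, the following example builds term co-occurrence counts from a small background corpus and compares the resulting profiles by cosine similarity. This particular technique is an assumption chosen for illustration and is not asserted to be the method of any existing semantic similarity search engine; the function names, the background corpus, and the stored items are all hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def cooccurrence_vectors(corpus: List[str]) -> Dict[str, Counter]:
    """For each word, count the other words appearing in the same text."""
    vectors: Dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        words = set(text.lower().split())
        for word in words:
            for other in words:
                if other != word:
                    vectors[word][other] += 1
    return vectors


def profile(text: str, vectors: Dict[str, Counter]) -> Counter:
    """Sum the co-occurrence vectors of every known word in the text."""
    total: Counter = Counter()
    for word in text.lower().split():
        total.update(vectors.get(word, {}))
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[key] * b[key] for key in a if key in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def semantic_search(
    query: str, items: List[str], vectors: Dict[str, Counter]
) -> List[Tuple[float, str]]:
    """Rank items by similarity of their profiles to the query profile."""
    query_profile = profile(query, vectors)
    scored = [(cosine(query_profile, profile(item, vectors)), item) for item in items]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical background text from which word relationships are learned.
background = [
    "emerald is a green gemstone colour",
    "cashmere is a soft wool fibre",
    "pullovers and sweaters are knitted garments",
    "diesel engines need regular oil changes",
]
vectors = cooccurrence_vectors(background)

# Hypothetical stored information items.
items = ["emerald cashmere pullovers", "diesel engine oil"]
ranked = semantic_search("green wool sweaters", items, vectors)
```

Here the request "green wool sweaters" ranks "emerald cashmere pullovers" first even though the two share no words, because their terms occur in similar contexts in the background text. The same mechanism can also surface items the user did not expect, since the ranking depends on statistical association rather than on the literal terms entered.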
Since each of the above types of search engines retrieves information items in a substantially different way, a synergistic combination of such search engines into a single information retrieval system would be advantageous. In particular, it would be advantageous to have an information retrieval system that accepts relatively simple user information requests and outputs a ranked listing of information items to the user, wherein this listing has the narrowness of focus provided by a literal search engine and also provides the user with information items having related content that also appears to be relevant to the user.