The present invention relates to information retrieval. More particularly, the present invention relates to information retrieval using dynamic guided navigation.
Information retrieval from large sets of electronic documents, such as web pages, can be achieved by searching. Often the information desired is not the documents themselves but the content in the documents. Users typically enter search queries into a search engine and then review the search results to extract the desired content. Not all users, however, know beforehand what they are searching for. Hence, searches can span the spectrum from directed searches to purely exploratory searches.
With directed searches, users already know what they are searching for and can formulate the search queries. For example, a user wants to know about product feature X. The user formulates a search query that includes terms such as the product name and the feature X. With exploratory searches, users may have a general subject area in mind but do not know enough about the subject area to intelligently formulate focused search queries and/or review the search results. For example, a user wants to find out interesting aspects of a product Y. However, the user knows little or nothing about aspects of product Y. Thus, the user's search query may be limited to “product Y.” Such a query will return a large number of documents. Not only is the large set of search results impractical to read, but even after reading through the documents, it may not be clear what aspects or features of product Y are relevant.
To aid users conducting exploratory searches, some search engines provide recommendations of narrower search queries. The recommendations are generated by mining query logs from a community of users and extracting the most frequent queries that included the current user's entered query plus at least one other query term. For example, if many people search for “golf courses,” then when the current user searches for “golf,” one of the recommendations may be “golf courses.” Although this approach draws from the knowledge of a community of users, the recommendations do not take into account the content of the corpus of documents that are being searched.
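The query-log approach described above can be illustrated with a minimal sketch. It assumes the log is simply a list of query strings and that a query "includes" the user's query when it contains all of the user's terms plus at least one more; the function name `recommend_narrower_queries`, the `top_n` parameter, and the sample log are hypothetical and not taken from any particular search engine.

```python
from collections import Counter


def recommend_narrower_queries(query, query_log, top_n=3):
    """Return the most frequent logged queries that contain every term of
    `query` plus at least one additional term (hypothetical helper)."""
    query_terms = set(query.lower().split())
    if not query_terms:
        return []
    counts = Counter()
    for logged in query_log:
        normalized = " ".join(logged.lower().split())
        logged_terms = set(normalized.split())
        # Proper superset: all of the user's terms plus at least one more.
        if query_terms < logged_terms:
            counts[normalized] += 1
    return [q for q, _ in counts.most_common(top_n)]


# Hypothetical community query log, for illustration only.
sample_log = [
    "golf courses", "golf courses", "golf clubs", "golf",
    "tennis courts", "golf courses", "golf clubs", "mini golf",
]

print(recommend_narrower_queries("golf", sample_log))
# ['golf courses', 'golf clubs', 'mini golf']
```

Note that the sketch consults only the query log. Nothing in it inspects the documents being searched, which is the limitation identified above.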
One way to make general or web searching, e.g., searching within all of the documents within the web space, more manageable is to divide the web space into sub-spaces based on the document type. Product review space is an example of a sub-space based on web sites or documents that contain product reviews. These web sites explicitly ask users to submit reviews of particular products, with each review typically including a numerical ranking of the particular product.
When a user is interested in buying a digital camera, for example, the user can look through product reviews of digital cameras to find out which particular digital camera is best suited to his or her needs. Suppose, however, that the user is not familiar with digital cameras and does not know what makes one camera better or worse than other cameras. The user is then unable to formulate a directed query to find relevant reviews, such as reviews that discuss relevant features of digital cameras. Instead, the user formulates an exploratory query and is confronted with a thousand reviews of digital cameras. Reading through the thousand reviews would be impractical. Instead, the user would benefit from quick navigation guidance to the most relevant reviews, e.g., only those reviews that cover the digital camera features likely to be of interest to the user.
Even if the reviews of digital cameras are sorted by numerical rankings included in the reviews, e.g., from highest to lowest rankings to surface particular digital cameras that are highest ranked, numerical rankings fail to sufficiently differentiate and identify subtleties in selecting a digital camera. For one thing, numerical rankings tend to cluster within a very narrow range. For another, numerical rankings do not take into account the substance of the reviewers' comments or opinions of why they liked or disliked a product.
Alternatively, even if a web site asks a user to self-categorize, e.g., as a novice, intermediate, or expert, in order to suggest a preset (or preselected) list of features or topics for further exploration, such a preset list is not dynamic. All users who select the same category are presented with the same preset list for further exploration. The preselected list is also typically not reflective of the documents' contents and may merely reflect a subset of what users are talking about.
Thus, it would be beneficial to anticipate the dimensionality of the data organization for domains where exploratory searches may be common. It would be beneficial to pre-organize the data to serve as a broad summary of the corpus even before a search query is entered. It would be beneficial to provide users with navigational guides to quickly access the data that they are actually interested in but unable to articulate due to lack of subject matter knowledge. It would be beneficial to incorporate data from past user sessions to evolve the organization of the data and/or the ranking of documents over time. It would be beneficial to cluster the organized data by predefined categories to provide targeted advertising. It would be beneficial to cluster categories that are related to one another (because users tend to explore such categories together) to help categorize users and target advertising.