As the Internet has evolved over the years, myriad ideas and schemes have been used to facilitate information retrieval. The amount of information on the web is growing rapidly, as is the number of new users who are inexperienced in the art of web research. Increasingly, information gathering and retrieval services face a market of users who want to be able to search for very specific information, as quickly as possible, and without being burdened with false positives.
Users are likely to navigate the web using human-maintained indices, such as YAHOO! and the online yellow pages, or search engines such as GOOGLE. Human-maintained indices cover popular topics effectively; however, they are subjective, expensive to build and maintain, slow to improve, and cannot cover all esoteric topics. Such lists generally group information by predetermined categories. For instance, the online yellow pages organize their listings by the Standard Industrial Classification (SIC) scheme. YAHOO! is also based on a taxonomy structure but provides a class-generalization hierarchy of categories to support more sophisticated browsing.
Although human intelligence is used during the classification process for such index schemes, this classification process still suffers from drawbacks. For example, the quality of web content classification is often skewed as a result of individual reviewer bias. Also, the growth of web content has made it virtually impossible to maintain an up-to-date database of classified web content. The predetermined categories that were once effective for classifying information may become stale within a short period of time.
Instead of using index services, a user can retrieve information using search engines. Search engines, such as GOOGLE, allow a user to enter a query and return a set of results based on the query text. When a query is initiated, the returned set of search results is displayed on one or more search web pages, with the search result “hits” arranged in ranked order. The methodologies used to select the hits and rank the search results from most relevant to least relevant vary from search engine to search engine. As a result, performing an identical query on two different search engines rarely, if ever, yields the same set of search results. Even if an identical set of search results is returned, the order in which the search result hits are presented will vary.
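The effect of differing ranking methodologies can be sketched with a toy example. The two scoring functions below and the sample documents are invented for illustration and do not represent any real engine's algorithm; they simply show how two engines, given the same hits, can present them in different orders.

```python
# Toy sketch: two hypothetical ranking methodologies applied to the same
# hits produce different orderings. Neither reflects a real engine.

def rank_by_term_frequency(query, docs):
    """Score a document by how often the query terms occur in it."""
    terms = query.lower().split()
    def score(doc):
        words = doc.lower().split()
        return sum(words.count(term) for term in terms)
    return sorted(docs, key=score, reverse=True)

def rank_by_earliest_match(query, docs):
    """Score a document by how early the first query term appears."""
    first = query.lower().split()[0]
    def score(doc):
        words = doc.lower().split()
        return words.index(first) if first in words else len(words)
    return sorted(docs, key=score)  # earlier mention ranks higher

docs = ["Plans for a house and another house", "House plans"]
by_frequency = rank_by_term_frequency("house plans", docs)
by_position = rank_by_earliest_match("house plans", docs)
# The two methodologies put a different document first.
```

Because each engine weighs its own signals, the identical query yields two differently ordered result lists, just as the text describes.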
The methodologies used by the search engines to determine hits also typically yield search results that include irrelevant hits. For example, if a user looking for “house plans” initiates a search using a web-based search engine, the set of search results may include hits relating to “dog house plans”, “bird house plans”, and/or hits discussing “budget plans for the white house”. In some cases, the majority of the hits will be in the same category and relevant to the search query. Unfortunately, in other cases, very few of the hits will be in the same category and many of the hits will be irrelevant to the search query. This, of course, makes searching frustrating for users.
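The "house plans" example above can be made concrete with a deliberately naive sketch. The matching function and document titles below are invented for illustration; the point is that pure keyword matching, with no category context, treats every title containing the query terms as a hit.

```python
# Toy sketch of why keyword matching produces false positives: any document
# containing all of the query's terms counts as a "hit", with no notion of
# the category the user intended. Titles are invented examples.

def naive_search(query, documents):
    """Return every document containing all query terms, in any order."""
    terms = query.lower().split()
    return [doc for doc in documents
            if all(term in doc.lower().split() for term in terms)]

documents = [
    "Colonial house plans for new construction",
    "Dog house plans for large breeds",
    "Bird house plans you can build in a weekend",
    "Budget plans for the White House",
]

hits = naive_search("house plans", documents)
# All four titles match, although only the first concerns residential plans.
```

A user who wanted residential house plans receives dog houses, bird houses, and federal budgets alongside the one relevant hit.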
To address these issues, the trend is increasingly to incorporate a clustering algorithm that groups certain hits together. Examples of search engines that perform hit clustering include Teoma and Fast. The use of such automated clustering techniques is not surprising, given the large number of hits many search queries return, because relying on human resources to classify billions of web pages into groups is impractical. Unfortunately, these automated, computer-driven clustering technologies are rudimentary and prone to error, since no human intelligence is applied to assign context to the search query.
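A deliberately crude sketch illustrates how mechanical such clustering can be. The heuristic below (grouping hits by the word preceding a pivot term) and the sample hits are invented; this is not the actual Teoma or Fast algorithm, only a stand-in showing how clustering without human-assigned context goes wrong.

```python
from collections import defaultdict

# Hypothetical clustering heuristic: group hits by the word immediately
# preceding a pivot term. Purely mechanical; no human intelligence assigns
# context, so semantically unrelated hits are clustered with equal confidence.

def cluster_by_modifier(hits, pivot="house"):
    """Group hits by the word immediately before the pivot term."""
    clusters = defaultdict(list)
    for hit in hits:
        words = hit.lower().split()
        if pivot in words:
            index = words.index(pivot)
            key = words[index - 1] if index > 0 else pivot
        else:
            key = "other"
        clusters[key].append(hit)
    return dict(clusters)

hits = [
    "Colonial house plans",
    "Dog house plans for large breeds",
    "Bird house plans",
    "Budget plans for the White House",
]
clusters = cluster_by_modifier(hits)
# The White House budget item lands in its own "white" cluster; the
# algorithm has no way to tell it is unrelated to construction plans.
```

The heuristic happily files the federal-budget hit under "white" next to the construction-plan clusters, illustrating the error-proneness the text describes when no context is applied to the query.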
Consumers, for example, want to input minimal information as search criteria and, in response, want specific, targeted, and relevant information. Being able to match a consumer's query to a proper business name is very valuable, as it can drive a transaction (e.g., a sale). Unfortunately, accommodating these demands effectively requires human intelligence, which is not easily captured in a search engine or index scheme without an involved and expensive process. The difficulties of this process are compounded by the unique challenges companies face in making their presence known to consumers on the Internet.
Thus, one of the most complicated aspects of developing an information gathering and retrieval model is finding a scheme in which the cost-benefit analysis accommodates all participants, e.g., the users, the businesses, and the search engine providers. Currently available schemes do not provide a user-friendly, provider-friendly, and financially effective solution that offers easy and quick access to specific information.