The present application relates generally to Web-based information searching and retrieval, and more particularly to a system and method for performing a comparative web search that provides meta-search engine searching, web snippet processing and meta-clustering, comparison analysis, quantitative evaluation of web snippet quality, search result summarization, and information visualization.
Generic web-based search engines such as Google®, Yahoo® and AlltheWeb® are independent systems that index billions of web pages. In response to a search request, these search engines often return large, unmanageable volumes of results, many of which are irrelevant or not useful to what the user is seeking. In addition, each search engine orders its results with its own ranking algorithm, and more relevant results are sometimes ranked below less relevant or even irrelevant ones.
Human-compiled web directories such as OpenDirectory®, Yahoo Directory®, and LookSmart® often return high-quality documents that are highly relevant to the query. A hierarchical directory path is associated with every result, which helps the user understand where the query falls in the category tree. However, the scope of web directories is very limited, and they often fail to return quality results for very specialized or very general queries. In addition, such web directories typically do not index websites periodically. Like generic search engines, web directories present their output as ranked lists, which often prolongs the process of locating the desired result(s).
Even the most sophisticated generic search engines find it challenging to keep up with changes in the highly dynamic environment of the World Wide Web (WWW). Using two or more search engines may better meet a user's information needs and provide a more comprehensive set of search results. Multi-search engines such as Multi-Search-Engine.Com® and Searchthingy.Com® bridge this information gap by collecting results from different search engines and presenting them to the user in separate windows. Moreover, multi-search engines can recommend a search engine that the user may not have considered. Although multi-search engines may satisfy the need for a comprehensive search, they do not simplify the task of locating a desired document.
Like multi-search engines, meta-search engines also bridge the information gap by assimilating results from various search engines. However, they refine the accumulated search results before presenting them to the user. Meta-search engines can be broadly grouped into three main categories in terms of result presentation: Meta-Search Engines with Ranked Listings of Results; Meta-Clustering Search Engines; and Other Visualization Tools. Each of these categories is addressed hereafter.
Meta-Search Engines with Ranked Listings of Results.
This class of meta-search engines, which includes Dogpile®, Metacrawler®, and Mamma®, shortens the list of search results by removing duplicates. Moreover, these systems counteract the ranking bias of any single engine's heuristics by re-ranking the merged results with complex weighting algorithms. Accordingly, the result set of a meta-search engine can be more informative and diverse than the result set of any one search engine. However, even when results are presented in ranked lists, locating the desired document(s) in those lists is often time consuming for the user.
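Duplicate removal followed by re-ranking can be illustrated with a minimal sketch. The weighting algorithms actually used by Dogpile®, Metacrawler® and Mamma® are not described here, so this example swaps in Borda count, a standard rank-aggregation technique, as an assumption. The URL canonicalization rules and the names `normalize_url` and `borda_merge` are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Hypothetical canonicalization: ignore scheme, lowercase host,
    drop a leading "www." and any trailing "/", keep the query string."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    path = parts.path.rstrip("/") or "/"
    query = "?" + parts.query if parts.query else ""
    return host + path + query


def borda_merge(result_lists: dict[str, list[str]]) -> list[tuple[str, float]]:
    """Merge per-engine ranked URL lists, drop duplicates, and re-rank by Borda count.

    In a list of length L, the URL at 0-based position i earns L - i points.
    A page returned by several engines accumulates points from each of them.
    """
    scores: dict[str, float] = defaultdict(float)
    display_url: dict[str, str] = {}  # first spelling seen for each canonical key
    for urls in result_lists.values():
        counted = set()
        for i, url in enumerate(urls):
            key = normalize_url(url)
            if key in counted:
                continue  # the same engine listing a page twice scores it once
            counted.add(key)
            display_url.setdefault(key, url)
            scores[key] += len(urls) - i
    # Highest score first; ties broken alphabetically by canonical key.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(display_url[key], score) for key, score in ranked]


if __name__ == "__main__":
    results = {
        "engine_a": ["http://www.example.com/a", "http://example.org/b", "http://example.net/c"],
        "engine_b": ["http://example.org/b/", "http://example.com/a"],
    }
    for url, score in borda_merge(results):
        print(score, url)
```

In the demo, the two spellings of each of the first two pages collapse into a single entry, and pages returned by both engines rise above the page returned by only one. A production system would likely weight engines differently and use richer duplicate detection (for example, snippet similarity) than URL canonicalization alone.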
Meta-Clustering Search Engines.
In addition to removing duplicates, meta-search engines such as Grouper®, Vivisimo®, SnakeT®, and iBoogie® condense the search result presentation space by several orders of magnitude by organizing their results into hierarchical clusters. Despite their state-of-the-art technology, current meta-clustering search engines are strictly single-query systems: they do not handle more than one query request from a user at a time.
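The idea of grouping results by their snippets can be illustrated with a minimal sketch, assuming each result is represented only by its snippet text and that two snippets belong together when the Jaccard similarity of their term sets reaches a threshold. The stopword list, the 0.3 threshold and the function names are hypothetical. The sketch produces flat single-link clusters (connected components of the similarity graph), not the hierarchical clusters built by the engines named above, whose algorithms are not described here.

```python
import re
from itertools import combinations

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "in", "on", "to", "with"}


def snippet_terms(snippet: str) -> set[str]:
    """Lowercased alphanumeric tokens of a snippet, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", snippet.lower()) if w not in STOPWORDS}


def cluster_snippets(snippets: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Single-link clustering: link two snippets whose term-set Jaccard similarity
    is at least `threshold`, then return the connected components as index lists."""
    parent = list(range(len(snippets)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    term_sets = [snippet_terms(s) for s in snippets]
    for i, j in combinations(range(len(snippets)), 2):
        a, b = term_sets[i], term_sets[j]
        if a and b and len(a & b) / len(a | b) >= threshold:
            parent[find(i)] = find(j)  # merge the two components

    groups: dict[int, list[int]] = {}
    for i in range(len(snippets)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For instance, snippets about Jaguar car prices and snippets about the jaguar's habitat share only the word "jaguar", so with the default threshold they fall into two separate clusters.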
Other Visualization Tools.
In recent years, several innovative visual tools have appeared on the Internet that broaden the scope of search engines beyond information retrieval. Three diverse visualization tools that represent this class of meta-search engines are Kartoo®, WebBrain®, and WhatsOnWeb (WOW)®. Each of these meta-search engines is addressed hereafter.
Kartoo® is a meta-search engine that offers cartographic visualization of query results. Despite its well-suited visualization interface, Kartoo cannot display relationships between clusters. In addition, since Kartoo users can browse only one map at a time, they must explore every map generated in response to a query to determine the complete set of thematic relationships among the documents in the result set. Also, a typical Kartoo map contains only 10 to 12 documents, which makes locating a desired document as challenging as it is with ranked lists.
WebBrain® is a special meta-search engine that queries only one database: the Open Directory® (OD). WebBrain is a useful tool for visualizing related information on the Web, but it inherits the limitations of its underlying web directory. Since the OD database is much smaller than the databases of generic search engines, queries on specific topics often return poor results or none at all on WebBrain.
WhatsOnWeb (WOW)® dynamically generates hierarchical clusters using a topology-driven approach. The clusters are presented in a snippet graph that illustrates the semantic connections between clusters. This visualization appears to overcome the limitations of Kartoo®. Moreover, WOW queries more than one database to collect its data, which makes it a more versatile tool than WebBrain. However, the response time of WOW is much longer than that of other meta-search engines because of the high complexity, O(n^2 m + n^3 log n), of the topology-driven approach, where n is the number of unique documents in the result set of a query and m is the total number of semantic connections between all documents. In addition, the topographic overview of the results may not be easily comprehensible to average users.
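To give a rough sense of scale, the WOW complexity bound can be evaluated for illustrative values. The values n = 100 documents and m = 1000 connections are assumptions chosen for the example, not measurements of WOW; constant factors are ignored and the logarithm is taken base 2 for concreteness.

```latex
\begin{aligned}
n^2 m &= 100^2 \cdot 1000 = 10^7,\\
n^3 \log_2 n &= 100^3 \cdot \log_2 100 \approx 10^6 \cdot 6.64 \approx 6.6 \times 10^6.
\end{aligned}
```

If the semantic connections are dense, m grows on the order of n^2 (at most n(n-1)/2 when each pair of documents has at most one connection), so the first term behaves like n^4 and doubling the number of documents multiplies it by roughly 16.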
With the exception of a few visualization tools, the key motivating factor behind all of the search engines discussed above is information retrieval (IR). Even the visualization tools that are driven by knowledge discovery and employ innovative ways to combine IR with knowledge representation have yet to explore comparison analysis of two or more query result sets for knowledge extraction. Currently, only a limited number of search engines, for example SnakeT® and Thumbshots®, appear to perform comparison analysis. However, these search engines perform only object-level rank comparison. Moreover, their rudimentary interfaces neither simplify the task of locating desired documents nor help the user avoid reading repeated material.
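Object-level rank comparison of two query result sets can be illustrated with a minimal sketch. It is not the implementation used by SnakeT® or Thumbshots®, which is not described here. The sketch assumes each result set is an already deduplicated list of URLs, best result first. The function name, the returned keys and the use of Jaccard overlap as a summary figure are hypothetical choices for the example.

```python
def compare_result_sets(results_a: list[str], results_b: list[str]) -> dict:
    """Object-level comparison of two ranked result lists (hypothetical helper).

    Returns the URLs common to both queries with their 1-based ranks in each
    list, the URLs unique to each query in original rank order, and the
    Jaccard overlap |A & B| / |A | B| of the two result sets.
    """
    rank_a = {url: i + 1 for i, url in enumerate(results_a)}  # 1-based ranks
    rank_b = {url: i + 1 for i, url in enumerate(results_b)}
    common = [(url, rank_a[url], rank_b[url]) for url in results_a if url in rank_b]
    only_a = [url for url in results_a if url not in rank_b]
    only_b = [url for url in results_b if url not in rank_a]
    union_size = len(rank_a.keys() | rank_b.keys())
    overlap = len(common) / union_size if union_size else 0.0
    return {"common": common, "only_a": only_a, "only_b": only_b, "jaccard": overlap}


if __name__ == "__main__":
    report = compare_result_sets(["u1", "u2", "u3"], ["u3", "u4", "u1"])
    print(report)
```

A comparison at this level reports which documents the two queries share and how their ranks differ, but it says nothing about what the shared or unique documents contain. That is the gap between rank comparison and comparison analysis for knowledge extraction.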