The present invention relates generally to a method and system for presenting a visual representation of information on a computer system. More particularly, the present invention provides a method and system for effective human visual comprehension of search results relating to information available on the Internet.
With the development of the modern computer system, a massive volume of information may be readily stored, accessed and analyzed. Improvements in mass information storage media and development of new types of mass storage media now allow individuals to access millions of pages of information. Through networks of computers, an individual can further increase the volume of accessible information millions of times.
The Internet, rapidly increasing in popularity and usage, allows millions of users around the world access to the information on a network of millions of computers. The World Wide Web of the Internet is one of the most popular sources of Internet information and represents a collection of hundreds of millions of pages of information stored on millions of computers throughout the world. With low-cost and user-friendly software, even novice computer users are able to easily organize their own set of information which can be accessible by others over the World Wide Web of the Internet. Many of the pages of information on the Internet also include addresses or "links" to other pages of information that may be readily accessed upon a single and simple click of a button, which could interconnect the user with information residing on a computer system on the other side of the world, or maybe even just next door. Although presently most of the information available on the Internet is textual in nature, accessible also are massive amounts of information in which digitized images, sounds and video are stored.
Never in the history of man has there been a time when such a massive volume of information could be readily accessed by individual users. On the Internet, this information can be accessed at little or no cost. However, without tools to assist users to efficiently search and comprehend the information available, this great resource of information may be largely wasted. Attempting to locate a particular subset of information of interest can be an overwhelmingly time-consuming and frustrating task. No common system of organization is followed for information available on the Internet and, since it is assembled by a diverse set of users all with different backgrounds and views, it seems unlikely that Internet information will ever be regulated by information providers for effective organization.
Recognizing the need for allowing users to search through the massive amount of information available on the Internet, a variety of commercial Internet information indexing services have been implemented, such as Yahoo, Lycos, InfoSeek, Excite and Alta Vista. Although the search algorithms and techniques used (also referred to as "search engines") are likely quite different for each of such services, the format in which the information located by a search is presented to a user is relatively similar. Conventional search engines, after conducting the search, present the user with a list of "hits," each of which typically provides an address or link to allow a user to access the information and perhaps a short description of the information available. In many instances, a user will be presented with a list of thousands, hundreds of thousands or possibly even millions of "hits" with perhaps only a few of interest to the user. Due to the nature of the search conducted, many conventional search engines provide results that include duplicates, which further frustrates the ability of the user to efficiently locate the information of interest. For example, a search engine may attempt to search the Internet for sites that include the term "cuisine". However, if the term is used multiple times in an available document, many conventional search engines will include a "hit" for each occurrence of the term within a document.
In order to assist a user with navigation through the list of "hits," some conventional search engines attempt to rank the hits using a predetermined algorithm in an effort to estimate the relevance of each of the hits. The hits that are determined to be most relevant are placed at the top of the list, while those that are determined to be least relevant are placed at the bottom of the list. Although such a relevance ranking is sometimes useful, in practice a user is frequently presented with many duplicate hits referring to the same information. Moreover, the relevance ranking is often not sophisticated, so many of the hits are classified with the same ranking, presenting the user with a list of perhaps thousands of hits, all of which are indicated as being of equal relevance.
These prior art techniques for searching and presenting the search results are not effective in conveying information to the user in a manner that allows the user to efficiently comprehend the volume, relevance and organization of the information located from a search. This is particularly true with respect to a search of information available on the Internet, an information source of almost unimaginable volume and inconsistent organization.
U.S. Pat. No. 5,870,740 to Rose et al., assigned to Apple Computer, Inc., discloses a method and system for improving the ranking of information retrieval results for short queries. Techniques such as this are directed to providing a more sophisticated technique for estimating the relevance of information located from a search. The Rose et al. invention attempts to improve estimated relevance determination by taking into account the degree of overlap between query terms and the terms of a located document, the query length and a "boosting" factor. Although the Rose et al. invention might improve the estimated relevancy ranking to some extent, it fails to recognize other potentially important factors of relevancy, and does not provide a suitable solution to allow a user to efficiently comprehend the results of a broad search that returned many hits. In other words, as illustrated in FIG. 5, the Rose et al. invention is representative of prior art techniques that present search result information in linear lists, a format that is particularly easy for a computer to generate but is ineffective in assisting a user in navigating and understanding the results.
A system has been proposed by Jeromy Carriere and Rick Kazman in a paper entitled "Webquery: Searching and Visualizing the Web through Connectivity." Although this paper suggests that search results may be presented visually, it does not propose a technique that may be practically implemented for the massive volume of information available from the Internet. Carriere and Kazman propose a "spider" (a program that "visits" each document or "site" within the search space where information may be located) to precompile a database containing information regarding the connectivity of data in the search space. This connectivity database is then consulted as a factor affecting relevancy of search results. The proposed technique was implemented with respect to the Internet World Wide Web sites of the University of Waterloo, a very small subset of the entire World Wide Web. The technique required the "spider" to collect information from the 200,000 sites at the University of Waterloo. Attempting to construct and maintain such a precompiled database of connectivity information is unmanageable for the entire World Wide Web because the dynamic nature of the information available and the sheer volume of information would make it impossible to complete the precompilation task for all the information available on the Internet.
The present invention provides a system and method that may be readily implemented for the entire volume of information available on the Internet for efficient searching for information and a visual presentation of search results that may be quickly comprehended by a user. The present invention further allows a variety of functions to facilitate user navigation of the search results, as well as user customization, modification and organization of the search results.
One embodiment of the invention provides for the search to be conducted by any conventional search engine. The search results are supplied to the system of the invention and further processed and analyzed to provide a useful graphical representation of the results that allows a user to quickly visually comprehend the search results. With respect to information from the Internet, the invention recognizes that interconnectivity between various "hits" from a search is an important factor that should be considered in relevance ranking and should also be part of the visual representation of the search results. Once a graphical representation is presented to the user, the graphical representation may be navigated by the user to examine the information located from the search. In addition, the user may specify that the display space be reorganized by a different "theme." For example, a search for "restaurants" could be visually organized within the display space according to themes of price, cuisine, geographical location or a different theme, the parameters of which are specified by the user.
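The reorganization by theme described above can be pictured with a minimal sketch. It assumes, purely for illustration, that each hit is a dictionary carrying a "title" and optional theme attributes such as "cuisine" or "price"; the function name group_by_theme and this data layout are hypothetical and do not appear in the specification.

```python
from collections import defaultdict


def group_by_theme(hits, theme):
    """Group hit titles by the value each hit holds for the chosen theme.

    `hits` is a list of dicts (hypothetical layout); a hit that lacks the
    theme attribute is placed in an "unknown" group.
    """
    groups = defaultdict(list)
    for hit in hits:
        groups[hit.get(theme, "unknown")].append(hit["title"])
    return dict(groups)


restaurants = [
    {"title": "Chez A", "cuisine": "French", "price": "$$$"},
    {"title": "B Diner", "cuisine": "American", "price": "$"},
    {"title": "Cafe C", "cuisine": "French"},
]

# The same hits, reorganized under two different user-selected themes.
by_cuisine = group_by_theme(restaurants, "cuisine")
by_price = group_by_theme(restaurants, "price")
```

Each resulting group could then be assigned its own region of the display space; how regions are chosen is left open here.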
According to a preferred embodiment of the invention, "hits" (i.e., matches from the search results) that are logically related to each other are grouped together in the display space, and each of the groups is "mapped" to a predetermined area of the display space according to the relevance of the group as determined by the invention. For example, the most relevant group or cluster of information may be positioned in the central region of the display space, while groups and clusters of less relevance will be positioned farther away from the central region of the display space. In other words, the proximity to the central region of the display space is an indication of the relevance of the information. As will be apparent from the examples set forth herein, such a unique visual representation allows a great amount of information to be readily comprehended by a user.
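One way to picture the mapping of groups to the display space is the following minimal sketch. It assumes that each group already has a numeric relevance score and places groups at a radius proportional to their rank; the function name layout_groups, the linear radius rule and the use of the golden angle to spread groups around the center are illustrative assumptions, not the mapping prescribed by the invention.

```python
import math

# Golden angle in radians; used only to spread groups around the center.
GOLDEN_ANGLE = math.pi * (3 - math.sqrt(5))


def layout_groups(relevance, max_radius=1.0):
    """Return {group: (x, y)} with more relevant groups closer to (0, 0).

    `relevance` maps a group name to a score (higher means more relevant).
    """
    ranked = sorted(relevance, key=relevance.get, reverse=True)
    count = len(ranked)
    positions = {}
    for rank, name in enumerate(ranked):
        # Rank 0 sits at the center; later ranks move outward linearly.
        radius = max_radius * rank / (count - 1) if count > 1 else 0.0
        angle = rank * GOLDEN_ANGLE
        positions[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


positions = layout_groups({"menus": 0.9, "reviews": 0.5, "directories": 0.1})
```

With these example scores, "menus" lands at the center, "reviews" at half the maximum radius and "directories" at the edge, so distance from the center reads directly as decreasing relevance.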
To aid in navigation of the search results, the system preferably provides "mouse over" information (i.e., a temporary information box that provides some detail regarding the item over which the mouse is currently located) for each "hit" displayed in the display space. If the user clicks on a particular "hit," the search results display space is preferably redrawn in a microview and moved to a corner of the screen as the system displays in a main window the Web page relating to the hit that was selected. As the user browses through the Web page and perhaps follows other links, the microview representation of the search results display window continues to provide the user with information useful to assist with navigation. Once the user completes examination of the particular Web page, the search results display space may be redrawn in a macroview representation for the user to conduct further navigation.
According to an important aspect of the present invention, the search results display space is preferably provided with arrows that mark links between the "hits" representing search results. This feature of the invention provides the user with an efficient visual representation of the cross-connectivity between different hits from the search. For example, in the case of a search conducted upon Web pages of the Internet, some of the pages located and displayed as hits in the search results display space may, in the content of the Web pages, contain information specifying the addresses, locations or links to other pages which are also displayed as hits in the search results display space. Such pages that contain links to other pages will be displayed with an arrow positioned between the representations of the two pages and indicating the direction of the link. In some searches, a page that serves primarily as a reference, i.e., a page of links to other pages, will immediately be recognized by a user because the page representation will include many arrows emanating from it toward other page representations. On the other hand, a page that contains a great deal of content on the subject matter of the search will immediately be recognized by a user because it will probably include many arrows pointing to it from other pages. Through this unique and beneficial graphical representation, a user can quickly comprehend the size of the search results and the likely content of various search results, and can readily navigate the search results to efficiently find the information desired. In cases where a user is searching a relatively unfamiliar topic, the user may conduct an initial search, find a reference page (by visually recognizing the page with many arrows emanating from it), and begin navigation from that reference page.
After navigating through such a reference page, the user will appreciate the information available on the search subject and can then return to the search results display space to conduct further navigation on particular topics of interest.
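The cross-connectivity described above can be sketched as a small directed graph. The example assumes that the outgoing links of each hit have already been extracted from its page; the names link_arrows and degree_counts and the sample page identifiers are hypothetical. Only links whose target is itself a hit become arrows, since only those can be drawn between two items in the display space.

```python
from collections import Counter


def link_arrows(outlinks):
    """Return (source, target) arrows between hits.

    `outlinks` maps each hit to the addresses its page links to. Links to
    pages outside the result set, and self-links, are ignored.
    """
    hits = set(outlinks)
    arrows = []
    for source, targets in outlinks.items():
        for target in sorted(set(targets)):
            if target in hits and target != source:
                arrows.append((source, target))
    return arrows


def degree_counts(arrows):
    """Count arrows leaving each hit and arrows arriving at each hit."""
    outgoing = Counter(source for source, _ in arrows)
    incoming = Counter(target for _, target in arrows)
    return outgoing, incoming


outlinks = {
    "index": ["a", "b", "c", "external"],  # mostly a page of links
    "a": ["b"],
    "b": [],                               # linked to by several hits
    "c": ["b"],
}
arrows = link_arrows(outlinks)
outgoing, incoming = degree_counts(arrows)
likely_reference = outgoing.most_common(1)[0][0]  # "index"
likely_content = incoming.most_common(1)[0][0]    # "b"
```

In this sample, "index" has the most arrows emanating from it and would stand out as a reference page, while "b" has the most arrows pointing to it and would stand out as a likely content page.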
Another important aspect of the present invention is that the relevance ranking is preferably based upon a variety of information, instead of merely the extent to which particular search terms correspond to information within a particular hit. The present invention may further consider as factors relating to relevancy: (1) the cross-connectivity of a particular hit (i.e., the number of links to/from the hit), (2) the number of times a search engine generated the node as a hit, (3) the relevancy rating for the hit as returned by the search engine, (4) the type of Web site where the hit is found (e.g., ".edu" for an education related site, ".com" for a commercial site, ".org" for an organization, ".gov" for a government site, etc.), and (5) in an embodiment where multiple search engines are used, the number of search engines reporting the node as a hit.
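The five factors listed above could be combined in many ways, and the text does not fix a formula. The following minimal sketch assumes a simple weighted sum; the Hit record, the SITE_TYPE_SCORE values, the equal default weights and the assumption that the engine rating lies between 0 and 1 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

# Illustrative scores per site type; these values are assumptions.
SITE_TYPE_SCORE = {"edu": 1.0, "gov": 1.0, "org": 0.8, "com": 0.5}


@dataclass
class Hit:
    url: str
    link_count: int         # factor 1: links to/from the hit
    times_reported: int     # factor 2: times an engine returned the hit
    engine_rating: float    # factor 3: engine's own rating, assumed 0..1
    engines_reporting: int  # factor 5: engines that reported the hit


def site_type(url):
    """Return the last label of the host name, e.g. "edu" (factor 4)."""
    host = urlparse(url).hostname or ""
    return host.rsplit(".", 1)[-1]


def relevance(hit, weights=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the five factors; unknown site types score 0."""
    factors = (
        hit.link_count,
        hit.times_reported,
        hit.engine_rating,
        SITE_TYPE_SCORE.get(site_type(hit.url), 0.0),
        hit.engines_reporting,
    )
    return sum(w * f for w, f in zip(weights, factors))


hits = [
    Hit("http://example.com/menu", 1, 1, 0.9, 1),
    Hit("http://example.edu/menu", 4, 2, 0.5, 2),
]
ranked = sorted(hits, key=relevance, reverse=True)
```

Here the ".edu" hit ranks first even though its engine rating is lower, because its connectivity and the number of reporting engines outweigh the rating under equal weights. In practice the factors would likely need normalizing to comparable ranges before being summed.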