Known methods of locating resources on a network and storing resource information in a searchable database are able to find resources whose text is related to a search string submitted by a user. In one known search methodology, the text of a resource is related to a search string if the text contains at least part of the search string. In more sophisticated search methodologies, the resource text is related to a search string if the text includes strings that are linguistically related to the search string.
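The simplest relatedness test described above can be illustrated with a minimal sketch. The function name `is_related` and the choice to split the search string on whitespace are assumptions made for illustration only; they do not come from any particular search engine.

```python
def is_related(resource_text: str, search_string: str) -> bool:
    """Return True if the text contains at least part of the search string.

    Assumption: "part of the search string" is taken to mean any one of its
    whitespace-separated words, compared case-insensitively.
    """
    text = resource_text.lower()
    words = search_string.lower().split()
    return any(word in text for word in words)


print(is_related("Bob Dylan discography and lyrics", "bob dylan"))
print(is_related("Local weather report", "bob dylan"))
```

Because the test is a plain substring match, a text containing "bobsled" would also be treated as related to "bob dylan", which hints at why such methods return many resources of little value.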
In a large network having many resources, a search string (or "keyword") search using known search techniques is likely to retrieve references to hundreds or even thousands of resources. For example, the Internet provides search engines (software programs that find and store index information for Internet resources, which information is searchable using a search string) that return every resource in the engine's database that is deemed to be appropriately related to the search string under the employed search methodology. This generally returns information on many more resources than the user can possibly browse, and no information on which listed resources are the most valuable (e.g., most popular, acclaimed, etc.), leaving the user to wade through hundreds of resources. In essence, the signal-to-noise ratio for this kind of resource search is low.
An example of such a search engine on the World Wide Web is called Lycos, found at <http://www.lycos.com/>. In response to the search string "bob dylan", Lycos returned a list of over 29,000 "relevant" resources 101, the first page of which is shown in FIG. 1. The results are supposedly ranked from most relevant to least relevant, with percentage ratings 102 provided for each resource. Relevancy is determined by the textual similarity of a resource to the search string. In a simple case, the resources are ordered by the number of times that the search string appears in the text. In the example shown in FIG. 1, each resource has a resource title 103 and a somewhat cryptic description 104 evidently derived from the text of the resource itself. The value of each description can be limited. For example, the description for the first, purportedly most relevant resource provides a date 105 with no indication of what the date 105 refers to, the last modification date 106, content type 107, length 108, and other information that is normally of little value to a user in deciding if the resource is responsive to her needs. The URL of the resource 109 is provided along with the resource size 110. Information for other resources listed on the page follows the same format, and is about as useful for determining if a resource is worthwhile. The quality of these resources is not addressed by the search engine. The user is hence not much better off than before she submitted her search. Lycos has provided the user with over 29,000 leads with little to distinguish the most useful, highest quality, or widely recognized resources from those that are of limited or no usefulness to the user.
Other methods of locating resources on a network include ARCHIE, a program that resides on a network server and provides searchable indexes of resource directory information; GOPHER, a network server program that provides searchable menu-based access to network resources; VERONICA, a network server program that provides searchable indexes of GOPHER menus from a plurality of servers; and Wide Area Information Services (WAIS), a distributed text searching system that examines indexes of network resources. Each of these search techniques provides information on network resources without systematically including an evaluation of any such resource.
Often faced with an overwhelming amount of information returned by known resource location techniques in response to a user query, the user is frequently unable to locate the resources that are the most responsive to her needs. As presently implemented, known methods of separating more relevant from less relevant resources are imperfect and sometimes ineffective or laborious for the user. For example, presenting resources that include the highest number of occurrences of the user's search string (e.g., that have the highest number of "hits") often misses the target. This is because such a metric for relevancy fails to take into account the context in which the search string appears in the resource. Thus, the search string "snake" may return resources concerning reptiles, rivers, plumbing devices, and resources in which the term "snake" is used frequently as a verb. A better approach would provide data useful for assessing the character and value of the information provided by a resource.
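The shortcoming of hit counting can be seen in a small sketch. The three resource texts, their names, and the helper functions `count_hits` and `rank_by_hits` are invented for illustration; the sketch assumes whole-word, case-insensitive matching, which is only one of several ways a "hit" might be counted.

```python
import re


def count_hits(text: str, term: str) -> int:
    # Count whole-word, case-insensitive occurrences of the term.
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def rank_by_hits(resources: dict[str, str], term: str) -> list[tuple[str, int]]:
    # Order resources from most hits to fewest; context is never examined.
    scored = [(name, count_hits(text, term)) for name, text in resources.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical resources, for illustration only.
resources = {
    "reptile-guide": "The corn snake is a docile snake kept as a pet.",
    "plumbing-tips": "Use a drain snake. Snake the pipe, then snake it again; "
                     "a longer snake helps.",
    "river-atlas": "The Snake River flows through Idaho.",
}

for name, hits in rank_by_hits(resources, "snake"):
    print(name, hits)
```

In this sketch the plumbing text ranks first with four hits, ahead of the reptile text with two, even for a user interested in reptiles, because the count carries no information about the sense in which the term is used.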
A known method for providing contextual information associated with network resources is implemented in certain search engines, such as Yahoo at <http://www.yahoo.com>. As shown in FIG. 2, Yahoo presents network resources under predetermined categories, such as Arts and Humanities 21, Science 22, etc. These categories are generated by human intervention, and human input is required to provide the contextual information provided by the categorization and information on individual resources. The contextual information on individual resources is generally provided by users (often the originator of the resource), who register the resource with the engine.
While engines such as Yahoo provide some contextual information for resources, such contextual information is often biased because it is commonly provided by the originator of the resource. Further, the vast majority of the resources searchable through the engine are registered by their originators, who also select the categories under which the resource appears. This results in inaccuracies and inconsistencies, as resources providing similar information are registered under different categories. Some of these problems have been addressed by hiring professional librarians and subject matter experts to intervene and provide context for the stored collection of resource information. However, reliance upon paid human intervention in this fashion is expensive, tedious, and slow. The resources on a large network such as the Internet grow and change at such a rapid pace that human entry of resource information is an inadequate means for capturing the full range of what is available.
A searchable database comprising automatically gathered and analyzed information on resources that have been evaluated would provide an efficient and effective means of locating a wide range of resources that have been recognized as valuable by and for users. Although human evaluations are necessarily subjective, the credibility of an evaluation is enhanced when concurring evaluations are made independently by more and more people. An important step in building such a database would involve locating and exploiting a body of resource evaluation data that is substantial and broad enough to provide credible evaluations of a wide range of network resources, and that is inexpensively available.
Such resource evaluation data would be even more valuable if it included thematic data, or data from which thematic information pertaining to the evaluation could be derived. Such data would provide a richer, more useful way to present resource information responsive to a user request. Grouping resource information thematically gives the user the opportunity to search by theme, which can be more effective for certain searches than traditional search methods (e.g., keyword searches). This is particularly true for searches seeking general information on a given topic. Thematic information advantageously provides a contextual framework that makes it easier for the user to locate and examine the resources that are the most pertinent to the user's needs.
Electronic messages are sent and received in substantial numbers in large networks. The subject matter of such messages is as diverse as the human concerns that motivate any person-to-person communication. One such concern is the evaluation and recommendation of network resources. Messages evaluating a network resource make up only a small fraction of the overall volume of message traffic. Hence, a large number of messages would have to be efficiently examined to identify those which comprise evaluations.
An example of a large network that generates a large amount of electronic message traffic is the Internet. One of the services provided on the Internet (and on other networks) is USENET, an informal organization of servers that host newsgroups related to particular areas of interest. The topic of each newsgroup is indicated by its name. For example, newsgroups beginning with "rec" concern hobbies and other recreational activities. Increasing detail is provided by address segments to the right of the category. Thus, rec.music.folk provides a forum for users to post messages regarding folk music. The newsgroup topic appears in every message posted to the newsgroup, and provides thematic information for every message. The newsgroup functions as an electronic public bulletin board, on which users sequentially post messages visible to all on the topic of the group. Examples of other organizations that generate substantial volumes of electronic messages that would be useful sources of network resource evaluations include bionet for biologists; BITNET listservs, which distribute electronic user messages via e-mail; hepnet for high energy physics; and Clarinet. It is also common for large corporations to have both public and private netnews networks, on which messages of general or particular interest are posted. Such messages may or may not provide thematic information, depending upon the architecture of the particular netnews system.
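The way a newsgroup name carries thematic information, from the general category on the left to increasing detail on the right, can be sketched as follows. The function name `newsgroup_themes` is hypothetical, and the sketch assumes only that segments are separated by periods.

```python
def newsgroup_themes(newsgroup: str) -> list[str]:
    """Return the themes implied by a newsgroup name, from general to specific."""
    segments = newsgroup.split(".")
    # Each prefix of the name is a progressively narrower theme.
    return [".".join(segments[:i]) for i in range(1, len(segments) + 1)]


print(newsgroup_themes("rec.music.folk"))
# → ['rec', 'rec.music', 'rec.music.folk']
```

A message posted to rec.music.folk could thus be associated with the broad theme of recreation, the narrower theme of music, and the specific theme of folk music.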
Electronic messages comprise a substantial and continually renewed base of data that contains a small but significant percentage of network resource evaluations. Efficiently mining a substantial number of these messages for such evaluations would economically provide the raw material for generating a new kind of searchable index of network resources that could point a user towards resources that have been recognized and discussed by other users. Searching for and presenting evaluated sites to a user in response to a search request would be substantially more likely to provide resource information responsive to the user's needs than a simple keyword search of all network resources which returns resource information based upon the frequency of occurrence of the search term in the resource. Further, making the evaluations for resources available to the user would allow the user to make independent assessments of the likely quality and responsiveness of a given resource for her needs. The challenges in developing a system and method to carry this out would include obtaining a sufficiently large volume of messages to search such that a useful number of evaluations could be derived therefrom; distinguishing messages that are evaluations from messages that are not; and storing and presenting the evaluations and evaluated resource identifiers to the user in a way that the user can easily understand, and further use to obtain copies of evaluated resources.
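The challenges listed above can be made concrete with a deliberately crude sketch. Everything in it is an assumption made for illustration: the message layout (a dictionary with `author` and `body` keys), the short list of evaluative words, and the rule that a message counts as an evaluation when it contains both a URL and one of those words. It is not offered as the method by which evaluations are actually to be distinguished.

```python
import re
from collections import defaultdict

# Assumption: resources are identified by http(s) URLs in the message body.
URL_PATTERN = re.compile(r"https?://[^\s<>\"]+")

# Hypothetical word list; a practical system would need something far richer.
EVALUATIVE_WORDS = {"recommend", "excellent", "great", "useful", "terrible", "avoid"}


def count_evaluators(messages: list[dict]) -> dict[str, int]:
    """Map each evaluated URL to the number of distinct authors evaluating it."""
    authors_by_url = defaultdict(set)
    for message in messages:
        words = set(re.findall(r"[a-z]+", message["body"].lower()))
        if not words & EVALUATIVE_WORDS:
            continue  # no evaluative word: not treated as an evaluation
        for url in URL_PATTERN.findall(message["body"]):
            # Strip trailing punctuation that belongs to the sentence, not the URL.
            authors_by_url[url.rstrip(".,;)")].add(message["author"])
    return {url: len(authors) for url, authors in authors_by_url.items()}


# Hypothetical messages, for illustration only.
messages = [
    {"author": "ann", "body": "I recommend http://example.org/dylan for lyrics."},
    {"author": "raj", "body": "http://example.org/dylan is an excellent archive."},
    {"author": "ann", "body": "Meeting moved to Tuesday, see http://example.org/agenda"},
]

print(count_evaluators(messages))
# → {'http://example.org/dylan': 2}
```

Counting distinct authors rather than messages reflects the earlier observation that an evaluation gains credibility when concurring evaluations are made independently by more people. The exact-word test would miss "recommended" or "recommends", which illustrates how difficult it is to distinguish evaluations from other messages reliably.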