1. Field of the Invention
The present invention relates to a distributed searching system in general, and, more particularly, to a distributed searching system for locating information resources in a large scale network of connected computers having respective available information resources.
2. Description of Related Art
Among the basic intangible resources of the information industry, the information and services provided by computers are generally referred to as "information resources".
As a result of recent progress in network services, including the ability to connect an enormous number of computers and to offer a variety of services, it has become difficult to ascertain the type of information resources possessed by each of the respective computers in a network.
Moreover, even if such information can be ascertained, since the network environment periodically changes due to maintenance and defects in the computers or the networks, information resources which have been previously used may not always be available. Therefore, it is necessary for users to ascertain which computers provide desired information resources using the most recent information available at the time the user actually uses the information resources.
In addition, many computers generally have the same information resources, and it is therefore natural that the quality of those information resources, such as freshness, accuracy and degree of abstraction, differs from computer to computer, depending on the management policies of each computer's manager. Therefore, it is preferable that users be able to determine which of the many computers have the best information resources.
However, the quality of information resources cannot be identified until the information resources are actually used and compared with other information resources. Doing so, however, requires a large amount of both time and labor, and is especially difficult for beginners who have little knowledge of information resources. Therefore, the most effective method for identifying the best information resources assumes that those information resources which many users recognize as reliable are the most effective.
In recent years, many network information resources can be accessed by the World Wide Web ("WWW"). Positional information of information resources can be expressed in the WWW by a Uniform Resource Locator ("URL"). When a user wants to utilize an information resource, the user must know the URL corresponding to the information resource. However, only a small number of URLs from among all information resources on the network are known by any single user. Therefore, as a method of searching for a URL corresponding to an information resource, a searching service, commonly referred to as a "search engine", is provided on the WWW.
The method executed in the searching service can basically be divided into two steps. The first step involves collecting information regarding information resources which are available through the network, and the second step involves administrating and providing the collected information for users. The information collection method is roughly classified into two kinds of systems, namely a directory service system and a robot system.
In a directory service system, an information resource provider requests registration in a directory from the manager of the search engine service, or directory service. Many search engines, such as Yahoo (http://www.yahoo.com/) and AltaVista (http://altavista.digital.com/), are examples of the directory service system. Since the registration request is issued by the information resource provider itself, information quality tends to be high. Nevertheless, a disadvantage of the directory service system is that registration requests are often processed manually by the manager, which places a heavy processing load on the manager. Moreover, as a result of the substantial load, information cannot be updated quickly and accurately.
In the robot system, in order to automatically search existing URLs and establish a database of URLs, a program called a robot sequentially traces the links, or anchors, in Hyper-Text Mark-up Language ("HTML") documents. HTML is a standard language that describes the information provided by the WWW. Examples of a robot system include WWW Worm (Colorado Univ., O. A. McBryan) and RBSE Spider (Houston Univ., D. Eichmann). However, an information resource is registered in the database of information resources only when the information resource provider has informed someone of the service and that person has created a link to it. In addition, when an information resource or service must be updated, reference is still made to the information resource while the information resource provider is unaware of the update. Moreover, since information resources are searched mechanically, non-useful information resources may easily be picked up, generating a useless load on the network and computers.
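The link tracing performed by a robot can be illustrated with a minimal sketch. It is not the implementation of WWW Worm or RBSE Spider. The names `AnchorCollector`, `trace_links` and `PAGES`, the example URLs, and the breadth-first order are assumptions made for illustration, and an in-memory table stands in for real network requests.

```python
from collections import deque
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def trace_links(start_url, fetch):
    """Breadth-first trace of anchors starting at start_url.

    fetch is a hypothetical callable that maps a URL to its HTML text,
    or returns None when the URL cannot be retrieved.
    """
    seen = {start_url}
    queue = deque([start_url])
    database = []  # URLs in the order the robot registers them
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        database.append(url)
        parser = AnchorCollector()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return database


# Hypothetical in-memory "web" used in place of real HTTP requests.
PAGES = {
    "http://a.example/": '<a href="http://b.example/">B</a> <a href="http://c.example/">C</a>',
    "http://b.example/": '<a href="http://a.example/">A</a>',
    "http://c.example/": "no links here",
    "http://d.example/": "never linked, so never found",
}

print(trace_links("http://a.example/", PAGES.get))
```

In this sketch the page at `http://d.example/` is never registered because no traced document links to it, which mirrors the limitation described above.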
Next, a method of administrating positional information of the collected information resources and providing such information resources to users is described as follows.
In a centralized management system, all data is served by a single server. The centralized management system is used in many search engines, including Yahoo and AltaVista. An advantage of the centralized management system is that maintenance is easily performed because there is only one point of administration. On the other hand, server load quickly becomes very large since access by users is concentrated on a single server. Moreover, the centralized management system also has the disadvantage that communication costs are high for some users, making the service burdensome for them. Furthermore, if the server fails, the centralized management system can no longer offer the service.
In a distributed management system, data is administrated and served in common with other servers. This system can be classified as follows, depending on the procedure for sharing.
Each user in a distributed management system uses a server by selecting a most accessible server to distribute the load. Mirroring is an example of this system. An advantage of the distributed management system is that since many of the servers have the same functions, service can be continued even if a particular server fails. However, a user cannot benefit from this advantage if he cannot detect positional information of the alternative server to continue the same service. In addition, in the distributed management system, data management costs are high since all servers must hold the same data.
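The selection of the most accessible server, and the continuation of service when a server fails, can be sketched as follows. This is a hypothetical illustration only. The function `pick_server`, the mirror names, and the numeric access costs are assumptions and do not describe any particular mirroring product.

```python
def pick_server(mirrors, is_alive):
    """Return the reachable mirror with the lowest access cost, or None.

    mirrors maps a server name to an assumed access cost for this user.
    is_alive is a hypothetical callable that reports whether a server responds.
    """
    for name, _cost in sorted(mirrors.items(), key=lambda item: item[1]):
        if is_alive(name):
            return name
    return None


# Assumed list of mirrors known to one user, with illustrative access costs.
MIRRORS = {"mirror-tokyo": 10, "mirror-osaka": 25, "mirror-london": 180}

# All servers up: the cheapest mirror is chosen.
print(pick_server(MIRRORS, lambda name: True))

# The nearest mirror has failed: the user falls back to the next one.
print(pick_server(MIRRORS, lambda name: name != "mirror-tokyo"))
```

The fallback works only because the user already holds the positional information of every mirror in `MIRRORS`. A user who knows a single server has nothing to fall back to, which is the limitation noted above.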
In a distribution of service system, service is classified into several categories, with each category being covered by respective servers. Domain Name Service ("DNS"), which makes reference to an Internet Protocol ("IP") address from the name of the computer, is an example of the distribution of service system. Wide Area Information Service ("WAIS") is a large scale distributed database that can also be placed into this category. Moreover, the distribution of service system is compatible with the distribution of access. In this system, since the server to be administrated is different depending on the kind of service, maintenance can be easily performed. However, when the kind and range of service is restricted, the distribution of service system becomes similar to the centralized management system, and therefore the disadvantages of the centralized management system can be seen.
A user must change the server to be used depending on the desired service, and therefore, it is inconvenient when the user is unable to determine the required server from the service. This is not a problem for DNS because the server can automatically be found by utilizing the hierarchical configuration of domains.
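The way a hierarchical configuration of domains allows the responsible server to be found automatically can be shown with a toy sketch. It is not the DNS protocol itself. The `ZONES` table, the server names, and the address are hypothetical, and real resolvers exchange network messages rather than reading a dictionary.

```python
# Hypothetical delegation tree: each zone knows only its direct children.
ZONES = {
    "": {"com": "com-server", "org": "org-server"},  # root zone
    "com": {"example": "example-com-server"},
    "example.com": {"www": "192.0.2.10"},
}


def resolve(name):
    """Walk the labels of name from right to left through ZONES.

    Returns (answer, path), where path lists every server or address
    consulted on the way. The answer is None when the delegation chain breaks.
    """
    zone = ""
    answer = None
    path = []
    for label in reversed(name.split(".")):
        children = ZONES.get(zone)
        if children is None or label not in children:
            return None, path
        answer = children[label]
        path.append(answer)
        zone = label if zone == "" else f"{label}.{zone}"
    return answer, path


print(resolve("www.example.com"))
```

Each step needs only the name being resolved, so the user never has to know in advance which server covers which part of the name space.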
On the other hand, an information resource recommending technique, known as social filtering or collaborative filtering, has been developed in which a preferable information resource is recommended based on a recommendation by another person, or on the evaluation values and actions of other persons having the same preference. For example, Tapestry (Xerox Palo Alto Research Center, D. Goldberg, D. Terry) is a system that aids in selectively reading articles recommended by others from among numerous articles from Usenet News and mailing lists. Similar systems, in which other users assign evaluation values to articles and the articles having greater values are recommended, include GroupLens (Minnesota Univ., J. Riedl, J. Konstan), which is a system for recommending Usenet News articles, and Ringo (MIT, P. Maes, U. Shardanand), which is a system for recommending music albums.
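The general idea of recommending resources from the evaluation values of like-minded users can be sketched as follows. This is a simplified illustration and not the algorithm of Tapestry, GroupLens or Ringo. The similarity measure, the threshold `min_similarity`, the user names, and the ratings in `RATINGS` are all assumptions.

```python
# Hypothetical evaluation values (1 = poor, 5 = excellent) given by users.
RATINGS = {
    "alice": {"art1": 5, "art2": 1, "art3": 4},
    "bob": {"art1": 5, "art2": 1, "art4": 5},
    "carol": {"art1": 1, "art2": 5, "art4": 1, "art5": 5},
}


def similarity(a, b):
    """Similarity in (0, 1] from the mean rating gap on shared items, else 0."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    mean_gap = sum(abs(a[item] - b[item]) for item in shared) / len(shared)
    return 1.0 / (1.0 + mean_gap)


def recommend(user, ratings, min_similarity=0.5):
    """Score items the user has not rated, using sufficiently similar users only."""
    mine = ratings[user]
    totals, weights = {}, {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = similarity(mine, theirs)
        if sim < min_similarity:
            continue  # ignore users whose preferences differ too much
        for item, value in theirs.items():
            if item not in mine:
                totals[item] = totals.get(item, 0.0) + sim * value
                weights[item] = weights.get(item, 0.0) + sim
    scores = {item: totals[item] / weights[item] for item in totals}
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


print(recommend("alice", RATINGS))
```

In this sketch only `bob` is similar enough to `alice` to contribute, so the recommendation rests entirely on agreement over the two articles they both rated.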
However, since similar preferences in one field do not guarantee similar preferences in another field, it is not always best to follow the actions and recommendations of a particular person. In addition, since preference information is managed centrally, the problems described above for centralized management of search engine information may also arise in the management of preference data.
As described above, problems that exist in the related art can be classified as follows.
In the directory service system, execution of registration requests often depends on manual operation by a manager, and therefore the manager tends to be heavily overloaded. As a result, it is likely that a search will be unsuccessful due to a mistake by the manager.
Ineffective HTML documents may be transferred since the robot program does not fully evaluate the contents of HTML documents, and, as a result, the load of traffic and load on the server tend to increase.
In order to keep the traffic low, the frequency of activation of the robot program must be reduced. As a result, since collected information often changes soon after it is collected, there is an increased possibility that the information obtained has already been invalidated.
On the information resource provider side, it is not known to whom notification of a change in the contents of an offered service must be made. Even when such a change can be notified, the contents of information collected by a robot are probably not immediately reflected in a search result.
When a database of information resources becomes large, a large number of results are output for a search. Therefore, a user cannot determine which information resource is most suitable. When a user does not have sufficient knowledge about the target information, such a determination becomes very difficult.
Since similarity in one particular preference does not guarantee that all preferences are similar, a recommendation by a particular person is not always satisfactory.