1. Field of the Invention
The present invention relates generally to a distributed searching system and, more particularly, to a distributed searching system for locating information resources in a large-scale network of connected computers, each having respective available information resources.
2. Description of Related Art
Among the basic intangible resources of the information industry, information and services provided by computers are generally referred to as “information resources”.
As a result of recent progress in network services, including the ability to connect an enormous number of computers and to offer a variety of services, it has become difficult to ascertain the type of information resources possessed by each of the respective computers in a network.
Moreover, even if such information can be ascertained, since the network environment periodically changes due to maintenance and defects in the computers or the networks, information resources which have been previously used may not always be available. Therefore, it is necessary for users to ascertain which computers provide desired information resources using the most recent information available at the time the user actually uses the information resources.
In addition, many computers generally have the same information resources, and it is natural that information resource quality, such as freshness, accuracy and degree of abstraction, differs from computer to computer, depending on the management policies of each computer manager. Therefore, it is preferable that users be able to determine which of the many computers have the best information resources.
However, information resource quality cannot be identified until the information resources are actually used and compared with other information resources. Doing so requires a large amount of both time and labor, and is especially difficult for beginners who have poor knowledge of information resources. Therefore, the most effective method for identifying the best information resources is to assume that the information resources recognized by many users as reliable are the best.
In recent years, many network information resources can be accessed through the World Wide Web (“WWW”). Positional information of information resources can be expressed in the WWW by a Uniform Resource Locator (“URL”). When a user wants to utilize an information resource, the user must know the URL corresponding to the information resource. However, only a small number of URLs from among all information resources on the network are known by any single user. Therefore, as a method of searching for a URL corresponding to an information resource, a searching service, commonly referred to as a “search engine”, is provided on the WWW.
The method executed in the searching service can basically be divided into two steps. The first step involves collecting information regarding information resources which are available through the network, and the second step involves administrating and providing the collected information for users. The information collection method is roughly classified into two kinds of systems, namely a directory service system and a robot system.
In a directory service system, an information resource provider requests registration in a directory from the manager who offers the search engine service, or directory service. Many search engines, such as Yahoo (http://www.yahoo.com/) and AltaVista (http://altavista.digital.com/), are examples of the directory service system. Since the information resource provider can reliably issue the registration request, information quality tends to be high. Nevertheless, a disadvantage of the directory service system is that registration requests are often processed manually by the manager, which overburdens the manager. Moreover, as a result of this substantial load, information cannot be updated quickly and accurately.
In the robot system, a program called a robot sequentially traces the links, or anchors, in Hyper-Text Mark-up Language (“HTML”) documents in order to automatically find existing URLs and build a database of URLs. HTML is a standard language that describes the information provided by the WWW. Examples of a robot system include WWW Worm (Colorado Univ., O. A. McBryan) and RBSE Spider (Houston Univ., D. Eichmann). However, an information resource is registered in the database only when the information resource provider informs someone of the information resource and that person establishes a link to it. Moreover, in a robot system, it is unclear to whom notification of an update of an information resource or service must be made, and reference is made to the information resource while the information resource provider is unaware of the update. Furthermore, since information resources are searched mechanically, non-useful information resources may easily be picked up, generating a useless load on the network and computers.
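The link-tracing behavior of a robot described above can be illustrated with a minimal Python sketch. It is not the implementation of WWW Worm or RBSE Spider; the names `AnchorCollector`, `crawl`, `fetch` and `max_pages` are hypothetical, and `fetch` stands in for whatever retrieves a page so that the example needs no network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the href value of every <a> tag in one HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Trace links breadth-first and return URLs in discovery order.

    `fetch` is a hypothetical callable that maps a URL to HTML text,
    or returns None when the page cannot be retrieved.
    """
    discovered = [start_url]
    known = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # unreachable page: nothing to trace from here
        parser = AnchorCollector()
        parser.feed(html)
        for href in parser.hrefs:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in known and len(discovered) < max_pages:
                known.add(absolute)
                discovered.append(absolute)
                queue.append(absolute)
    return discovered
```

Note that the sketch records every URL it encounters without evaluating page content, which mirrors the drawback noted above: non-useful resources are collected as readily as useful ones.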
Next, a method of administrating positional information of the collected information resources and providing such information resources to users is described as follows.
In a centralized management system, all data is served by a single server. The centralized management system is used in many search engines, including Yahoo and AltaVista. An advantage of the centralized management system is that maintenance is easily performed because there is only one point of administration. On the other hand, the server load quickly becomes very large since access by users is concentrated on a single server. Moreover, the centralized management system has the disadvantage that communication costs are high for some users, making the service burdensome for them. Furthermore, if the server fails, the centralized management system can no longer offer the service.
In a distributed management system, data is administrated and served in common with other servers. This system can be classified as follows, depending on the procedure for sharing.
In a distribution of access system, each user selects and uses the most accessible server, thereby distributing the load. Mirroring is an example of this system. An advantage of this system is that, since many of the servers have the same functions, service can be continued even if a particular server fails. However, a user cannot benefit from this advantage unless the user can find positional information of an alternative server that continues the same service. In addition, data management costs are high since all servers must hold the same data.
In a distribution of service system, service is classified into several categories, with each category being covered by respective servers. The Domain Name System (“DNS”), which resolves an Internet Protocol (“IP”) address from the name of a computer, is an example of the distribution of service system. Wide Area Information Servers (“WAIS”), a large-scale distributed database, can also be placed in this category. Moreover, the distribution of service system is compatible with the distribution of access. In this system, since the server to be administrated differs depending on the kind of service, maintenance can be easily performed. However, when the kind and range of service is restricted, the distribution of service system becomes similar to centralized management, and therefore exhibits the disadvantages of the centralized management system.
A user must change the server to be used depending on the desired service, and it is therefore inconvenient when the user is unable to determine the required server from the service. This is not a problem for DNS because the server can be found automatically by utilizing the hierarchical structure of domain names.
On the other hand, an information resource recommending technique known as social filtering or collaborative filtering has been developed, in which a preferable information resource is recommended based on a recommendation by another person, or on the evaluation values and actions of other persons having the same preferences. For example, Tapestry (Xerox Palo Alto Research Center, D. Goldberg, D. Terry) is a system that aids in selectively reading articles recommended by others from among the numerous articles of Usenet News and mailing lists. Similarly, systems in which users assign evaluation values to articles and articles having higher values are recommended include GroupLens (Minnesota Univ., J. Riedl, J. Konstan), which is a system for recommending Usenet News, and Ringo (MIT, P. Maes, U. Shardanand), which is a system for recommending music albums.
However, since similar preferences in one field do not guarantee similar preferences in another field, it is not always best to follow the actions and recommendations of a particular person. In addition, since preference information is managed centrally, the problems described above for centralized management of search engine information may also arise in the management of preference data.
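The general idea of collaborative filtering described above can be sketched in Python as follows. This is a simplified illustration and not the algorithm of Tapestry, GroupLens or Ringo: the cosine similarity measure, the weighting scheme, and the names `similarity` and `recommend` are all assumptions made for the example.

```python
from math import sqrt


def similarity(ratings_a, ratings_b):
    """Cosine similarity over the items that both users have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in shared)
    norm_a = sqrt(sum(ratings_a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(ratings_b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(target, all_ratings):
    """Rank items the target has not rated by similarity-weighted score."""
    own = all_ratings[target]
    totals, weights = {}, {}
    for user, ratings in all_ratings.items():
        if user == target:
            continue
        weight = similarity(own, ratings)
        if weight <= 0:
            continue  # ignore users with no measurable overlap
        for item, value in ratings.items():
            if item not in own:
                totals[item] = totals.get(item, 0.0) + weight * value
                weights[item] = weights.get(item, 0.0) + weight
    scores = {item: totals[item] / weights[item] for item in totals}
    # Highest predicted value first; ties broken by item name.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Because a single similarity value is computed per pair of users across all items, the sketch shares the limitation noted above: agreement in one field is silently assumed to carry over to every other field.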
As described above, problems that exist in the related art can be classified as follows.
In the directory service system, execution of registration requests often depends on manual operation by a manager, and therefore the manager tends to be heavily overloaded. As a result, it is likely that a search will be unsuccessful due to a mistake by the manager.
Ineffective HTML documents may be transferred since the robot program does not fully evaluate the contents of HTML documents, and, as a result, the load of traffic and load on the server tend to increase.
In order to keep the traffic low, the frequency of activation of the robot program must be reduced. As a result, since information is often changed soon after it is collected, there is an increased possibility that the information obtained has already been invalidated.
On the information resource provider side, it is not known to whom notification of a change in the contents of an offered service must be made, and even if such a change can be notified, the change is unlikely to be reflected immediately in a search result based on information collected by a robot.
When a database of an information resource becomes large, a large number of results are output for the search. Therefore, a user cannot determine which information resource is most adequate. When a user does not have sufficient knowledge about the object information, such a determination becomes very difficult.
Since similar preferences in one field do not guarantee similar preferences in all fields, a recommendation by a particular person is not always satisfactory.
It is therefore an object of the present invention to provide an information resource searching system that solves the problems explained above by preventing common-placing of information and selecting the best information resource through simultaneous advertisement of information resources using automated management of information regarding information resources, and by returning the search result to a user via the advertisement of information resources by the information resource provider and an inquiry from a user.
Objects of the present invention are achieved by a distributed search system connecting a plurality of computers on a network. Each of the plurality of computers includes a device for storing advertisement information including position information of an information resource, and a device for searching the storing device of each of the plurality of computers in response to a search request. In addition, each computer includes a device for accepting a request to register advertisement information in the storing device, and a device for transferring the advertisement information requested to be registered to the plurality of computers. The transferring of the advertisement information is determined by cost information given to the advertisement information.
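How cost information might limit the transfer of advertisement information can be sketched in Python as follows. The text above states only that the transfer is determined by cost information given to the advertisement; the interpretation used here, a budget that is reduced by a per-link cost at every hop, is an assumption, as are the names `SearchNode`, `connect`, `register` and `budget`.

```python
class SearchNode:
    """One computer holding advertisements and links to other computers."""

    def __init__(self, name):
        self.name = name
        self.links = []  # (neighbor, link_cost) pairs
        self.ads = {}    # advertised URL -> largest remaining budget seen

    def connect(self, other, link_cost):
        """Create a two-way link with an assumed, non-negative cost."""
        self.links.append((other, link_cost))
        other.links.append((self, link_cost))

    def register(self, url, budget):
        """Store an advertisement, then transfer it while budget remains."""
        if self.ads.get(url, -1) >= budget:
            return  # already stored with at least this much budget
        self.ads[url] = budget
        for neighbor, link_cost in self.links:
            remaining = budget - link_cost
            if remaining >= 0:
                neighbor.register(url, remaining)
```

Under these assumptions, an advertisement registered with a larger budget reaches more distant computers, while one with a small budget stays near the computer that accepted it.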
Further objects of the present invention are achieved by a distributed search system connecting a plurality of computers on a network. Each of the plurality of computers includes a device for storing advertisement information including position information of an information resource, and a device for searching the storing device in response to a search request. The search request is accepted from the searching device of the plurality of computers, and a search range corresponding to the search request is determined by cost information of the search request.
Further objects of the present invention are achieved by a plurality of searching apparatus having corresponding information resources, that are connected to a network that includes an information resource provider. Each of the plurality of searching apparatus includes a storage device to store advertisement information that includes position information corresponding to the information resources, an advertisement processing device to accept registration of the advertisement information from the information resource provider, and a control device to store the advertisement information accepted for registration by the advertisement processing device in the storage device. The control device also transfers the advertisement information accepted for registration to the plurality of searching apparatus, and stores and transfers advertisement information transferred from the plurality of searching apparatus.
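One way a search range could be determined by the cost information of a search request is sketched below in Python. The source does not specify the mechanism, so the budget-per-hop model is an assumption, and `Searcher`, `connect`, `search` and `budget` are hypothetical names. The sketch visits computers in order of largest remaining budget so that each computer is searched at most once, reached by its cheapest path.

```python
import heapq
from itertools import count


class Searcher:
    """One computer with locally stored advertisements."""

    def __init__(self, name):
        self.name = name
        self.links = []  # (neighbor, link_cost) pairs
        self.ads = {}    # keyword -> set of advertised URLs

    def connect(self, other, link_cost):
        """Create a two-way link with an assumed, non-negative cost."""
        self.links.append((other, link_cost))
        other.links.append((self, link_cost))


def search(start, keyword, budget):
    """Collect matching URLs from every computer reachable within budget."""
    results = set()
    visited = set()
    order = count()  # tie-breaker so the heap never compares Searcher objects
    heap = [(-budget, next(order), start)]
    while heap:
        negative_remaining, _, node = heapq.heappop(heap)
        if node.name in visited:
            continue
        visited.add(node.name)
        results |= node.ads.get(keyword, set())  # search the local store
        remaining = -negative_remaining
        for neighbor, link_cost in node.links:
            if neighbor.name not in visited and remaining >= link_cost:
                heapq.heappush(
                    heap, (-(remaining - link_cost), next(order), neighbor)
                )
    return results
```

With this model, a request carrying a small budget is answered only from nearby computers, and raising the budget widens the search range.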
Still further objects of the present invention are achieved by a searching apparatus from among a plurality of searching apparatus having information resources, that processes a search request from an information resource searcher. The searching apparatus includes a storage device to store advertisement information that includes position information of the information resources, an interface device to provide a search result, having corresponding advertisement information, in response to the search request, and a control device to search the storage device in response to the search request. The control device also transfers the search request to the plurality of searching apparatus, searches the storage device in response to a search request transferred from one of the plurality of searching apparatus and transfers the transferred search request to the plurality of searching apparatus other than the one searching apparatus, and transfers resulting advertisement information to the information resource searcher or the one of the plurality of searching apparatus.