The present invention relates to a computer processing system, particularly to a method of managing a system in which a plurality of computers connected by a network distribute, share and change information (an information processing system), and more particularly to a distributed server managing method suitable for the World Wide Web (WWW) and to a distributed information processing system which uses such a method.
The existing information system is operated in such a manner that a plurality of computers connected to a network provide or acquire information. Hardware and software of a computer for mainly providing information are called a server, and hardware and software of a computer for mainly acquiring information are called a client.
In a conventional information processing system that has a plurality of servers, the servers either do not cooperate with each other at all (for example, where there are first and second servers, neither server knows of the existence of the other or depends on the other) or cooperate strongly with each other (for example, the first and second servers exchange or cache data).
In the case where a second server is newly established or started during the operation of a first server (generically called the addition of a server), or in the case where the second server is temporarily stopped or has a trouble, including both a trouble of the server itself and a trouble of a communication line to the server (generically called the deletion of a server), the first and second servers can conventionally cooperate with each other only if an administrator (or administrative operator) properly sets both servers.
In the case where the number of servers is very small, the above requirement poses no particular problem. However, in the case where the number of servers becomes large, a large load is imposed on the administrator whenever a server is to be added or deleted.
With the explosive growth of the Internet and the WWW in recent years, a situation has arisen in which the number of servers is very large. Techniques used in the WWW will be described first.
In the WWW, a WWW server holds information in units called "home pages". Each home page has a name called a URL (an abbreviation of Uniform Resource Locator). When the user of a WWW client designates to the WWW client the URL of a home page to be accessed, the WWW client, in a first processing configuration of the WWW, requests the WWW server designated by that URL to transmit the home page corresponding to the URL. In response to this request, the designated WWW server provides the corresponding home page to the WWW client.
In a second processing configuration, the WWW client does not send its request to the WWW server designated by the URL given by the user but instead sends it to another kind of server called a "proxy server" (or simply a "proxy"). The proxy acquires the home page corresponding to the URL from the WWW server or transfers the URL to another proxy. In the case where the request passes through a plurality of proxies in successive steps, the proxies have a parent-child relationship. Proxies having such a parent-child relationship have been disclosed by, for example, A. Chankhunthod et al, "A Hierarchical Internet Object Cache", 1996 USENIX Technical Conference, pp. 153 to 163, 1996 (hereinafter referred to as Reference 1). A WWW proxy can have a cache shared between a plurality of clients. In the recent Internet, therefore, the second processing configuration embracing many proxies is widely used. The proxy to which each proxy transfers a request is set by an administrator.
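The second processing configuration can be illustrated by the following minimal sketch, which is not taken from Reference 1. The class names OriginServer and Proxy, the upstream attribute and the example URL are assumptions made only for illustration. Each proxy holds a cache shared by its clients, and its upstream (a parent proxy or the WWW server) is fixed when the proxy is constructed, which stands in for the setting made by the administrator.

```python
class OriginServer:
    """Stands in for a WWW server holding home pages keyed by URL (hypothetical)."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.hits = 0  # number of requests that reached this server

    def fetch(self, url):
        self.hits += 1
        return self.pages[url]


class Proxy:
    """A caching proxy whose upstream is fixed in advance (hypothetical)."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # parent Proxy or OriginServer, set manually
        self.cache = {}           # URL -> home page, shared by all clients

    def fetch(self, url):
        if url in self.cache:            # cache hit: answer locally
            return self.cache[url]
        page = self.upstream.fetch(url)  # cache miss: ask the parent
        self.cache[url] = page
        return page


origin = OriginServer({"http://example.com/": "<html>home</html>"})
parent = Proxy("parent", origin)
child_a = Proxy("child_a", parent)
child_b = Proxy("child_b", parent)

first = child_a.fetch("http://example.com/")   # travels up to the WWW server
second = child_b.fetch("http://example.com/")  # answered from the parent's cache
```

In this sketch the second request never reaches the WWW server, because the parent proxy already holds the page. Adding a further proxy would require constructing it with an explicit upstream, which mirrors the manual setting described above.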
Information processing systems other than the WWW are also widely used. In any of these systems, however, the cooperation of a plurality of servers with each other is limited to a fixed relationship set by an administrator.
A network news system disclosed by, for example, B. Kantor et al, "Network News Transfer Protocol: A Proposed Standard for the Stream-Based Transmission of News", Network Working Group RFC-977 (hereinafter referred to as Reference 2) is another example of the information system and is constructed by a plurality of servers in a manner similar to that in the WWW.
In the network news system, copies of the "news articles" which clients subscribe to and post are distributed among the plurality of servers. The administrator of each server sets the other servers to which news articles are to be transferred. In the case where a new server is added, it is necessary to manually change the settings of the existing servers.
Further, the Domain Name System (DNS) disclosed by, for example, P. Mockapetris, "Domain Names - Implementation and Specification", Network Working Group RFC-1035, especially the second chapter (hereinafter referred to as Reference 3) is a further example of the information system and is constructed by a plurality of servers.
The DNS associates a host name on the Internet with the information accompanying that host name (an IP address and a mail server). In order to hold this association, a plurality of DNS servers form a tree structure. A request from a client is processed by being transferred between the plurality of servers along the tree structure.
The administrator of each DNS server sets the other DNS server to which a request is to be transferred. In the case where a certain DNS server is replaced, the settings of the DNS servers adjacent to it must be changed manually. Also, each node of the above-mentioned tree structure has a redundant construction including two or more DNS servers. In the case where those DNS servers have troubles (or network paths to those DNS servers have troubles), it is likewise necessary to manually change the settings of the adjacent DNS servers.
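The transfer of a request along the tree structure can be sketched as follows. This is a simplified illustration and not the resolution algorithm of Reference 3: the class DnsServer, its delegate and resolve methods, the zone names and the address 192.0.2.10 are all assumptions. The delegations dictionary plays the role of the setting that each administrator makes by hand.

```python
class DnsServer:
    """One node of the tree; delegations are configured by hand (hypothetical)."""

    def __init__(self, zone, records=None):
        self.zone = zone                    # "" for the root, else e.g. "com"
        self.records = dict(records or {})  # host name -> IP address
        self.delegations = {}               # child zone -> DnsServer

    def delegate(self, child):
        # Corresponds to the administrator registering an adjacent server.
        self.delegations[child.zone] = child

    def resolve(self, host, path=None):
        path = [] if path is None else path
        path.append(self.zone or ".")       # record which servers were visited
        if host in self.records:
            return self.records[host], path
        # Follow the most specific delegation whose zone contains the host.
        matches = [z for z in self.delegations
                   if host == z or host.endswith("." + z)]
        if not matches:
            raise KeyError(host)
        return self.delegations[max(matches, key=len)].resolve(host, path)


root = DnsServer("")
com = DnsServer("com")
example = DnsServer("example.com", {"www.example.com": "192.0.2.10"})
root.delegate(com)
com.delegate(example)

address, visited = root.resolve("www.example.com")
```

If the server for "example.com" were replaced, the entry held by the "com" server would have to be rewritten, which corresponds to the manual change of the adjacent servers noted above.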
Also, in a distributed file system of a local area network (LAN), there is known a method called "cooperative caching" in which cache space is shared by a plurality of computers. For example, in Michael Dahlin et al, "Cooperative Caching: Using Remote Client Memory to Improve File System Performance", First USENIX Symposium on Operating Systems Design and Implementation, pp. 267-280, 1994 (hereinafter referred to as Reference 4), a server called a "manager" holds information indicating in which file server each file block is stored. When a client requests the manager to transfer a file block, the manager either informs the client of the computer in which that file block is stored or transfers the client's request to the corresponding file server. Though a plurality of managers can exist, the correspondence between the managers and the file blocks is set beforehand. In order to change this correspondence, it is necessary for an administrator to manually change the setting.
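The role of the manager can be sketched as below. This is an illustration under stated assumptions and not the design of Reference 4: the class Manager, the function manager_for and, in particular, the rule of assigning a block to a manager by its block number modulo the number of managers are hypothetical. The rule is chosen only to show a correspondence that is fixed beforehand.

```python
class Manager:
    """Records which computer holds each file block (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.location = {}  # block id -> name of the computer holding it

    def register(self, block_id, holder):
        self.location[block_id] = holder

    def locate(self, block_id):
        # Returns None when no computer is known to hold the block.
        return self.location.get(block_id)


def manager_for(block_id, managers):
    """Fixed correspondence between blocks and managers (assumed rule)."""
    return managers[block_id % len(managers)]


managers = [Manager("m0"), Manager("m1")]
manager_for(7, managers).register(7, "client-3")

holder = manager_for(7, managers).locate(7)
```

Because manager_for depends on the length of the managers list, adding or removing a manager changes which manager is responsible for many blocks. A fixed correspondence of this kind therefore has to be reset by the administrator.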
The WWW continues to grow rapidly. For example, certain statistics showed that the number of servers increased 2.8 times in the half-year period from June of 1996 to January of 1997. The amount of information in the WWW as a whole is therefore increasing explosively. Though the proxy acts as a cache shared by a plurality of clients to reduce the response time for users (the time from the issuance of a request for information by a user to the arrival of that information at the user), the effect obtained under existing circumstances is small, since the capacity of the cache is small compared with the amount of information in the WWW as a whole.
A first problem to be solved by the present invention is to reduce the time and labor required of an administrator when a plurality of servers are caused to cooperate with each other, as with the WWW proxies. This solves the problem that a long time and a large number of persons are required for management in realizing a large-scale cache embracing many proxies. In particular, the first problem is that, in the case where a first server is to be added or deleted, the existing second servers should be informed of the existence (or non-existence) of the first server, and the first server should be informed of the existence (or non-existence) of the second servers, without the administrator having to set the existing second servers.
Since the WWW operates on the Internet, to which an enormous number of machines are connected and in which a multiplicity of management organizations are involved, neither a method using broadcast nor a method that centralizes all settings on a single server is practical for the solution of the first problem. Also, in the case where the number of second servers becomes large, even the cooperation of the first server with all the second servers becomes impractical. Thus, in the case where the number of servers becomes very large, a mechanism for limiting the number of servers to be subjected to cooperation is required.
Once the first problem is solved, a method is required by which, in order to operate caches distributed over a plurality of servers, these servers effectively exchange their cache lists with each other so that information absent from the cache of one server can be requested from another server. The realization of this method is a second problem to be solved by the present invention.
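The idea of exchanging cache lists can be illustrated by the following sketch. It shows only the general notion stated in the second problem and is not the method of the present invention. The class CacheServer, its advertise_to and lookup methods, and the example URLs are assumptions.

```python
class CacheServer:
    """A server holding a local cache and the cache lists of its peers (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.cache = {}       # URL -> page held locally
        self.peer_lists = {}  # peer CacheServer -> set of URLs it advertised

    def advertise_to(self, peer):
        # Send this server's current cache list (the URLs only) to a peer.
        peer.peer_lists[self] = set(self.cache)

    def lookup(self, url):
        # Returns (page, name of the server that supplied it), or (None, None).
        if url in self.cache:
            return self.cache[url], self.name
        for peer, urls in self.peer_lists.items():
            # The advertised list may be stale, so the peer is checked again.
            if url in urls and url in peer.cache:
                return peer.cache[url], peer.name
        return None, None


a = CacheServer("a")
b = CacheServer("b")
b.cache["http://example.com/x"] = "page x"
b.advertise_to(a)

found = a.lookup("http://example.com/x")
missing = a.lookup("http://example.com/y")
```

Server a holds no copy of the first page but locates it through the list advertised by server b. The second URL appears in no list, so the lookup fails without any peer being consulted.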
The solution of the first and second problems will result in the realization of a large-scale cache which extends over a plurality of servers. However, it has been found that, although the communication protocol HTTP used in the WWW requires an operation of confirming whether or not a cache holds sufficiently fresh information (hereinafter referred to as a validation operation), this validation operation lengthens the response time for users. Therefore, a third problem to be solved by the present invention is to prevent the validation operation from lengthening the response time for users.
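The cost of the validation operation can be seen in the following simplified model. In HTTP, a cache can validate a stored page by sending a request carrying an If-Modified-Since header, to which the server replies with status 304 (Not Modified) when the page is unchanged. The classes Origin and ValidatingCache, the use of integers as modification times and the round_trips counter are assumptions made for illustration, and no real network is involved.

```python
class Origin:
    """Stands in for a WWW server answering conditional requests (hypothetical)."""

    def __init__(self):
        self.pages = {}       # URL -> (body, last-modified time as an integer)
        self.round_trips = 0  # each request costs one round trip

    def get(self, url, if_modified_since=None):
        self.round_trips += 1
        body, modified = self.pages[url]
        if if_modified_since is not None and modified <= if_modified_since:
            return 304, None, modified  # unchanged: no body is sent
        return 200, body, modified


class ValidatingCache:
    """A cache that validates every stored page before using it (hypothetical)."""

    def __init__(self, origin):
        self.origin = origin
        self.entries = {}  # URL -> (body, last-modified time)

    def fetch(self, url):
        entry = self.entries.get(url)
        since = entry[1] if entry else None
        status, body, modified = self.origin.get(url, since)
        if status == 304:
            return entry[0]  # the cached copy is still fresh
        self.entries[url] = (body, modified)
        return body


origin = Origin()
origin.pages["http://example.com/"] = ("v1", 100)
cache = ValidatingCache(origin)

r1 = cache.fetch("http://example.com/")  # miss: full transfer
r2 = cache.fetch("http://example.com/")  # hit, but still one round trip
origin.pages["http://example.com/"] = ("v2", 200)
r3 = cache.fetch("http://example.com/")  # validation detects the change
```

Even the second fetch, which is served from the cache, waits for one round trip to the server. This waiting is the deterioration of the response time that the third problem addresses.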