Conventionally, client-server model software having a distributed lock function, such as a database system and a distributed file system (hereinafter, these will be collectively referred to as a distributed file system) is known. The distributed lock function is a function for controlling an access to a shared resource from a client service (hereinafter, simply referred to as a client) and, for example, is realized by a resource exclusive management subsystem such as a distributed lock manager.
FIG. 15 is a diagram that illustrates an example of a distributed lock operation, and FIG. 16 is a diagram that illustrates an example of a distributed lock operation at the time of releasing a client. In FIG. 16, a process to which the same reference sign as that illustrated in FIG. 15 is attached is the same as the process illustrated in FIG. 15, and thus, duplicated description thereof will not be presented.
In a distributed file system such as Lustre, as illustrated in FIG. 15, when a distributed lock request is received from client A in Process T110, a server service (hereinafter, simply referred to as a server) assigns a distributed lock to client A in Process T120. The client A to which the distributed lock has been assigned, for example, performs a write process for a lock range.
When a distributed lock request is received from client B in Process T130, and a collision between the distributed locks of clients A and B is detected, the server requests client A to return the distributed lock in Process T140. However, since client A is in the middle of using the distributed lock, client A declines to return it in its reply to the request in Process T150 and returns the distributed lock to the server after the completion of the write process in Process T160.
The server assigns the returned distributed lock to client B in Process T170. Client B, which has been waiting for the assignment of the distributed lock since Process T130, for example, performs a write process in the lock range of the assigned distributed lock. Since no request to return the distributed lock is received from the server, client B may omit returning the distributed lock even after the completion of the write process.
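The exchange in Processes T110 to T170 can be sketched as follows. This is only an illustrative model, not the implementation of Lustre or of any particular lock manager: the class `LockServer`, its methods, and the half-open byte ranges are hypothetical names and conventions introduced for this example.

```python
from collections import deque


class LockServer:
    """Toy lock manager (hypothetical): one holder per range, callback on collision."""

    def __init__(self):
        self.granted = {}        # client -> (start, end) range currently held
        self.waiting = deque()   # (client, range) requests blocked by a collision
        self.callbacks = []      # (holder, range) return requests sent to holders

    @staticmethod
    def _overlaps(a, b):
        # Half-open ranges [start, end) collide when each starts before the other ends.
        return a[0] < b[1] and b[0] < a[1]

    def request(self, client, rng):
        """Assign the lock if nothing collides; otherwise ask the holders to return it."""
        holders = [c for c, r in self.granted.items()
                   if c != client and self._overlaps(r, rng)]
        if not holders:
            self.granted[client] = rng
            return True
        for holder in holders:
            self.callbacks.append((holder, self.granted[holder]))
        self.waiting.append((client, rng))
        return False

    def give_back(self, client):
        """A holder returns its lock; waiting requests are retried in arrival order."""
        self.granted.pop(client, None)
        pending, self.waiting = self.waiting, deque()
        for waiter, rng in pending:
            self.request(waiter, rng)


server = LockServer()
server.request("A", (0, 100))    # T110/T120: assigned to client A
server.request("B", (50, 150))   # T130/T140: collision, A is asked to return the lock
server.give_back("A")            # T160: A returns the lock after its write
# T170: server.granted is now {"B": (50, 150)}
```

Client B keeps the lock after its write in this sketch, mirroring the description above: nothing asks it to return the lock, so it does not.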
As above, the resource exclusive management subsystem of the distributed file system assigns an appropriate distributed lock to each processing subject as needed, and only a client or a server to which the distributed lock is assigned can operate on the resource (see “MOVEMENT OF DISTRIBUTED LOCK” in FIG. 15). This prevents a plurality of clients from issuing write system calls for the same file at the same time and each writing to the file without coordination. Therefore, the occurrence of a significant failure such as data destruction or data loss due to simultaneous writing of the same data, or file system destruction due to inconsistency of management information, can be prevented.
Here, as illustrated in FIG. 16, in a case (see Process T250) where the behavior of the client is not appropriate, for example, as in a case where there is no reply (response) from the client or the like, the server releases the client (release process) (see Process T260). In the release process, the server cuts off (releases) the connection with the client by removing connection information (server-side connection information) relating to the connection with the client, the lock range, and the like from the server. Accordingly, in the distributed file system, the consistency of the entire system can be maintained.
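As a rough sketch of the release process, the server-side connection information can be modeled as a table from which an unresponsive client's entry, including its lock ranges, is removed. The class `ConnectionTable`, the `reply_timeout` parameter, and the timeout value used below are assumptions made for illustration; the text above does not specify how unresponsiveness is judged.

```python
class ConnectionTable:
    """Server-side connection information (hypothetical model), keyed by client."""

    def __init__(self, reply_timeout):
        self.reply_timeout = reply_timeout  # assumed: seconds without a reply before release
        self.entries = {}                   # client -> {"last_reply": t, "lock_ranges": [...]}

    def connect(self, client, now):
        self.entries[client] = {"last_reply": now, "lock_ranges": []}

    def record_reply(self, client, now):
        if client in self.entries:
            self.entries[client]["last_reply"] = now

    def release_unresponsive(self, now):
        """Remove every client whose last reply is older than reply_timeout."""
        stale = [c for c, e in self.entries.items()
                 if now - e["last_reply"] > self.reply_timeout]
        for client in stale:
            # Connection information and lock ranges are dropped together.
            del self.entries[client]
        return stale


table = ConnectionTable(reply_timeout=100.0)
table.connect("A", now=0.0)
table.connect("B", now=0.0)
table.entries["A"]["lock_ranges"].append((0, 100))
table.record_reply("B", now=90.0)                 # only client B keeps replying
released = table.release_unresponsive(now=120.0)  # client A is released
```

Nothing in `release_unresponsive` contacts the released client, which reflects the server-side, one-directional nature of the process.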
The release process of the client is a server-initiated process and is performed asynchronously with the client. Even in a case where the server has a notification/synchronization function for the client, the release process has to cover the case where the client does not respond to the server, and thus the notification to the client is not guaranteed to succeed. In other words, synchronization at the time of releasing the client is not assured.
FIG. 17 is a diagram that illustrates an example of a client releasing process and a release restoring process.
For example, as described above, since the release process is performed asynchronously, it is difficult for a client that has been released on the server side (see Process T310 illustrated in FIG. 17) to recognize that release. Accordingly, there is a case where the released client transmits a request to the server in Process T320 based on client-side connection information that is inconsistent with the server side, where the corresponding connection information has been removed. When the server that has received the request finds that the connection information is not present, the server returns an error for the request in Process T330. At this time, the client recognizes that it has been released on the server side, and synchronization relating to the release process of the client between the server and the client is completed.
The client that has received the error discards the client-side connection information that is in the inconsistent state and transmits a reconnection request to the server side in Process T340. Then, the client updates the connection information through a reconnection established in accordance with the reconnection request and builds connection information that is consistent with the server, whereby the release restoring process performed by the client is completed in Process T350. In other words, the release restoring process is triggered by the error that the server returns for the first request transmitted from the client to the server after the release.
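The sequence of Processes T310 to T350 can be sketched as below. All names (`Server`, `Client`, `NotConnectedError`, the integer connection identifier) are hypothetical and stand in for whatever connection information a real system keeps; the sketch only shows that the client learns of its release from the error returned for its next request.

```python
class NotConnectedError(Exception):
    """Hypothetical error: the server holds no connection information for the sender."""


class Server:
    def __init__(self):
        self.next_id = 1
        self.connections = {}   # connection id -> client name

    def connect(self, name):
        conn_id = self.next_id
        self.next_id += 1
        self.connections[conn_id] = name
        return conn_id

    def release(self, conn_id):
        # T310: the server removes its connection information; the client is not told.
        self.connections.pop(conn_id, None)

    def handle(self, conn_id, payload):
        if conn_id not in self.connections:
            raise NotConnectedError(conn_id)   # T330: error returned for the request
        return f"ok:{payload}"


class Client:
    def __init__(self, name, server):
        self.name = name
        self.server = server
        self.conn_id = server.connect(name)

    def send(self, payload):
        try:
            return self.server.handle(self.conn_id, payload)   # T320 when stale
        except NotConnectedError:
            self.conn_id = None                                # T340: discard stale info
            self.conn_id = self.server.connect(self.name)      # T350: consistent again
            raise   # the request that revealed the release still fails


server = Server()
client = Client("A", server)
first = client.send("write-1")    # succeeds while both sides agree
server.release(client.conn_id)    # asynchronous release on the server side
try:
    client.send("write-2")
except NotConnectedError:
    pass                          # the caller of this request sees the error
second = client.send("write-2")   # succeeds on the rebuilt connection
```

Re-raising after the reconnection is a deliberate choice in this sketch: it keeps the behavior described above, in which the request that triggers the release restoring process is itself answered with an error.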
Here, the above-described error notified from the server is an error that is caused by inconsistency between the client-side connection information and the server-side connection information. The error may be regarded as a significant error, that is, an error implying a possibility of a significant failure such as data destruction, data loss, or file system destruction due to inconsistency of management information, as described above.
FIG. 18 is a diagram that illustrates an example of a rewrite process at the time of the occurrence of an error in a client, and FIG. 19 is a diagram that illustrates an example of the influence of the release of a client on an application. As illustrated in FIG. 18, when a significant error is received, the client does not perform a rewrite process but returns the error all the way up to the process that originally issued the request causing the significant error. In contrast, as illustrated in FIG. 18, in the case of a normal error, the client does not return the error to the processing source but handles it inside the system and retries the write as far as possible.
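The difference between the two error classes can be illustrated with a small sketch. The exception names, the `submit` function, and the `max_retries` bound are assumptions introduced here; the text only states that a normal error is retried inside the system while a significant error is passed back to the originator.

```python
class NormalError(Exception):
    """Hypothetical transient error that the client may absorb by rewriting."""


class SignificantError(Exception):
    """Hypothetical error caused by inconsistent connection information."""


def submit(send, data, max_retries=3):
    """Retry on NormalError; let SignificantError reach the caller untouched."""
    attempt = 0
    while True:
        try:
            return send(data)
        except NormalError:
            attempt += 1
            if attempt > max_retries:   # assumed bound on internal rewrites
                raise
        # SignificantError is deliberately not caught: no rewrite is attempted.


def make_sender(errors):
    """Test helper: send() raises the queued errors in order, then succeeds."""
    queue = list(errors)
    calls = []

    def send(data):
        calls.append(data)
        if queue:
            raise queue.pop(0)
        return "written"

    return send, calls


flaky_send, flaky_calls = make_sender([NormalError(), NormalError()])
result = submit(flaky_send, b"block")   # two internal rewrites, then success

stale_send, stale_calls = make_sender([SignificantError()])
try:
    submit(stale_send, b"block")
except SignificantError:
    pass                                # the originator receives the error
```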
For example, in Lustre or the like, in a case where the processing source of the request causing the significant error is a system call issued by a user application (see Process T430 illustrated in FIG. 19), the system call is returned with an error. In other words, the significant error is returned to the user application in Process T440.
Once the significant error has been received and the release restoring process has started on the client side, the same significant error is also returned to any process that refers to the old connection information from before the release restoring process, even if that process did not issue the request that directly triggered the release restoring process. In a case where such a process originates from a user application, the significant error is returned to the user application as well.
In other words, when the release of the client is performed, the possibility of returning an error to the user application increases.
A user application is an application used for performing a process desired by the user, and, in many cases, it is not designed to handle errors correctly. Accordingly, in many cases, the user application neither retries the process nor performs proper error handling after the significant error is received. In a case where the execution of the application is automated or the like, the user may not notice for some time that the user application has ended with an error, and the occurrence and handling of the significant error may frequently become an obstacle to the operation of the system.
Thus, in the distributed file system such as Lustre, as illustrated in FIG. 20, there are cases where the client transmits a ping request to the server.
FIG. 20 illustrates an example of a release detection technique using the ping. As illustrated in FIG. 20, the client transmits a ping request to the server on a regular basis (e.g., at an interval of 25 seconds) in Processes T510 and T520. Accordingly, the client can perform the release restoring process in Process T530, triggered by the ping request resulting in an error in Process T520, which prevents a client from remaining in the released state on the server side over a long period. As a result, the possibility that a request transmitted on behalf of a user application in Process T540 causes the release restoring process (the possibility of generating a significant error) can be reduced.
In this method, the ping request is transmitted on a regular basis, completely asynchronously with the client state and the server state. Accordingly, from the viewpoint of the release of the client, the client-side connection information and the server-side connection information can be synchronized with each other at every transmission interval of the ping request.
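A minimal simulation of the ping technique is given below, using the 25-second interval mentioned above as an example value. The classes and the `tick` method are hypothetical, and time is passed in explicitly instead of being read from a clock so that the behavior is easy to follow.

```python
PING_INTERVAL = 25.0   # seconds; the example interval from the description


class Server:
    def __init__(self):
        self.connected = set()

    def connect(self, name):
        self.connected.add(name)

    def release(self, name):
        self.connected.discard(name)

    def ping(self, name):
        """Return True when the server still holds connection information for name."""
        return name in self.connected


class Client:
    def __init__(self, name, server, now=0.0):
        self.name = name
        self.server = server
        self.next_ping = now + PING_INTERVAL
        self.restores = 0   # number of release restoring processes performed
        server.connect(name)

    def tick(self, now):
        """Send a ping when one is due; restore the connection if the ping fails."""
        if now < self.next_ping:
            return
        self.next_ping = now + PING_INTERVAL
        if not self.server.ping(self.name):
            self.server.connect(self.name)   # release restoring process (T530)
            self.restores += 1


server = Server()
client = Client("A", server, now=0.0)
client.tick(25.0)       # T510: ping succeeds
server.release("A")     # release at some point after the first ping
client.tick(40.0)       # no ping is due, so the release goes unnoticed
client.tick(50.0)       # T520: ping fails, restoring runs before any user request
```

In this model the client can stay unaware of its release for at most one ping interval, which is the sense in which the two sides are resynchronized at every interval.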
As a related technology, a technology for responding to a request, which is started by a user, for requesting an access to specific regional information from a remote place is known (e.g., see Japanese National Publication of International Patent Application No. 2003-521765). According to this technology, a connection is made to the Internet, an Internet protocol (IP) address that is dynamically allocated is received, the IP address is transmitted, and the connection is released when a maximal unused time is exceeded.
Furthermore, as another related technology, a cache storage device is known which issues a release request for releasing a locked area inside a storage device to the storage device in a case where there is no request from a client for a predetermined time (e.g., see Japanese Laid-open Patent Publication No. 2004-342071).
As described above, the release process of a client is performed asynchronously with the client in a server-initiated manner. In other words, until the client actually transmits a request to the server and receives from the server an error caused by the release of the client, the client does not recognize that it has been released, and it is difficult for the client to perform the release restoring process. Accordingly, in a case where the transmission source of the request causing the error is a user application, there is a problem that the error is returned to the user application.
In the technique illustrated in FIG. 20, the client transmits a ping request to the server on a regular basis, and the release restoring process is performed when a ping request is detected to have failed due to the release of the client. However, as the system scale increases, and the number of servers and the number of clients increase, the load of central processing units (CPUs) of the servers and the clients, the amount of memory usage, the network load, and the like due to the regular transmission of ping requests become huge.
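The growth of the ping load can be estimated with simple arithmetic. The sketch below assumes that every client pings every server once per interval, which is a simplification, and the client and server counts are hypothetical figures chosen only to show the scaling.

```python
def ping_requests_per_second(num_clients, num_servers, interval_s):
    """System-wide ping rate, assuming each client pings each server once per interval."""
    return num_clients * num_servers / interval_s


# Hypothetical system: 10,000 clients and 100 servers with a 25-second interval.
rate = ping_requests_per_second(10_000, 100, 25.0)   # 40000.0 requests per second
```

Because the rate is proportional to the product of the two counts, doubling both the number of clients and the number of servers quadruples the ping traffic under this assumption.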
In order to reduce the load, stopping the ping requests may be considered. However, in that case, when the first request transmitted after the release process causes an error and the transmission source of the request is a user application, the error is returned to the user application.
Further, in the related technologies described above, the above-described problem is not considered.
Here, while the distributed file system such as Lustre has been described as an example, the above-described problem may occur in various information processing systems in which a release process (release of a connection) of a terminal device that is performed by an information processing apparatus is performed asynchronously with the terminal device.