Network attached storage (NAS) systems are widely used for file sharing in enterprises' distributed file systems because of their ease of use, high efficiency, and ease of management. A typical networking environment of a NAS system is shown in FIG. 1.
In a NAS system, multiple read/write requests from different application hosts may be received for the same file. To avoid read/write collisions, when a read or write request for a file is received from an application host, a lock server in a node device needs to lock the file (that is, grant lock permission) to prevent other concurrent and mutually exclusive accesses to the shared resource. When the read/write operation is complete, the file is released (unlocked). A correspondence between lock permission information and an application host may be stored in each node, or may be stored in a shared storage device. The shared storage device is independent of each node and can be accessed by each node. The shared storage device is not shown in FIG. 1.
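The lock and unlock behavior described above can be illustrated with a minimal sketch. It assumes a single in-memory table that maps a file path to the application host holding the lock; the class name `LockServer`, its method names, and the file paths are hypothetical and are not taken from any NAS product or from the NFS or SMB protocols. It also models only exclusive locks, whereas a real lock server may distinguish shared (read) and exclusive (write) locks.

```python
import threading


class LockServer:
    """Toy lock table (hypothetical): file path -> application host holding the lock."""

    def __init__(self):
        self._owners = {}               # correspondence between lock permission and host
        self._mutex = threading.Lock()  # protects the table against concurrent requests

    def lock(self, path, host):
        """Grant lock permission on path to host unless another host holds it."""
        with self._mutex:
            owner = self._owners.get(path)
            if owner is not None and owner != host:
                return False            # mutually exclusive access is refused
            self._owners[path] = host
            return True

    def unlock(self, path, host):
        """Release the file; only the host that holds the lock may release it."""
        with self._mutex:
            if self._owners.get(path) == host:
                del self._owners[path]
                return True
            return False


if __name__ == "__main__":
    server = LockServer()
    print(server.lock("/share/a.txt", "host1"))    # first request for the file
    print(server.lock("/share/a.txt", "host2"))    # concurrent request from another host
    print(server.unlock("/share/a.txt", "host1"))  # read/write operation complete
```

In this sketch the table lives inside the lock server process; storing the same correspondence in a shared storage device would only change where `_owners` is persisted.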
Recently, as virtualization technology has developed, applications such as virtual desktop infrastructure (VDI), ORACLE database platforms, and structured query language (SQL) server database platforms have begun to be installed in distributed systems, which imposes higher requirements on the reliability of the distributed systems. When a fault occurs in a node device of a distributed system (e.g., a NAS system), the NAS system configures the Internet protocol (IP) address of the faulty node device on another node device by means of node device IP failover, so as to enhance the reliability of the NAS system. The switchover is transparent to each application host; that is, each application host cannot perceive the IP failover between the node devices in the NAS system. This reduces the impact on applications in each application host.
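As a rough illustration of node device IP failover, the following sketch models a cluster as a mapping from node names to the service IP addresses configured on them. The class `Cluster`, the rule of picking the first healthy node as the takeover target, and the example addresses are all assumptions made for illustration; a real NAS system selects the target and reconfigures network interfaces through its own mechanisms.

```python
class Cluster:
    """Toy model (hypothetical) of node devices and the service IPs configured on them."""

    def __init__(self, nodes):
        # nodes: node name -> its original service IP address
        self.ips = {name: {ip} for name, ip in nodes.items()}
        self.healthy = set(nodes)

    def fail(self, node):
        """Mark node as faulty and configure its IP addresses on a surviving node."""
        self.healthy.discard(node)
        survivors = sorted(self.healthy)
        if not survivors:
            raise RuntimeError("no healthy node device left to take over")
        target = survivors[0]           # assumed selection rule: first healthy node
        self.ips[target] |= self.ips[node]
        self.ips[node] = set()
        return target

    def route(self, ip):
        """Return the healthy node that currently serves requests sent to ip."""
        for name in self.healthy:
            if ip in self.ips[name]:
                return name
        return None


if __name__ == "__main__":
    cluster = Cluster({"node1": "10.0.0.1", "node2": "10.0.0.2"})
    print(cluster.route("10.0.0.1"))  # served by node1 before the fault
    cluster.fail("node1")
    print(cluster.route("10.0.0.1"))  # same address, now served by node2
```

The application host keeps addressing the same IP before and after `fail` is called, which is what makes the switchover transparent to it.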
Network file system (NFS) V3 is the protocol version that has been used most widely and for the longest time so far. However, because locks are not fully defined in the protocol itself, it relies on auxiliary protocols such as network lock manager (NLM) and network status manager (NSM). Consequently, the lock restoration procedure in NFS V3 is complex.
As shown in FIG. 1, when a node device 1 is faulty, an IP address of the node device 1 is failed over to a node device 2. That is, the IP address of the node device 1 is configured on the node device 2. The IP failover is transparent to an application host 1. The application host 1 does not know of the change occurring between the node devices. In the design schemes of some protocols, such as the NFS protocol and the server message block (SMB) protocol, to keep access by an application host efficient, after an IP address of a faulty node device is failed over to another node device, the application host may use a lock reclaim request to re-apply for the lock permission on a file that an application in the application host previously obtained. Accordingly, a lock server in the distributed system needs to securely control lock requests, such as lock reclaim requests and locking requests; otherwise, data obtained by multiple application hosts may become inconsistent due to improper permission control, or the system may even break down when multiple application hosts read/write data simultaneously.
Thus, when a node device in the distributed system is faulty, for example, when a lock server is faulty, all lock servers in the distributed system are silenced; that is, all the lock servers enter a silent state. In this case, when receiving a lock reclaim request, a protocol server in the node device sends, according to information carried in the lock reclaim request or stored lock permission information, the lock reclaim request to a corresponding lock server for processing. When receiving a locking request, the lock server directly returns a rejection response message to the requester. The locking request is used by an application in an application host to apply to the lock server for new lock permission on a file. That is, when the lock server in the node device is in the silent state, the distributed system can process only lock reclaim requests and cannot process locking requests. In this case, although only one lock server in the distributed system is faulty, a local problem becomes a global problem because all the lock servers are silent. In addition, normal locking requests cannot be processed, which may cause service interruption and reduce the reliability of the distributed system.
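The silent-state behavior described above can be sketched as follows. The names `RequestType` and `SilentLockServer`, the string results, and the file paths are hypothetical. The sketch also makes two assumptions that the description does not state: a lock reclaim request is granted only if it matches the lock permission recorded before the fault, and a lock reclaim request received outside the silent state is rejected.

```python
from enum import Enum


class RequestType(Enum):
    LOCK = "lock"        # locking request: apply for new lock permission
    RECLAIM = "reclaim"  # lock reclaim request: re-apply for permission held before


class SilentLockServer:
    """Toy lock server (hypothetical) that can be placed in a silent state."""

    def __init__(self, recorded_permissions):
        # path -> host that held lock permission before the fault
        self.recorded = dict(recorded_permissions)
        self.owners = {}     # path -> host currently holding the lock
        self.silent = False

    def handle(self, kind, path, host):
        if self.silent:
            if kind is RequestType.LOCK:
                return "rejected"           # new locks are refused while silent
            if self.recorded.get(path) == host:
                self.owners[path] = host    # reclaim matches the stored permission
                return "granted"
            return "rejected"
        if kind is RequestType.RECLAIM:
            return "rejected"               # assumption: reclaim only while silent
        # Normal state: a file is free if nobody holds it now and no recorded
        # permission from before the fault belongs to a different host.
        owner = self.owners.get(path, self.recorded.get(path))
        if owner is not None and owner != host:
            return "rejected"
        self.owners[path] = host
        return "granted"


if __name__ == "__main__":
    server = SilentLockServer({"/share/a.txt": "host1"})
    server.silent = True
    print(server.handle(RequestType.RECLAIM, "/share/a.txt", "host1"))
    print(server.handle(RequestType.LOCK, "/share/b.txt", "host2"))
```

Because every lock server in the system runs with `silent` set while any one of them is faulty, a locking request for an unrelated file such as `/share/b.txt` is rejected everywhere, which is the local-to-global problem noted above.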