1. Field of the Invention
This invention is related to the field of computer networks and, more particularly, to node recovery.
2. Description of the Related Art
While individual computers enable users to accomplish computational tasks which would otherwise be impossible for the user alone, the capabilities of an individual computer can be multiplied by using it in conjunction with one or more other computers. Individual computers are therefore commonly coupled together to form a computer network.
Computer networks may be interconnected according to various topologies. For example, several computers may each be connected to a single bus, they may be connected to adjacent computers to form a ring, or they may be connected to a central hub to form a star configuration. These networks may themselves serve as nodes in a larger network. While the individual computers in the network are no more powerful than they were when they stood alone, they can share the capabilities of the computers with which they are connected. The individual computers therefore have access to more information and more resources than standalone systems. Computer networks can therefore be a very powerful tool for business, research or other applications.
In recent years, computer applications have become increasingly data intensive. Consequently, the demand placed on networks by the growing amounts of data being transferred has increased dramatically. In order to better manage the needs of these data-centric networks, a variety of forms of computer networks have been developed. One form of computer network is a “Storage Area Network”. A Storage Area Network (SAN) connects more than one storage device to one or more servers, using a high speed interconnect, such as Fibre Channel. Unlike in a Local Area Network (LAN), the bulk of storage in a SAN is moved off of the server and onto independent storage devices which are connected to the high speed network. Servers access these storage devices through this high speed network.
One of the advantages of a SAN is the elimination of the bottleneck that may occur at a server which manages storage access for a number of clients. By allowing shared access to storage, a SAN may provide for lower data access latencies and improved performance. However, because there exists a variety of file formats and no universal standard, the most common SAN configuration involves a homogeneous collection of hosts all utilizing the same file format. While homogeneous configurations may take advantage of some of the benefits of SANs, many organizations include nonhomogeneous systems consisting of a variety of computing platforms which they would like to use together.
When building a SAN for a heterogeneous environment, the problems of dealing with incompatible file formats can be a significant barrier to data sharing. One possible solution is to restrict access for a particular type of host to a storage device of the same type. However, such a restriction results in the loss of many of the benefits of shared access to storage devices on the SAN. Another possible solution is to utilize a complicated scheme of importing, exporting and translating data. However, such mechanisms typically involve undue overhead and frequently result in the loss of information in the process.
Another feature of file systems which may impact performance involves how recovery from system interruptions is handled. Typically, when a file system crashes or is otherwise interrupted, the host node must go through a lengthy process upon restarting which may cause the node and file system to be unavailable for a significant period of time.
The problems outlined above are in large part solved by a network file system and method as described herein. When a node in a computer network becomes unavailable, file systems which require verification and are locked are logged in a recovery log and checking continues. Upon completing available file system verifications, those file systems which were logged are checked for availability in the background. When a logged file system becomes available, it is verified. During the time spent waiting for a logged file system to become available, the affected node is available for other processing. Advantageously, downtime of an affected node may be reduced.
Broadly speaking, a method of file system recovery logging by a node is contemplated. Upon rebooting, or restarting, an affected node first identifies those file systems which may require verification. If an identified file system requires verification and is locked, an indication of this fact is logged and checking continues with other file systems. Otherwise, if the file system is not locked, it is verified. Upon completing an initial check of each file system, those file systems which were logged are checked for availability in the background. When a logged file system becomes available, it is verified. Time during which the node is waiting for a logged file system to become available may be spent processing other tasks.
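By way of illustration only, the recovery logging steps described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the names FileSystem, verify, initial_check and background_pass are hypothetical, the lock state is modeled as a simple boolean flag, and verification is reduced to setting a flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FileSystem:
    """Hypothetical stand-in for a file system visible to the node."""
    name: str
    needs_verification: bool
    locked: bool
    verified: bool = False


def verify(fs: FileSystem) -> None:
    """Placeholder for the actual verification of a file system."""
    fs.verified = True


def initial_check(file_systems: List[FileSystem]) -> List[FileSystem]:
    """Initial pass after restart.

    Verifies each unlocked file system that requires verification and
    returns a recovery log of those that require verification but are
    locked. Checking continues past locked file systems.
    """
    recovery_log: List[FileSystem] = []
    for fs in file_systems:
        if not fs.needs_verification:
            continue
        if fs.locked:
            recovery_log.append(fs)  # log it and move on
        else:
            verify(fs)
    return recovery_log


def background_pass(recovery_log: List[FileSystem]) -> List[FileSystem]:
    """One background pass over the recovery log.

    Verifies every logged file system that has become available and
    returns the entries that are still locked.
    """
    still_locked: List[FileSystem] = []
    for fs in recovery_log:
        if fs.locked:
            still_locked.append(fs)
        else:
            verify(fs)
    return still_locked
```

In such a sketch, the node would call initial_check once upon restarting and then invoke background_pass periodically, for example from a low-priority thread, until the returned log is empty. Between passes the node remains free to process other tasks.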