1. The Field of the Invention
This invention relates to network server computer systems and, in particular, to an improvement in the methods used to recover from a computer failure in a system that provides a virtual storage area network, in which multiple server computers access the same network data.
2. Background and Related Art
In a network server computer system, there are a plurality of personal computers or user workstations that are usually supported by two or more servers. In order to provide continuous operation of these computer systems, it is necessary for the computer system to provide a method for overcoming faults and failures that often occur within the network server computer system. This is generally done by having redundant computers and mass storage devices, such that a backup server computer or disk drive is immediately available to take over in the event of a fault or failure of a primary server computer or disk drive. A technique for implementing a fault-tolerant computer system is described in Major et al., U.S. Pat. No. 5,157,663. In particular, Major provides a redundant network file server system capable of recovering from the failure of either the computer or the mass storage device of one of the file servers. The file server operating system is run on each computer system in the network file server, with each computer system cooperating to produce the redundant network file server. This technique has been used by Novell, of Provo, Utah, to implement its SFT-III fault-tolerant file server product.
More recently, fault-tolerant networks known as “storage area networks” have been developed. A storage area network (“SAN”) connects multiple servers of an enterprise network with a common or shared storage node to store and access network data. In the case of a failure of one of the servers, the other servers can perform network services that would otherwise have been provided by the failed server.
FIG. 1 illustrates a typical architecture of a network system that includes a conventional storage area network. FIG. 1 illustrates three server computers 110, 120, and 130 that provide network services for network 101. Although three servers are illustrated in FIG. 1, network 101 may include as few as two servers or more servers than are shown in FIG. 1. The number of server computers depends upon the individual needs of the network being served. For example, a large organization may require several server computers; likewise, a smaller organization might require only two server computers.
In this configuration, user workstations (or personal computers) 102a, 102b, 102c, and 102n are connected to network 101 and have access to server computers 110, 120, and 130. Each user workstation is generally associated with a particular server computer, although, in a network system that includes a storage area network, any server can provide substantially any network services for any workstation, as needed. A user at a user workstation 102a, 102b, 102c, or 102n issues requests for operations, such as read, write, etc., which are transmitted to the associated server computer 110, 120, or 130, which then performs the requested operation using I/O drivers 113, 123, and 133. Servers 110, 120, and 130 perform data operations on network data that is stored in disks 142 of shared storage node 140 using connections 115, 125, and 135. Each server 110, 120, and 130 has access to any network data stored at shared storage node 140, subject to the policing protocol described below. The storage area network of FIG. 1 includes the physical communication infrastructure and the protocols that enable server computers 110, 120, and 130 to operate with shared storage node 140.
Each server computer includes software representing a policing protocol module 111, 121, 131, that cooperates with the policing protocol modules of the other server computers to implement a policing protocol. The policing protocol prevents data corruption by controlling the performance of requested operations. For example, the policing protocol implemented by modules 111, 121, and 131 may allow a server to respond to read operation requests at any time, but may permit only one server computer at a time to perform a write operation request.
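The example policy above (reads at any time, writes by only one server at a time) can be sketched as a single write permission that servers must hold before writing. The following is a minimal, single-process sketch under stated assumptions: the class name `PolicingProtocol` and its methods are hypothetical, and the sketch keeps the state in one object, whereas modules 111, 121, and 131 would have to agree on that state by cooperating across the network.

```python
class PolicingProtocol:
    """Hypothetical, single-process model of the example policing policy:
    reads are always allowed; at most one server may hold write permission.
    A real protocol would coordinate this state among the servers' modules."""

    def __init__(self):
        self.writer = None  # ID of the server currently allowed to write, if any

    def may_read(self, server_id):
        # Under this example policy, read requests never conflict.
        return True

    def acquire_write(self, server_id):
        # Grant write permission only if no other server already holds it.
        if self.writer is None or self.writer == server_id:
            self.writer = server_id
            return True
        return False

    def release_write(self, server_id):
        # Only the current holder may give up write permission.
        if self.writer == server_id:
            self.writer = None
            return True
        return False


protocol = PolicingProtocol()
protocol.acquire_write(110)   # server 110 obtains write permission
protocol.acquire_write(120)   # refused while server 110 holds it
protocol.may_read(130)        # reads remain permitted for every server
protocol.release_write(110)
protocol.acquire_write(120)   # now granted to server 120
```

Serializing writes through one holder is what prevents two servers from interleaving updates to the same network data; reads are left unrestricted in this example only because the text's example policy permits them at any time.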
One advantage of SANs is that all server computers have access to all network data through the shared storage node. If one server experiences a failure, workstations can bypass the failed server and issue operation requests to other servers. The shared storage node eliminates the need for mirroring data between multiple storage nodes associated with different servers. However, storage area networks have at least two significant liabilities that have prevented them from becoming fully accepted in the marketplace and make them unsuitable for many customers.
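The bypass behavior described above can be illustrated with a small routing function: a workstation's request goes to its associated server unless that server has failed, in which case any surviving server can take it, since every server reaches the same data on the shared storage node. This is a minimal sketch assuming a hypothetical function `route_request` and a `failed` set supplied by some unspecified failure-detection mechanism; neither is part of the described system.

```python
def route_request(preferred, servers, failed):
    """Return the server that should handle a workstation's request.

    preferred: the server normally associated with the workstation
    servers:   all servers attached to the shared storage node, in fallback order
    failed:    servers currently known to have failed (assumed to be supplied
               by a hypothetical health monitor)
    """
    if preferred not in failed:
        return preferred
    for server in servers:
        if server not in failed:
            # Any healthy server can act on the request, because all of them
            # access the same network data on the shared storage node.
            return server
    raise RuntimeError("no healthy server available")


servers = [110, 120, 130]
route_request(110, servers, failed=set())   # normal case: associated server 110
route_request(110, servers, failed={110})   # server 110 has failed: falls back to 120
```

The sketch also makes the limitation discussed next visible: rerouting protects against a failed server, but every choice still depends on the single shared storage node.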
First, SANs require specialized hardware, namely, the shared storage node. Many potential users of storage area networks find the cost of purchasing and maintaining a shared storage node prohibitive. In practice, many users of SANs are large corporations or other enterprises that have relatively large networks with large numbers of servers. Enterprises that have the need for only two or three servers may not find it cost-effective to implement a storage area network.
Second, although SANs are tolerant of failures of network servers, they are not well suited for responding to or protecting against other hardware failures. For example, because a storage area network uses a single shared storage node, any failure or problem associated with the shared storage node can cause the SAN to go off-line and can potentially cause the loss of data that has been stored in the shared storage node. Accordingly, the basic SAN configuration does not provide a high degree of data integrity and may not be acceptable for use in organizations in which the risk of data loss is not acceptable.