1. Field of the Invention
This invention relates to network file server computer systems, and in particular to the methods used to recover from a computer failure in a system with a plurality of computer systems, each with its own mass storage devices.
2. Prior State of the Art
It is often desirable to provide continuous operation of computer systems, particularly file servers which support a number of user workstations or personal computers. To achieve this continuous operation, it is necessary for the computer system to be tolerant of software and hardware problems or faults. This is generally done by having redundant computers and mass storage devices, such that a backup computer or disk drive is immediately available to take over in the event of a fault.
A number of techniques for implementing a fault-tolerant computer system are described in Major et al., U.S. Pat. No. 5,157,663, and its cited references. In particular, the invention of Major provides a redundant network file server capable of recovering from the failure of either the computer or the mass storage device of one of the file servers. The file server operating system is run on each computer system in the network file server, with each computer system cooperating to produce the redundant network file server. This technique has been used by Novell to implement its SFT-III fault-tolerant file server product.
There are a number of reasons why the use of a redundant network file server such as described in Major may be undesirable. As can be seen from the description in Major, the software needed to provide such a redundant network file server is considerably more complex than the software of the present invention. This can result in lower reliability due to the increased presence of programming errors ("bugs") in the complex software. Also, the processing time required to handle a client request may be increased by the complexity of the redundant network file server software, when compared to a single-processor network file server. Finally, license restrictions or other limitations may make it infeasible or uneconomical to run a redundant network file server instead of a normal network file server.
It is an object of this invention to provide rapid recovery from a network file server failure without the complex software of a redundant network file server. This is achieved by having a second, backup computer system with its own mass storage device (generally a magnetic disk). This backup computer is connected by an appropriate means for communications to the file server computer, allowing the transmission of information (such as commands and data) between the two computers. A mass storage emulator, running like a device driver on the file server computer, sends information to a mass storage access program on the backup computer. The mass storage access program performs the requested operation (read, write, etc.) on the mass storage system connected to the backup computer, and returns the result to the mass storage emulator on the file server computer.
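The request and reply exchange between the mass storage emulator and the mass storage access program can be illustrated with a minimal sketch. It is not the implementation of the invention: the class names, the dictionary message format, and the block size are assumptions made for illustration, and the communications link is modeled as a direct function call rather than a real network or bus connection.

```python
BLOCK_SIZE = 512  # assumed block size, for illustration only


class BackupDisk:
    """Hypothetical stand-in for the mass storage device on the backup computer."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        # Unwritten blocks read back as zeros.
        return self.blocks.get(block_no, bytes(BLOCK_SIZE))


class MassStorageAccess:
    """Runs on the backup computer and performs each requested operation."""

    def __init__(self, disk):
        self.disk = disk

    def handle(self, request):
        if request["op"] == "write":
            self.disk.write(request["block"], request["data"])
            return {"status": "ok"}
        if request["op"] == "read":
            return {"status": "ok", "data": self.disk.read(request["block"])}
        return {"status": "error"}


class MassStorageEmulator:
    """Runs on the file server and looks like a local storage device."""

    def __init__(self, send):
        # 'send' is a callable standing in for the communications means.
        self.send = send

    def write(self, block_no, data):
        reply = self.send({"op": "write", "block": block_no, "data": data})
        if reply["status"] != "ok":
            raise OSError("remote write failed")

    def read(self, block_no):
        reply = self.send({"op": "read", "block": block_no})
        if reply["status"] != "ok":
            raise OSError("remote read failed")
        return reply["data"]


backup_disk = BackupDisk()
access_program = MassStorageAccess(backup_disk)
emulator = MassStorageEmulator(access_program.handle)
emulator.write(7, b"payroll")
```

In this sketch the file server only ever calls `emulator.read` and `emulator.write`; everything behind the `send` callable could be replaced by a real transport without changing the caller.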
This makes the mass storage device on the backup computer look like another mass storage device on the file server computer. The data mirroring option of the file server operating system can be activated (or, if the operating system does not support data mirroring, a special device driver that provides data mirroring can be used), so that a copy of all data written to the mass storage device directly connected to the file server will also be written to the mass storage device on the backup computer, through the mass storage emulator and mass storage access programs.
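The mirroring behavior described above can be sketched as follows. This is an illustrative model only, assuming a hypothetical `BlockDevice` class with `read` and `write` methods; in the arrangement described, the second device would be the one reached through the mass storage emulator, and the mirroring itself would be done by the file server operating system or a special device driver.

```python
class BlockDevice:
    """Hypothetical minimal storage device: block number -> data."""

    def __init__(self):
        self.blocks = {}

    def write(self, block_no, data):
        self.blocks[block_no] = data

    def read(self, block_no):
        return self.blocks.get(block_no)


class MirroredDevice:
    """Copies every write to both devices; reads come from the primary."""

    def __init__(self, primary, secondary):
        self.primary = primary
        self.secondary = secondary

    def write(self, block_no, data):
        self.primary.write(block_no, data)
        self.secondary.write(block_no, data)

    def read(self, block_no):
        return self.primary.read(block_no)


local = BlockDevice()   # device directly connected to the file server
remote = BlockDevice()  # stands in for the device on the backup computer
mirror = MirroredDevice(local, remote)
mirror.write(3, b"ledger")
```

After the write, both `local` and `remote` hold the same data for block 3, which is the property the backup computer relies on when it takes over.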
When a failure is detected in the file server computer system, the backup computer becomes the file server. The mass storage device of the backup computer will contain a copy of the information on the mass storage device of the failed file server, so the new file server can start with approximately the same data as when the previous file server failed.
It is a further object of this invention to allow a single backup computer to support a plurality of file server computers. This is achieved by having each file server computer run a mass storage emulator. The backup computer can run a single mass storage access program capable of communicating with a plurality of mass storage emulators. Alternatively, if the operating system on the backup computer permits the running of multiple processes, the backup computer can run a separate mass storage access program for each mass storage emulator.
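The first alternative, a single mass storage access program serving several emulators, might be sketched as below. The server identifiers, the method signature, and the choice to keep a separate storage area per file server are assumptions made for illustration; the text does not specify how the backup computer organizes the data of each file server.

```python
class MultiServerAccess:
    """One access program handling requests from several emulators."""

    def __init__(self):
        # Assumed layout: one independent storage area per file server.
        self.volumes = {}

    def handle(self, server_id, op, block_no, data=None):
        volume = self.volumes.setdefault(server_id, {})
        if op == "write":
            volume[block_no] = data
            return None
        if op == "read":
            return volume.get(block_no)
        raise ValueError("unsupported operation: " + op)


access = MultiServerAccess()
# Two file servers write to the same block number without interfering.
access.handle("server_a", "write", 0, b"alpha")
access.handle("server_b", "write", 0, b"beta")
```

Keying every request by the originating file server keeps each server's mirrored data separate, so the backup computer can stand in for whichever server fails.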
It is a further object of this invention to improve the reliability of a network file server computer system that provides redundancy by reducing the complexity of its software when compared to the software of a redundant network file server. The mass storage emulator program on the file server computer and the mass storage access program on the backup computer can be considerably less complex than a full redundant file server operating system.
Furthermore, while it is possible for the backup computer to be running the file server operating system (and acting as another file server), it is also possible to run the mass storage access program under a simple operating system or as a stand-alone program, reducing the complexity and increasing the performance of the backup computer system.
These and other features of the invention will be more readily understood upon consideration of the attached drawings and of the following detailed description of those drawings and the presently preferred embodiments of the invention.