1. Field of the Invention
The invention pertains to computer network servers using backup servers for fault tolerance, and more specifically to a network server with a posted write disk cache and also with a backup server.
2. Description of the Related Art
Networked microcomputer systems are becoming the dominant form of computing. Where the mainframe once dominated, supplying the computing needs of users throughout a large corporation, groups of ever more powerful networked microcomputers now satisfy those same needs. The networked microcomputer has moved into the realm of the mainframe and now provides computing, and communications between users, for large and small companies alike.
After microcomputer networks were developed, fault tolerant systems for such networks soon followed. Mainframe computers were renowned for their fault tolerance, being able to recover from a variety of system and power failures that could otherwise cripple a corporation's data processing. Those systems failed "gracefully" and then recovered from such failures.
As users have moved from mainframes to microcomputer networks, they have demanded similar fault tolerance, and a number of systems now provide it. One example is the "backup" server. Computer networks typically centralize their storage in one or more network servers, which are generally powerful microcomputers running network operating systems (NOSs). These servers also typically have very large mass storage subsystems to store network data. A server failure could prevent network users from accessing that centralized data. To guard against this, a number of fault tolerant network server systems have been developed, such as SFT III (System Fault Tolerance level III) by Novell Corporation. Under SFT III, two identical servers run in tandem, each with an identical mass storage subsystem. The servers communicate through a fiber optic link to keep their data synchronized, so if one server fails, the other continues providing the stored data to the network. While this technique is quite robust, it is also expensive: it requires at least twice the investment in hardware and software that a single network server would otherwise require.
Other systems reduce these costs. One example is a backup server that seizes control of a failed primary server's mass storage subsystem. Such a system is described in assignee's U.S. patent application Ser. No. 08/445,283, entitled MULTI-SERVER FAULT TOLERANCE USING IN-BAND SIGNALLING and filed May 19, 1995, which is hereby incorporated by reference. In that system, heartbeat messages are sent between a primary and a backup server. If the primary server fails, the backup server causes the primary server's mass storage subsystem to switch ports to the backup server, and the backup server then provides the data on that mass storage subsystem to the network.
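The heartbeat-and-takeover sequence described above can be sketched as a small Python model. This is a minimal illustration under stated assumptions, not the implementation of the referenced application: the class names `StorageSubsystem` and `BackupServer`, the `switch_port` method, and the three-second timeout are all hypothetical, and the current time is passed in explicitly rather than read from a clock so that the behavior is deterministic.

```python
from dataclasses import dataclass


@dataclass
class StorageSubsystem:
    # Hypothetical model: records which server's port the shared
    # mass storage subsystem is currently connected to.
    owner: str = "primary"

    def switch_port(self, new_owner: str) -> None:
        self.owner = new_owner


@dataclass
class BackupServer:
    storage: StorageSubsystem
    timeout: float = 3.0         # assumed: seconds of silence before declaring failure
    last_heartbeat: float = 0.0  # time of the most recent heartbeat from the primary
    active: bool = False         # True once the backup has taken over the storage

    def on_heartbeat(self, now: float) -> None:
        # A heartbeat message arrived: the primary is still alive.
        self.last_heartbeat = now

    def check(self, now: float) -> bool:
        # Called periodically. If the primary has been silent longer than
        # the timeout, switch the storage subsystem's port to the backup.
        if not self.active and now - self.last_heartbeat > self.timeout:
            self.storage.switch_port("backup")
            self.active = True
        return self.active
```

For example, after a heartbeat at t = 1.0, a check at t = 3.5 leaves the primary in control, while a check at t = 4.5 exceeds the assumed timeout and moves the storage subsystem to the backup. A real system would also need to guard against a primary that is merely slow rather than failed; the sketch ignores that case.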
This system reduces costs because it requires only a single mass storage subsystem and because the backup server can be a less powerful, and less expensive, computer. Advanced mass storage subsystems also typically have built-in redundancy, such as a RAID (redundant array of inexpensive disks) disk array system, so a mass storage subsystem failure is less likely than a server failure.
Another alternative system includes multiple active servers, each of which can seize control of another server's storage subsystem should that server fail. For example, two servers each provide network access to their own separate mass storage subsystems. Should one of the servers fail, the remaining server seizes control of the failed server's mass storage subsystem and then provides the data on both its own and the failed server's storage subsystems to the network users, albeit with reduced performance. Such a system is described in assignee's patent application Ser. No. 08/491,738, entitled FAULT TOLERANT MULTIPLE NETWORK SERVERS and filed Jun. 19, 1995, which is hereby incorporated by reference.
All of these systems, and a wide variety of other systems, provide fault tolerance while reducing costs and maximizing network computing capacity.
Such systems, however, typically use disk controllers to access data in their mass storage subsystems. Like networks, disk controllers have also evolved. One significant improvement in such controllers has been the addition of a "posted write" cache. When data is written to a disk controller, it may be stored in the posted write cache instead of being written immediately to the mass storage subsystem. That data is later written to the mass storage subsystem, for example during a period of inactivity. Such systems reduce the demand on the bus used to communicate with the mass storage subsystem and typically improve system performance. Such systems are described in assignee's patent applications Ser. No. 07/894,111, entitled METHOD AND APPARATUS FOR MAINTAINING AND RETRIEVING LIVE DATA IN A POSTED WRITE CACHE IN CASE OF POWER FAILURE, filed Jun. 5, 1992; and Ser. No. 08/402,731, entitled DISK DRIVE CONTROLLER WITH A POSTED WRITE CACHE MEMORY, filed Mar. 13, 1995; and in assignee's U.S. Pat. No. 5,408,644, entitled A METHOD AND APPARATUS FOR IMPROVING THE PERFORMANCE OF PARTIAL STRIPE OPERATIONS IN A DISK ARRAY SUBSYSTEM, which are hereby incorporated by reference.
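The posted write behavior described above can be illustrated with a minimal write-back cache sketch in Python. The names `Disk` and `PostedWriteController`, the block-addressed interface, the default four-block capacity, and the flush-when-full policy are hypothetical simplifications chosen for illustration, not the design of the cited applications or patent.

```python
class Disk:
    """Hypothetical backing store that counts physical block writes."""

    def __init__(self) -> None:
        self.blocks: dict = {}
        self.writes = 0

    def write(self, lba: int, data: bytes) -> None:
        self.blocks[lba] = data
        self.writes += 1

    def read(self, lba: int) -> bytes:
        # Unwritten blocks read back as a single zero byte in this model.
        return self.blocks.get(lba, b"\x00")


class PostedWriteController:
    """Sketch of a disk controller with a posted write (write-back) cache."""

    def __init__(self, disk: Disk, capacity: int = 4) -> None:
        self.disk = disk
        self.capacity = capacity  # assumed cache size, in blocks
        self.dirty: dict = {}     # lba -> data posted but not yet on disk

    def write(self, lba: int, data: bytes) -> None:
        # Post the write: it completes without touching the disk.
        # If the cache is full and this is a new block, drain it first.
        if lba not in self.dirty and len(self.dirty) >= self.capacity:
            self.flush()
        self.dirty[lba] = data  # a rewrite of a cached block replaces it in place

    def read(self, lba: int) -> bytes:
        # The newest copy of a block may exist only in the cache.
        if lba in self.dirty:
            return self.dirty[lba]
        return self.disk.read(lba)

    def flush(self) -> None:
        # Commit all posted writes, e.g. during a period of inactivity.
        for lba, data in sorted(self.dirty.items()):
            self.disk.write(lba, data)
        self.dirty.clear()
```

In this sketch, repeated writes to the same block are coalesced in the cache, so fewer physical writes reach the disk. Until `flush()` runs, however, the posted data exists only in the controller, so anything that reads the disk directly does not see it.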
Both of these developments, fault tolerant backup servers and posted write caches, reduce costs, improve resource utilization, increase performance, and otherwise advance the art of network computing. It would be highly desirable to use these techniques together.