The increased reliance by businesses on On-Line Transaction Processing and Decision Support Systems has increased the demand for high availability systems, since these systems are critical to the functioning of day-to-day activities in many businesses. These systems are not only essential for the support of normal daily operations, but they also store critically important customer and corporate data. Continuous availability is no longer an ideal; it is a necessity for many companies. Longer work days, expansion into new markets and customer demand for more efficient service create an expanded requirement for increased system availability. Users are demanding a means of ensuring very high availability of their applications and the access to data that permits them to accomplish their tasks and provide the highest levels of customer service. Interruption of workflow due to system failure is expensive and can cause the loss of business. The need to increase computer system availability is becoming one of businesses' key concerns.
Implementation of client/server computing is growing throughout today's businesses--for key business applications as well as electronic mail, distributed databases, file transfer, retail point-of-sale, inter-networking, and other applications. It is possible for companies to gain competitive advantages from client/server environments by controlling the cost of the technology components through economies of scale and the use of clustered computing resources. There is a boost in productivity when businesses have high availability and easy access to information throughout the corporate enterprise.
Computer system availability and reliability are improved when multiple servers are utilized together with a "fail-over" scheme such as that provided by NCR Corporation's LifeKeeper product. In such a system, should one server fail, the functions and applications associated with the failed server are transferred to one or more of the remaining operational or standby servers.
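The fail-over behavior described above can be illustrated with a minimal sketch. It does not represent how LifeKeeper is implemented; the `Server` class, its fields, and the `fail_over` function are hypothetical names chosen for illustration, and the policy of moving everything to the first operational server is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    # Hypothetical model of a server in a fail-over group.
    name: str
    healthy: bool = True
    apps: list = field(default_factory=list)


def fail_over(servers):
    """Transfer applications from failed servers to an operational one.

    Assumption: all applications move to the first healthy server found.
    """
    survivors = [s for s in servers if s.healthy]
    if not survivors:
        raise RuntimeError("no operational server remains")
    target = survivors[0]
    for s in servers:
        if not s.healthy and s.apps:
            target.apps.extend(s.apps)  # take over the failed server's work
            s.apps = []
    return target
```

For example, if a primary server running a database application is marked unhealthy, calling `fail_over` on the pair leaves the application listed on the secondary server.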
An important component of a high-availability client/server system is a reliable, fault-tolerant data storage system. In some networked or "clustered" multiple server arrangements, the physical data storage system may be a shared RAID (Redundant Array of Inexpensive Disks) disk array system, or a shared pair of disk drives or disk arrays operating in a mirrored arrangement.
A computer system including multiple servers and a pair of shared disk drives is shown in FIG. 1. FIG. 1 provides a diagram of clustered or networked computers having a primary server 101 and a secondary server 103 in a fail-over pair arrangement. Primary server 101 is the preferred application server of the pair, and secondary server 103 preferably provides fail-over protection for the primary server. The primary and secondary servers are coupled through a network bus system 105 to a plurality of client computers 107 through 109. The primary and secondary servers 101 and 103 each share access to a pair of disk storage devices 111 and 113. Disk storage devices 111 and 113 are SCSI (Small Computer System Interface) disk drives or disk arrays connected to servers 101 and 103 through a pair of SCSI busses 115 and 117.
Disk storage devices 111 and 113 are two equal-capacity storage devices that mirror each other. Each storage device contains a duplicate of all files contained on the other storage device, and a write or update to one storage device updates both devices in the same manner. In the event that either storage device fails, the data contained therein remains available to the system from the operational mirror storage device.
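As a minimal sketch of the mirrored arrangement just described, the following models two storage devices as in-memory dictionaries. The `MirroredPair` class and its methods are hypothetical and serve only to illustrate that every write updates both devices and that reads continue from the surviving device after a failure.

```python
class MirroredPair:
    """Hypothetical model of two equal-capacity mirrored storage devices."""

    def __init__(self):
        self.devices = [{}, {}]       # block number -> data, one dict per device
        self.failed = [False, False]  # failure flag per device

    def write(self, block, data):
        # A write or update is applied to every operational device.
        for i, dev in enumerate(self.devices):
            if not self.failed[i]:
                dev[block] = data

    def read(self, block):
        # Data is served from the first operational device.
        for i, dev in enumerate(self.devices):
            if not self.failed[i]:
                return dev[block]
        raise IOError("both mirror devices have failed")
```

After a write, marking either device as failed still allows `read` to return the data from the remaining mirror.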
In other client/server arrangements, the physical storage devices for the primary and secondary servers may be separate, non-shared, physical storage devices. A network-based file system volume replication scheme, in which the contents of the file system stored on a primary physical storage medium are also copied to a secondary physical storage medium, is commercially available from NCR Corporation, assignee of the present application, under the product name "Extended Mirroring".
Most disk mirroring procedures utilizing shared drives will write or update both mirror drives synchronously. In systems performing disk mirroring with non-shared drives over a network, writes directed to a primary drive are received and forwarded to the secondary mirror drive. Upon receipt from the secondary drive of an acknowledgement signal indicating a successful update of the secondary drive, the write to the primary is completed. Although the updates to the primary and secondary drives do not occur simultaneously, this process will also be referred to as a synchronous write in the following discussion.
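The ordering of a synchronous write over a network, as described above, can be sketched as follows. The `PrimaryDrive` and `SecondaryDrive` classes are hypothetical, a direct method call stands in for the network transfer, and raising an error when no acknowledgement arrives is an assumption, since the text does not say how that case is handled.

```python
class SecondaryDrive:
    """Hypothetical secondary mirror drive reached over a network."""

    def __init__(self):
        self.blocks = {}
        self.reachable = True

    def apply(self, block, data):
        # Returns True as the acknowledgement of a successful update.
        if not self.reachable:
            return False
        self.blocks[block] = data
        return True


class PrimaryDrive:
    """Hypothetical primary drive that mirrors each write synchronously."""

    def __init__(self, secondary):
        self.blocks = {}
        self.secondary = secondary

    def write(self, block, data):
        # 1. Forward the write to the secondary mirror drive.
        acked = self.secondary.apply(block, data)
        # 2. The caller blocks here until the acknowledgement is known.
        if not acked:
            # Assumption: treat a missing acknowledgement as a failed write.
            raise IOError("secondary did not acknowledge the write")
        # 3. Only then is the write to the primary completed.
        self.blocks[block] = data
```

Because `write` does not return until the secondary has acknowledged, the caller's latency includes the round trip to the secondary drive.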
Disk mirroring applications that provide synchronous writes are inherently limited. Because the writes are synchronous, each writer blocks until its write is complete, which limits both performance and flexibility.