1. Field of the Invention
The invention relates generally to computer systems. More specifically, the invention relates to dual outboard cache devices for computer system disaster recovery.
2. Description of the Prior Art
High-reliability computers and computer systems are used in applications including on-line airline reservation and ticketing systems, credit card transaction systems, and banking and automatic teller machine systems. Such applications require high-throughput transaction processing with short response times. They also require high reliability and availability, because users are unable to transact business while the computer system is down.
High throughput has been achieved using an outboard file cache to speed access to files. The term "outboard" refers to the device being located outside of the host input-output boundary. The design and use of an outboard file cache system is described in commonly assigned U.S. patent application Ser. No. 08/174,750, filed Dec. 23, 1993, entitled "Outboard File Cache System", which is herein incorporated by reference. The outboard device is a fully functional computing device. The outboard file cache is also referred to as an "Extended Processing Complex" or "XPC", as the XPC is a preferred embodiment of the present invention. The term XPC is used interchangeably with "outboard file cache" throughout the present application.
The outboard file cache provides fast reading and writing of disk files, but also presents a potential point of failure in a system. In systems having multiple hosts and redundant disk arrays, a single, shared XPC may account for a significant portion of total system risk. Even though the XPC is highly reliable, it cannot guarantee reliability in the face of errant forklifts, fire, flooding, and natural disasters.
Adding a second XPC by itself is not sufficient to provide a solution acceptable from both reliability and performance standpoints. Attempting to maintain a second XPC containing exactly the same data as a first XPC through lock-step synchronization, where the first XPC is used as a cache, presents significant performance problems. The XPC is an intelligent device that handles memory management tasks, deciding when to request that cached files be destaged to disk and which files to request destaging for. Synchronizing two XPCs, especially XPCs at geographically remote sites, would prove difficult. Providing a second site to take over after a failure of the first XPC requires bringing the second site up to date. This could be done using conventional audit trail techniques, but bringing a large database up to date could take hours or days. Writing everything to disk is a solution, but undoes many of the advantages of using cache.
In the larger picture, a single site is also a single point of failure, and rapid recovery from total site failure is also a desirable system feature. Having data in a fast and volatile cache, such as the outboard file cache, is desirable from a performance standpoint, but undesirable when the total site is lost. Continually reading from and writing to disk at two sites solves the volatility problem, but undoes many of the advantages of using cache.
What remains to be provided is a system allowing a simple-to-implement method for rapid recovery from an XPC failure, without requiring audit trails or database management systems, and without giving up the speed advantages of cached file access. What also remains to be provided is a system allowing for rapid recovery when one of two computer system sites is lost, while preserving the advantages of cached file access.