Society is increasingly relying on computers and networks to interact and conduct business. To achieve the high level of availability demanded in critical systems, unplanned downtime caused by software and hardware defects should be minimized.
The financial services industry is but one example of an industry that demands highly available systems. Indeed, a large number of data processing activities in today's financial industry are supported by computer systems. Particularly interesting are the so-called “real-time” and “near real-time” On-Line Transaction Processing (OLTP) applications, which typically process large numbers of business transactions over a prolonged period, with high speed and low latency. These applications generally exhibit the following characteristics: (1) complex and high speed data processing, (2) reliable non-volatile data storage, and (3) high level of availability, i.e. the ability to support the services on a substantially uninterrupted basis. When implemented, however, existing applications tend to trade off between these performance requirements, since, due to their contradictory effects on the system behavior, no design can completely satisfy all three characteristics simultaneously, as outlined in greater detail below.
First, complex data processing refers to the ability to perform, in a timely fashion, a large number of computations, database retrievals/updates, etc. This can be implemented through parallel processing, where multiple units of work are executed simultaneously on the same physical machine or on a distributed network. In some systems, the outcome of each transaction depends on the outcomes of previously completed transactions. The parallel aspects of such systems are inherently non-deterministic: due to race conditions, operating system scheduling tasks, or variable network delays, the sequence of message and thread execution cannot be predicted, nor can such systems be run in parallel simply by passing copies of each input message to a duplicate system. Because non-deterministic systems can produce non-identical output from identical input, they are not run in parallel on two different computing machines with the intention of having one substitute for the other in case of failure.
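The order dependence described above can be illustrated with a minimal sketch. It assumes a hypothetical account with a starting balance of 100 and two withdrawal requests; the function name `apply_in_order` and the amounts are illustrative assumptions, not part of any system discussed here. The same two inputs, delivered in different orders, leave the account in different final states, which is why feeding copies of the input to a duplicate system does not by itself guarantee identical results.

```python
def apply_in_order(balance, withdrawals):
    """Apply withdrawals one at a time; reject any that would overdraw.

    Hypothetical helper for illustration only.
    """
    accepted = []
    for name, amount in withdrawals:
        if amount <= balance:
            balance -= amount
            accepted.append(name)
    return balance, accepted


# Two requests whose combined amount exceeds the balance, so only the
# request that happens to arrive first can be accepted.
request_a = ("A", 80)
request_b = ("B", 50)

# Same inputs, two possible arrival orders (e.g. due to network delay).
result_ab = apply_in_order(100, [request_a, request_b])
result_ba = apply_in_order(100, [request_b, request_a])

print(result_ab)
print(result_ba)
```

In this sketch each individual run is deterministic; the non-determinism of a real system comes from the fact that the arrival order itself cannot be predicted or reproduced on a second machine.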
Second, reliable non-volatile data storage refers to the ability to persistently store the processed data, even if a number of the system's software or hardware components experience unexpected failure. This can usually be implemented by using Atomic, Consistent, Isolated, and Durable (“ACID”) transactions when accessing or modifying the shared data. ACID transactions can ensure the data integrity and persistence as soon as a unit of work is completed. Every committed ACID transaction is written into the non-volatile computer memory (hard-disk), which helps ensure the data durability, but it is very costly in terms of performance and typically slows down the whole system.
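As a rough illustration of the atomicity aspect of an ACID transaction, the following sketch uses Python's standard `sqlite3` module. The table layout, account names, and the `transfer` helper are assumptions made for this example only. An in-memory database is used so the sketch is self-contained, which means it does not demonstrate durability; a file-backed database would be needed for committed data to survive a process failure.

```python
import sqlite3

# In-memory database: convenient for a sketch, but NOT durable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts (name, balance) VALUES (?, ?)",
    [("alice", 100), ("bob", 0)],
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit of work (hypothetical helper)."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back every statement in it if an exception is raised.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit to dst
        # made earlier in the same transaction has been rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


ok_small = transfer(conn, "alice", "bob", 30)
after_small = balances(conn)
ok_large = transfer(conn, "alice", "bob", 500)
after_large = balances(conn)
```

The failed transfer leaves both balances exactly as they were, even though the credit to the destination account had already been applied inside the transaction.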
Third, highly available systems attempt to ensure that a given computer system is available as close as possible to 100% of the time. Such availability can be implemented through redundant software and/or hardware, which takes over the functionality in case a component failure is detected. In order to succeed, the failover must replicate not only the data, but also the process state. As will be appreciated by those of skill in the art, state replication can be particularly challenging in non-deterministic systems (i.e. systems where computational processing of the same set of events can have more than one result depending on the order in which those events are processed).
Highly available software applications are usually deployed on redundant environments, to reduce and/or eliminate the single point of failure that is commonly associated with the underlying hardware. Two common approaches are known as hot failover and warm failover. Hot failover refers to simultaneously processing the same input in multiple systems, essentially providing complete redundancy in the event of a failure in one of those systems. Warm failover refers to replicating the state of the application (i.e. the data) in backup systems, without processing that data in the backup systems, but having applications capable of processing that data loaded and standing by in the event of failure of a primary system. Cold failover, which is not considered by many to be a form of high availability, refers to simply powering-up a backup system and preparing that backup system to assume processing responsibilities from the primary system.
In hot failover configurations, two instances of the application are simultaneously running on two different hardware facilities, processing copies of the same input. If one of them experiences a critical failure, a supplemental synchronization system can ensure that the other one will continue to support the workload. In warm failover configurations, one of the systems, designated primary, is running the application; in case of failure, the second system, designated backup, which is waiting in a standby state, will “wake up”, take over, and resume the functionality.
Prior art hot failover approaches have at least two disadvantages. First, supplemental software has to run in order to keep the two systems synchronized. In the case of non-deterministic systems, where the order of arrival of events must be guaranteed to be identical, this synchronization effort can lead to an unacceptable (or otherwise undesirable) decrease in performance and increase in complexity. Second, prior art concurrent systems used in such applications typically allow multiple threads to execute simultaneously, so they are inherently non-deterministic. Also non-deterministic are systems with servers and geographically distributed clients, where the variable network delay delivers the messages to the server in an unpredictable sequence.
Warm failover can be used to overcome certain problems with hot failover. Warm failover can be another way to implement failover of non-deterministic systems, by replicating the system data to a redundant, backup system, and then restoring the application functionality to the secondary system. This approach has its drawbacks in the time required to recover the data to a consistent state, then to bring the application to a functional state, and lastly, to return the application to the point in processing where it left off. This process normally takes hours, requires manual intervention, and cannot generally recover the in-flight transactions.
A number of patents attempt to address at least some of the foregoing problems. U.S. Pat. No. 5,305,200 proposes what is essentially a non-repudiation mechanism for communications in a negotiated trading scenario between a buyer/seller and a dealer (market maker). Redundancy is provided to ensure the non-repudiation mechanism works in the event of a failure. It does not address the fail-over of an on-line transactional application in a non-deterministic environment. In simple terms, U.S. Pat. No. 5,305,200 is directed to providing an unequivocal answer to the question: “Was the order sent, or not?” after experiencing a network failure.
U.S. Pat. No. 5,381,545 proposes a technique for backing up stored data (in a database) while updates are still being made to the data. U.S. Pat. No. 5,987,432 addresses a fault-tolerant market data ticker plant system for assembling world-wide financial market data for regional distribution. This is a deterministic environment, and the solution focuses on providing an uninterrupted one-way flow of data to the consumers. U.S. Pat. No. 6,154,847 provides an improved method of rolling back transactions by combining a transaction log on traditional non-volatile storage with a transaction list in volatile storage. U.S. Pat. No. 6,199,055 proposes a method of conducting distributed transactions between a system and a portable processor across an unsecured communications link. U.S. Pat. No. 6,199,055 deals with authentication, ensuring complete transactions with remote devices, and with resetting the remote devices in the event of a failure. In general, the foregoing do not address the fail-over of an on-line transactional application in a non-deterministic environment.
U.S. Pat. No. 6,202,149 proposes a method and apparatus for automatically redistributing tasks to reduce the effect of a computer outage. The apparatus includes at least one redundancy group comprised of one or more computing systems, which in turn are themselves comprised of one or more computing partitions. The partition includes copies of a database schema that are replicated at each computing system partition. The redundancy group monitors the status of the computing systems and the computing system partitions, and assigns a task to the computing systems based on the monitored status of the computing systems. One problem with U.S. Pat. No. 6,202,149 is that it does not teach how to recover workflow when a backup system assumes responsibility for processing transactions, but instead directs itself to the replication of an entire database which can be inefficient and/or slow. Further, such replication can cause important transactional information to be lost in flight, particularly during a failure of the primary system or the network interconnecting the primary and backup system, thereby leading to an inconsistent state between the primary and backup. In general, U.S. Pat. No. 6,202,149 lacks certain features that are desired in the processing of on-line transactions and the like, and in particular lacks features needed to failover non-deterministic systems.
U.S. Pat. No. 6,308,287 proposes a method of detecting a failure of a component transaction, backing it out, storing a failure indicator reliably so that it is recoverable after a system failure, and then making this failure indicator available to a further transaction. It does not address the fail-over of a transactional application in a non-deterministic environment. U.S. Pat. No. 6,574,750 proposes a system of distributed, replicated objects, where the objects are non-deterministic. It proposes a method of guaranteeing consistency and limiting roll-back in the event of the failure of a replicated object. A method is described where an object receives an incoming client request and compares the request ID to a log of all requests previously processed by replicas of the object. If a match is found, then the associated response is returned to the client. However, this method in isolation is not sufficient to solve the various problems in the prior art.
Another problem is that the method of U.S. Pat. No. 6,574,750 assumes a synchronous invocation chain, which is inappropriate for high-performance On-Line Transaction Processing (“OLTP”) applications. With a synchronous invocation, the client waits for either a reply or a time-out before continuing. The invoked object in turn may become a client of another object, propagating the synchronous call chain. The result can be an extensive synchronous operation, blocking the client processing and requiring long time-outs to be configured in the originating client.
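The request-ID matching described above can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of any cited patent: the class name `Replica`, the shared dictionary used as the request log, and the counter-based response format are all hypothetical. The response deliberately depends on how many requests a replica has already processed, standing in for non-deterministic state.

```python
class Replica:
    """Hypothetical replica of an object whose responses depend on its history."""

    def __init__(self, request_log):
        # Log of request ID -> response, assumed to be shared by all replicas.
        self.request_log = request_log
        self.processed = 0

    def handle(self, request_id, payload):
        # If any replica has already processed this request, return the
        # logged response instead of recomputing it.
        if request_id in self.request_log:
            return self.request_log[request_id]
        self.processed += 1
        response = f"{payload}#{self.processed}"
        self.request_log[request_id] = response
        return response


shared_log = {}
primary = Replica(shared_log)
backup = Replica(shared_log)

first = primary.handle("r1", "buy")
second = primary.handle("r2", "sell")

# The client retries "r2" against the backup after a primary failure.
replayed = backup.handle("r2", "sell")

# Without the shared log, a fresh replica would compute a different answer.
recomputed = Replica({}).handle("r2", "sell")
```

The log keeps the retried response consistent with what the client may already have seen, but it does nothing to restore the backup's own process state, and each call in this sketch still blocks until it returns, in line with the limitations noted above.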