Highly available transaction processing systems have been the subject of much research and of many computer system and database management system (“DBMS”) design efforts during the past 30 years. Highly available transaction processing systems are generally designed to include redundant components, so that when one of a set of redundant components fails, another of the set can quickly replace the failed component to allow for continued operation of the system.
Failover from a failed hardware device to a redundant replacement component involves detecting the failure, mapping the failure onto an internal map of the transaction processing system, restoring operation with an operational redundant component in place of the failed component, and restarting and resynchronizing software processes interrupted by the component failure. In many cases, transaction processing is failed over to a complete, redundant system remotely located from a failed primary system until operation of the failed primary system is restored.
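As an illustration only, the failover steps described above can be sketched in Python. The map layout, the health table, and the names `fail_over`, `system_map`, and `health` are hypothetical and do not correspond to any particular system; restarting and resynchronizing software processes is system-specific and is only marked with a comment.

```python
def fail_over(system_map, health, failed_unit):
    """Replace a failed unit with a healthy redundant unit of the same role.

    system_map: role -> list of unit names, active unit first (hypothetical layout).
    health:     unit name -> True if the unit is operational.
    Returns (role, replacement_unit).
    """
    # Step 1: record the detected failure.
    health[failed_unit] = False

    # Step 2: map the failure onto the internal map of the system.
    role = next((r for r, units in system_map.items() if failed_unit in units), None)
    if role is None:
        raise KeyError(f"unknown unit: {failed_unit}")

    # Step 3: choose an operational redundant unit and make it the active one.
    replacement = next((u for u in system_map[role] if health.get(u)), None)
    if replacement is None:
        raise RuntimeError(f"no operational redundant unit for role {role!r}")
    system_map[role].remove(replacement)
    system_map[role].insert(0, replacement)

    # Step 4: restarting and resynchronizing the interrupted software
    # processes would happen here; it is left to the caller in this sketch.
    return role, replacement


# Hypothetical two-way redundant configuration.
system_map = {"disk_controller": ["ctrl_a", "ctrl_b"], "network": ["nic_0", "nic_1"]}
health = {"ctrl_a": True, "ctrl_b": True, "nic_0": True, "nic_1": True}
result = fail_over(system_map, health, "ctrl_a")
```

In this sketch, a role with no remaining operational unit raises an error rather than continuing silently, which is the point at which a real system might instead fail over to a complete, remote redundant system.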
Currently available highly available transaction processing systems suffer from a number of deficiencies. In many such systems, for example, the step of restarting and resynchronizing software processes interrupted by the component failure is imperfect. In many cases, a number of different software processes operate concurrently within a transaction processing system, and must be started in a particular order with respect to one another. However, such ordered launching of software processes is not configured into the failover process, so that an attempted failover that launches the software processes in an order different from the required order may fail, or may produce incorrect system operation. As another example, software processes often proceed to process subsequent transactions without waiting for data related to an already processed transaction to be redundantly stored in multiple, non-volatile data storage devices, so that the data related to the processed transaction is not fully recoverable under the wide variety of possible failure conditions. Moreover, currently available systems lack convenient interfaces that allow an application program or database management system to determine the status of a replication operation launched in order to replicate data from a primary system to a redundant, remote system. Thus, a hard failure of a primary system that results in termination of software processes may result in unrecoverable data loss that, in turn, results in unrecoverable loss of completed transactions. In certain cases, inconsistencies between data stored on a primary and on a remote transaction processing system may lead to halting of software processes. Many currently available high availability transaction processing systems do not provide automated system monitoring, and do not provide useful and complete management tools for system managers who need to manage the systems.
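To make the ordered-launch requirement concrete, the following minimal Python sketch derives a valid start order from declared dependencies using the standard-library `graphlib.TopologicalSorter` (Python 3.9 or later). The process names and the helper `launch_order` are hypothetical and chosen only for illustration; this is one possible way to express such an ordering, not a description of any existing system.

```python
from graphlib import TopologicalSorter


def launch_order(dependencies):
    """Return a start order in which each process follows the processes it needs.

    dependencies: process name -> set of processes that must already be running.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical processes in a transaction processing system.
dependencies = {
    "lock_manager": set(),
    "log_writer": set(),
    "dbms": {"lock_manager", "log_writer"},
    "application_server": {"dbms"},
}
order = launch_order(dependencies)
```

A failover procedure that starts processes in the order returned here would never launch `application_server` before `dbms`. The relative order of processes that do not depend on each other, such as `lock_manager` and `log_writer`, is not constrained.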