Advances in software programming and computing technology have made increasingly sophisticated and feature-rich software applications available to consumers and businesses alike. For businesses in particular, these powerful software applications provide an ever-expanding array of benefits in terms of improved accuracy, efficiency, and convenience for numerous tasks performed on a regular basis. As a result, companies both large and small have come to depend increasingly on these software applications for most aspects of their businesses. Industry indicators suggest that this trend will continue and may even accelerate in the years ahead.
Because of this dependency, companies must keep these software applications continuously available, particularly those applications that are considered business-critical. An unplanned outage, even a brief one, may have a significant adverse impact on sales, revenue, productivity, and the like. Longer outages may cost companies customer loyalty, damage public perception (e.g., as reflected in stock price), and compromise regulatory compliance in highly regulated industries. Consequently, most companies retain a trained technical support group or staff dedicated to setting up and supporting the various software applications used by their organizations.
The technical support staff is also responsible for recovering the software applications, or bringing them back online, in case of a system-wide failure. Such failures are commonly called “catastrophic failures” and refer to situations where most or all of a company's computing capacity is temporarily or permanently wiped out. A catastrophic failure may occur, for example, as a result of a natural disaster (e.g., earthquake, tornado, flood), but may also be due to human error (e.g., chemical spill, building fire, gas explosion). More recently, acts of terrorism or sabotage have also become potential causes of catastrophic failure.
To mitigate the impact of catastrophic failures, most companies have an emergency recovery procedure designed to restore at least some computing capacity and, ideally, all critical software applications. In one such recovery procedure, excess computing capacity is reserved offsite at a remote location (e.g., another city, state, or region) and critical applications are brought up at the remote location upon detection of a catastrophic failure. This recovery can take a long time to complete, however, potentially resulting in hours or even days of downtime for some critical applications. At least some of this delay arises because few, if any, of the critical applications are already up and running at the remote recovery location. In addition, the recoveries have heretofore been largely manual processes, requiring entry of numerous commands on the computing systems at the remote recovery location in order to properly bring the applications back online. More recently, script files have been used to automatically enter the commands required for recovering various software applications. However, these script files must be frequently maintained and updated by the technical support personnel in order to capture the most recent changes in the policies and parameters of the software applications. This frequent maintenance imposes a tedious and time-consuming burden on the technical support personnel.
Accordingly, what is needed is a more efficient way to recover computing capacity and critical software applications in the event of a catastrophic failure. In particular, what is needed is a way to automatically detect such a catastrophic failure and quickly initiate procedures for recovering lost computing capacity and critical software applications.