1. Field of the Invention
The present invention generally relates to computer system environments, and more particularly to systems, methods and computer program products that provide disaster recovery and fault tolerance for such environments.
2. Related Art
In today's technological climate it is typical for an enterprise (i.e., a business concern, corporation, institution, organization, government agency or the like) to own and operate one or more computer systems (e.g., a collection of servers, desktops, laptops and the like all connected via local area networks (LANs), wide area networks (WANs) and the like). Such computer systems are used by enterprises so that their personnel (i.e., end users) can access not only software applications (e.g., spreadsheet, word processing, accounting and like applications), but also electronic mail (“e-mail”). There can be no doubt that continuous availability of these computer systems is vital to an enterprise's operations.
Oftentimes, one or more of an enterprise's computer systems are not available. These “down-times” can be caused by facility disasters, hardware failures, software application failures, purposeful attacks from viruses or simply scheduled (i.e., periodic) maintenance of one or more of the computer system's infrastructure components. From the end users' perspective, however, it doesn't matter what causes down-time. The end users just know that they cannot access their software applications and/or e-mail to conduct business. Therefore, any down-time of an enterprise's computer systems cuts into its personnel's productivity and thus the enterprise's overall productivity (and oftentimes, profitability).
Information Technology (IT) managers or network administrators charged with the responsibility to minimize down-time and maximize up-time of an enterprise's computer systems are thus faced with a challenge to “bomb-proof” such systems. To meet that challenge, today's IT manager or network administrator is faced with a bewildering array of piecemeal software and hardware components that must be stitched together in the hope of delivering some level of up-time assurance. The resulting solutions are complex, difficult to maintain, and require significant investment.
For example, several software vendors offer remote data replication products for operating systems such as the Microsoft® Windows 2000™ operating system, but these software products do not help an enterprise's system environments stay healthy, and do not necessarily provide for application failure switch-over and switch-back procedures.
Given the above-described problem, what is needed is a system, method and computer program product for distributed application monitoring, and application and end user switch-over control, in order to provide disaster recovery and fault tolerance and to generally limit an enterprise's computer system down-time.