As more and more enterprise business is supported and automated by IT, a highly available enterprise IT infrastructure becomes critically important, because companies cannot afford unexpected downtime, especially for critical business services. Downtime not only causes revenue loss and customer dissatisfaction, but may also lead to a seriously damaged reputation and even business closure. Consider, for example, an online B2C (business-to-consumer) e-commerce book trading system such as Amazon: when the system crashes, the company loses potential revenue for the duration of the downtime and, more seriously, customers' frustration with the unavailable services may lead to dissatisfaction, customer loss, and even damage to the brand's reputation. High availability of the IT infrastructure is therefore a prerequisite for business service availability. A highly available IT infrastructure is usually achieved through redundancy-based high availability (HA) solutions, which, from an IT management perspective, are the primary availability measures. A redundancy-based HA solution provides customers with continuous service by failing over critical data and applications from a crashed IT system to a peer system, thus reducing service downtime and the corresponding loss.
Redundancy-based HA solutions are essential building blocks of a highly available IT infrastructure. However, they are expensive, because they require redundant IT resources, and complex, because multiple IT systems must be cross-configured. How to plan and implement these solutions effectively in the IT infrastructure therefore becomes critically important. Traditionally, business people first specify the availability requirement for each business service; an IT architect, or even an HA expert, then determines where and how HA solutions should be applied; and once the HA architecture is settled, IT operators configure and provision the specific HA solutions according to the architecture decisions. However, this traditional approach to planning and implementing HA solutions is experience-based and highly expertise-intensive, which in turn results in high IT resource and operational costs.
One challenge is how to automate the design, planning, deployment and configuration of specific HA solutions according to the architecture decisions. Traditionally, these activities are performed manually. For example, if the HA architecture calls for an HA solution for a DB2 database, several solutions are usually applicable, such as HADR (High Availability Disaster Recovery) and hot standby, in which the DB2 database file is located on shared storage and is accessible by two DB2 instances. IT operators must first select a specific solution, which requires understanding these HA solutions and the trade-offs among them, and is usually difficult for IT operators with little HA knowledge. Moreover, even after a specific HA solution is selected, configuring and provisioning it is still complicated, because a specific HA solution usually involves complex configurations across multiple IT resources, with many interdependencies and constraints that must be satisfied for the solution to work correctly and in an optimized way. For example, configuring a DB2 HADR solution involves a primary OS (operating system), a primary DB2 instance, a primary database, a standby OS, a standby DB2 instance and a standby database, and these cross-component configurations must satisfy more than 50 constraints. Performed manually, these activities are therefore labor-intensive, error-prone and highly dependent on expertise, which may in turn result in high IT operational costs.
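
To illustrate what checking such cross-component constraints automatically might look like, the following is a minimal Python sketch. It is not the method of any particular product: the `HadrNode` fields, the `violated` helper and the nine constraints listed are hypothetical, chosen only for illustration, and are neither the more than 50 constraints mentioned above nor taken from DB2 documentation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class HadrNode:
    """One side of a hypothetical HADR pair (all field names are illustrative)."""
    host: str             # hostname of the OS running the DB2 instance
    instance: str         # DB2 instance name
    db_name: str          # database name
    port: int             # local port used for replication traffic
    remote_host: str      # host on which this node expects its peer
    remote_port: int      # port on which this node expects its peer
    remote_instance: str  # instance name this node expects on its peer
    sync_mode: str        # synchronization mode configured on this node


# A constraint is a description plus a predicate over (primary, standby).
Constraint = Tuple[str, Callable[[HadrNode, HadrNode], bool]]

# Illustrative constraints only; a real solution would need many more.
CONSTRAINTS: List[Constraint] = [
    ("primary and standby run on different hosts",
     lambda p, s: p.host != s.host),
    ("database names match",
     lambda p, s: p.db_name == s.db_name),
    ("primary's remote host is the standby host",
     lambda p, s: p.remote_host == s.host),
    ("standby's remote host is the primary host",
     lambda p, s: s.remote_host == p.host),
    ("primary's remote port is the standby port",
     lambda p, s: p.remote_port == s.port),
    ("standby's remote port is the primary port",
     lambda p, s: s.remote_port == p.port),
    ("primary's remote instance is the standby instance",
     lambda p, s: p.remote_instance == s.instance),
    ("standby's remote instance is the primary instance",
     lambda p, s: s.remote_instance == p.instance),
    ("synchronization modes match",
     lambda p, s: p.sync_mode == s.sync_mode),
]


def violated(primary: HadrNode, standby: HadrNode) -> List[str]:
    """Return the descriptions of all constraints the pair fails to satisfy."""
    return [desc for desc, check in CONSTRAINTS if not check(primary, standby)]


if __name__ == "__main__":
    primary = HadrNode("hostA", "db2inst1", "SALES", 55001,
                       "hostB", 55002, "db2inst2", "NEARSYNC")
    # The standby below points at the wrong port on the primary.
    standby = HadrNode("hostB", "db2inst2", "SALES", 55002,
                       "hostA", 55009, "db2inst1", "NEARSYNC")
    for problem in violated(primary, standby):
        print("violated:", problem)
```

Expressing each constraint as data (a description plus a predicate) rather than as hand-written checks means that a larger constraint set can be added without changing the checking logic, and every violation can be reported at once instead of being discovered one by one during provisioning.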