The Service Availability Forum (SA Forum) is a consortium of industry-leading companies promoting a set of open specifications that enables the creation and deployment of highly available, mission-critical services. As a standardization body, the SA Forum has defined a set of open specifications for middleware services, including the Application Interface Specification (AIS) (SA Forum, Service Availability Interface, Overview, SAI-Overview-B.05.03), which consists of different services to enable and manage highly available services. Service availability in the AIS architecture is provided by using software and hardware redundancy techniques.
The Availability Management Framework (AMF) is one of the AIS services that supports and manages service availability by coordinating and managing redundant software entities within a cluster. A cluster is a logical grouping of a number of cluster nodes (also referred to as “nodes”). These nodes host various resources in a distributed computing environment. An application that is managed by the AMF to provide service availability is structured into logical entities according to the model expected by the AMF.
The AMF manages redundant service units to ensure service availability in case of failures. These redundant service units are grouped into a service group to guarantee service availability for a particular set of service instances. Each service instance represents workload incurred by the provision of services. At runtime, the AMF assigns each service instance to a set of service units; some of the service units actively provide the associated service, and the other service units may stand by to protect the service in case of a failure of the active service units.
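The active/standby assignment described above can be illustrated with a minimal Python sketch. This is not the AMF API; the class names, the `assign` and `fail_over` methods, and the rule of promoting the first standby unit are all assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceUnit:
    name: str
    node: str  # cluster node hosting this unit (hypothetical label)


@dataclass
class ServiceGroup:
    units: list
    # service instance name -> {"active": [unit names], "standby": [unit names]}
    assignments: dict = field(default_factory=dict)

    def assign(self, si, active, standby):
        """Record which units actively serve `si` and which stand by."""
        self.assignments[si] = {"active": list(active), "standby": list(standby)}

    def fail_over(self, si, failed_unit):
        """Drop a failed unit; if it was active, promote the first standby.

        Assumed policy for this sketch, not the AMF's actual recovery logic.
        Returns the list of units actively serving `si` afterwards.
        """
        a = self.assignments[si]
        if failed_unit in a["active"]:
            a["active"].remove(failed_unit)
            if a["standby"]:
                a["active"].append(a["standby"].pop(0))
        elif failed_unit in a["standby"]:
            a["standby"].remove(failed_unit)
        return a["active"]


sg = ServiceGroup(units=[ServiceUnit("SU1", "node1"), ServiceUnit("SU2", "node2")])
sg.assign("SI1", active=["SU1"], standby=["SU2"])
```

In this sketch, calling `sg.fail_over("SI1", "SU1")` leaves `SU2` as the only active unit for `SI1`, with no standby remaining to protect it.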
Consequently, if the service units of a service group that participate in the provisioning and protecting of a service instance are placed on the same hardware, a failure of that hardware causes all of these service units to fail, and the service associated with the service instance is interrupted. Therefore, there is a need to protect against the impact of hardware failures to ensure service availability.
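The co-location problem can be made concrete with a short Python sketch. The function name, the data layout, and the hardware labels such as `blade1` are hypothetical and chosen only for illustration; they do not come from the AIS specifications.

```python
def interrupted_instances(assignments, unit_to_hardware, failed_hardware):
    """Return the service instances left with no surviving service unit.

    assignments:      service instance -> list of assigned service units
                      (active and standby together)
    unit_to_hardware: service unit -> hardware element hosting it
    failed_hardware:  the hardware element assumed to fail
    """
    lost = []
    for si, units in assignments.items():
        survivors = [u for u in units if unit_to_hardware[u] != failed_hardware]
        if not survivors:
            lost.append(si)
    return sorted(lost)


# Hypothetical placement: both units protecting SI1 share blade1,
# while the units protecting SI2 are spread over two blades.
assignments = {"SI1": ["SU1", "SU2"], "SI2": ["SU3", "SU4"]}
unit_to_hardware = {"SU1": "blade1", "SU2": "blade1",
                    "SU3": "blade1", "SU4": "blade2"}

print(interrupted_instances(assignments, unit_to_hardware, "blade1"))
```

Under this placement, a failure of `blade1` interrupts `SI1` because its active and standby units fail together, whereas `SI2` survives on `SU4`.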