The Service Availability Forum (SA Forum) is a consortium of industry-leading companies promoting a set of open specifications that enable the creation and deployment of highly available, mission-critical services.
The SA Forum specifications define a High Availability (HA) middleware. The core of this middleware is the Availability Management Framework (AMF) (see SA Forum, Application Interface Specification, Availability Management Framework, SAI-AIS-AMF-B.04.01). The AMF is responsible for monitoring components, detecting failures, and reacting to those failures. The AMF performs availability management according to a system configuration known as the AMF configuration. The AMF configuration is a logical organization of the software components that describes how they are grouped, their dependencies, the services they provide, and the recovery policy that the AMF must apply in case of failure.
The basic building block of an AMF configuration is the component, which abstracts a deployable instance of a software or hardware resource. The service provided by such a component is represented by a component-service-instance. Components that collaborate closely and that must be collocated to provide a more integrated service are grouped into a service-unit. The workload assigned to a service-unit is referred to as a service-instance, which is a group of component-service-instances. Service-units composed of redundant component replicas form a service-group. Service availability management takes place within the service-group; i.e., the service-instances are provided by service-units and protected against failures within the scope of the service-group. The AMF configuration also represents the nodes on which the components are deployed.
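The containment relationships described above can be sketched as plain data structures. The following is a minimal illustration only, not the AMF information model defined by the specification: the class names, field names, and example entity names (such as "web-server" and "node-1") are hypothetical and chosen purely for readability.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComponentServiceInstance:
    """Service provided by a single component (hypothetical model)."""
    name: str


@dataclass
class ServiceInstance:
    """Workload assigned to a service-unit: a group of component-service-instances."""
    name: str
    csis: List[ComponentServiceInstance] = field(default_factory=list)


@dataclass
class Component:
    """Deployable instance of a software or hardware resource."""
    name: str


@dataclass
class ServiceUnit:
    """Collocated components that together provide a more integrated service."""
    name: str
    node: str  # node on which the components of this service-unit are deployed
    components: List[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    """Redundant service-units plus the service-instances they protect."""
    name: str
    service_units: List[ServiceUnit] = field(default_factory=list)
    service_instances: List[ServiceInstance] = field(default_factory=list)


def example_configuration() -> ServiceGroup:
    """Build a small illustrative service-group with two redundant service-units."""
    si = ServiceInstance("web-si", [ComponentServiceInstance("http-csi")])
    su1 = ServiceUnit("su-1", "node-1", [Component("web-server")])
    su2 = ServiceUnit("su-2", "node-2", [Component("web-server")])
    return ServiceGroup("web-sg", [su1, su2], [si])
```

In this sketch, each service-unit is tied to a single node to reflect the collocation requirement, and the service-group owns both the redundant service-units and the service-instances they protect.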
A system integrator is responsible for dimensioning the scope of those units and groups, and for defining the recovery policies deemed most suitable for ensuring service availability. These entities correspond to fault zones of increasing scope, where each fault zone is a scope that can be isolated and repaired to recover from a fault.
The AMF supports the notion of a redundancy model for a service-group. The redundancy model defines the redundancy scheme according to which the service-instances are protected. For instance, the 2N redundancy model dictates that the service-group can have one active service-unit for all of the service-instances and one standby service-unit for all of the service-instances; i.e., a service-unit cannot simultaneously be active for some service-instances and standby for others. The N-way-active redundancy model, on the other hand, allows multiple active (but no standby) service-units in the service-group, even for the same service-instance.
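The two redundancy models described above can be expressed as simple checks over a set of role assignments. The sketch below is illustrative only and covers just the constraints stated in this section. The function name `violations`, the tuple layout, and the model strings "2N" and "N-way-active" are assumptions made for the example, not identifiers from the AMF specification.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (service_unit, service_instance, role),
# where role is either "active" or "standby".
Assignment = Tuple[str, str, str]


def violations(model: str, assignments: Iterable[Assignment]) -> List[str]:
    """Return the redundancy-model constraints broken by the given assignments."""
    assignments = list(assignments)
    active_sus = {su for su, _, role in assignments if role == "active"}
    standby_sus = {su for su, _, role in assignments if role == "standby"}
    problems: List[str] = []

    if model == "2N":
        # One active service-unit and one standby service-unit for all service-instances.
        if len(active_sus) > 1:
            problems.append("more than one active service-unit")
        if len(standby_sus) > 1:
            problems.append("more than one standby service-unit")
        # A service-unit cannot be active for some service-instances and standby for others.
        if active_sus & standby_sus:
            problems.append("a service-unit is both active and standby")
    elif model == "N-way-active":
        # Multiple active service-units are allowed, but no standby ones.
        if standby_sus:
            problems.append("standby assignments are not allowed")
    else:
        raise ValueError(f"unsupported redundancy model: {model}")

    return problems
```

For example, assigning `su1` as active and `su2` as standby for every service-instance satisfies the 2N check, while making `su1` active for one service-instance and standby for another does not.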
The AMF manages the high availability of services provided by software components according to the values of AMF configuration attributes. Some of these attributes are associated with protection and recovery policies. These policies specify the number of components assigned active/standby roles on behalf of a component-service-instance, and restrictions on standard recoveries. These attributes can be configured by the system integrator at configuration time. Among them is component_disable_restart (also referred to as the “disable_restart attribute”), which has a Boolean value that specifies whether a component restart is a desirable recovery in case of failure. The AMF specification recommends setting this attribute to true if the component failover is faster than its restart, and to false otherwise.
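The recommendation for the disable_restart attribute amounts to a comparison of two recovery times. The following minimal sketch assumes that estimates of the failover and restart durations are available to the system integrator. The function name and parameters are hypothetical, and how such estimates would be obtained is outside the scope of this illustration.

```python
def recommend_disable_restart(failover_time_s: float, restart_time_s: float) -> bool:
    """Suggest a value for the disable_restart attribute.

    Returns True (restart disabled) when the estimated failover time is shorter
    than the estimated restart time, and False otherwise.
    Both inputs are assumed estimates, expressed in seconds.
    """
    if failover_time_s < 0 or restart_time_s < 0:
        raise ValueError("recovery times must be non-negative")
    return failover_time_s < restart_time_s
```

With an estimated failover time of 0.5 seconds and an estimated restart time of 3 seconds, the function suggests true, meaning failover is preferred over restart.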