Advances in information technology are changing the way services are delivered and are introducing new business models based on broadband Internet access, e.g., Voice over Internet Protocol (VoIP). The services provided by an application are considered highly available if they are accessible 99.999% of the time (also known as five nines). High Availability (HA) has become a key requirement for critical applications and revenue-generating applications. The Service Availability Forum (also referred to as the “SA Forum” or “SAF”), a consortium of telecom and computing companies, has defined a set of specifications that describe a middleware that manages the HA of applications. The middleware requires a configuration that describes each of the applications it manages. Moreover, applications that need to interact with the middleware (e.g., to checkpoint their state) must implement the SA Forum Application Programming Interface (API). This process requires deep knowledge of the field from the application developers, as well as from the system integrator, who needs to define a complex configuration.
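To make the five-nines figure concrete, the following minimal sketch converts an availability percentage into the downtime it permits per year. The function name and the use of an average year length of 365.25 days are assumptions made for illustration only; they are not part of the SA Forum specifications.

```python
# Average number of minutes in a year (assumption: 365.25 days to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the maximum yearly downtime, in minutes, for a given availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


# Five nines leaves a budget of roughly five and a quarter minutes per year.
five_nines_budget = allowed_downtime_minutes(99.999)
print(f"{five_nines_budget:.2f} minutes of downtime per year")
```

Under these assumptions, a service that is accessible 99.999% of the time may be unavailable for only about 5.26 minutes over an entire year.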
More specifically, the HA middleware defined by the SA Forum comprises a core that is based on the Availability Management Framework (AMF). AMF is responsible for maintaining service availability by detecting and reacting to failures. AMF performs availability management according to a system configuration known as the AMF configuration. The AMF configuration is a logical organization of the software components that describes their dependencies, the services they provide, the recovery policy that AMF must apply in case of failure, how they are grouped, etc.
In the following, the main elements of the AMF configuration will be described. The basic building block of the AMF configuration is an AMF component (also referred to as a component), which abstracts a deployable instance of an application's component. Each service provided by such a component is represented by a component-service-instance. The components that collaborate closely and that must be collocated to provide a more integrated service are grouped into a service-unit. The workload assigned to the service-unit is referred to as the service-instance, which is a grouping of component-service-instances. The service-units composed of redundant component replicas form a service-group. The service availability management takes place within the service-group, i.e., the service-instances are provided by service-units and protected against failures within the scope of the service-group. The AMF configuration also represents the nodes on which the components are deployed. AMF supports the notion of a redundancy model for a service-group. The redundancy model defines the redundancy scheme according to which the service-instances are protected. For instance, a 2N redundancy model indicates that the service-group can have one active service-unit for all the service-instances and one standby service-unit for all the service-instances. In other words, a service-unit cannot simultaneously be active for some service-instances and standby for others.
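The 2N rule described above can be sketched as a simple validity check over role assignments. The dictionary layout, the role strings, and the function name `satisfies_2n` are hypothetical and chosen only for illustration; they do not reproduce the AMF information model.

```python
from typing import Dict

# Hypothetical layout: assignments[service_unit][service_instance] = "active" or "standby".
Assignments = Dict[str, Dict[str, str]]


def satisfies_2n(assignments: Assignments) -> bool:
    """Return True if the role assignments respect the 2N rule described in the text."""
    active_units = 0
    standby_units = 0
    for roles in assignments.values():
        distinct_roles = set(roles.values())
        if not distinct_roles:
            continue  # a service-unit with no assignment does not violate the rule
        if len(distinct_roles) > 1:
            return False  # active for some service-instances and standby for others
        role = distinct_roles.pop()
        if role == "active":
            active_units += 1
        elif role == "standby":
            standby_units += 1
        else:
            return False  # unknown role string
    # At most one active and one standby service-unit for all service-instances.
    return active_units <= 1 and standby_units <= 1


valid = {
    "SU1": {"SI1": "active", "SI2": "active"},
    "SU2": {"SI1": "standby", "SI2": "standby"},
}
mixed = {
    "SU1": {"SI1": "active", "SI2": "standby"},
    "SU2": {"SI1": "standby", "SI2": "active"},
}
print(satisfies_2n(valid), satisfies_2n(mixed))
```

The second assignment is rejected because each service-unit is active for one service-instance and standby for the other, which a 2N redundancy model does not permit.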
FIG. 1 illustrates an example AMF configuration, in which there is one service-group (SG1) with 2N redundancy. The service-group (SG1) contains two redundant service-units (SU1 and SU2), deployed on nodes Node1 and Node2, respectively. One service-unit is active and one is in standby, ready to take over if the active one fails. Each service-unit (SU1 or SU2) has two components ((C1 and C2) for SU1; (C3 and C4) for SU2). The service-group (SG1) protects two service-instances (SI1 and SI2). Each service-instance is composed of two component-service-instances ((CSI1 and CSI2) for SI1; (CSI3 and CSI4) for SI2). The AMF configuration also contains attributes that can determine the recovery executed by AMF at runtime in case a failure of a component or a service-instance is detected. All of the elements shown in FIG. 1 are represented by objects in the AMF configuration. The structure of these objects has to comply with a Unified Modeling Language (UML) class diagram. The configuration objects are described according to a standardized machine-readable eXtensible Markup Language (XML) schema. It is the responsibility of a system integrator to define the AMF configuration.
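As an illustration only, the configuration of FIG. 1 can be modeled with a few plain data classes. The class names, the `role` and `healthy` fields, and the `fail` method are assumptions introduced for this sketch; an actual AMF configuration is expressed as objects conforming to the standardized XML schema, and the recovery performed by AMF is governed by configuration attributes not modeled here. The sketch also assumes that SU1 is the initially active service-unit, which FIG. 1 as described does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ServiceInstance:
    name: str
    csis: List[str]  # component-service-instances grouped by this service-instance


@dataclass
class ServiceUnit:
    name: str
    node: str
    components: List[str]
    role: str  # "active", "standby", or "failed" (hypothetical values)
    healthy: bool = True


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str
    service_units: List[ServiceUnit]
    service_instances: List[ServiceInstance]

    def active_unit(self) -> Optional[ServiceUnit]:
        """Return the healthy service-unit currently serving the service-instances."""
        for su in self.service_units:
            if su.healthy and su.role == "active":
                return su
        return None

    def fail(self, su_name: str) -> None:
        """Mark a service-unit as failed and, if it was active, promote a standby."""
        failed = next(su for su in self.service_units if su.name == su_name)
        was_active = failed.role == "active"
        failed.healthy = False
        failed.role = "failed"
        if was_active:
            for su in self.service_units:
                if su.healthy and su.role == "standby":
                    su.role = "active"
                    break


sg1 = ServiceGroup(
    name="SG1",
    redundancy_model="2N",
    service_units=[
        ServiceUnit("SU1", "Node1", ["C1", "C2"], role="active"),  # assumed active
        ServiceUnit("SU2", "Node2", ["C3", "C4"], role="standby"),
    ],
    service_instances=[
        ServiceInstance("SI1", ["CSI1", "CSI2"]),
        ServiceInstance("SI2", ["CSI3", "CSI4"]),
    ],
)

before = sg1.active_unit().name
sg1.fail("SU1")
after = sg1.active_unit().name
print(before, "->", after)
```

In this simplified model, the failure of the active service-unit causes the standby service-unit to take over both service-instances, mirroring the takeover behavior described for the 2N service-group.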
The distinction between a component and an application is explained in the following. From a software engineering perspective, an application (more specifically, the executable code of an application) can have one or more application components. From an HA perspective, each deployment (i.e., installation) of an application component is considered a distinct component. For example, a given application such as a database can be considered as one application composed of one application component (i.e., the database). However, if the database is replicated on three nodes (for redundancy), then this application is considered to have three components from an HA perspective, and thus the AMF configuration would include the description of three distinct components. In the context described herein, a user describes an application from a software engineering perspective; however, once deployed and managed by the middleware, the application is viewed from an HA perspective.
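The database example above can be sketched as a mapping from one application component to one HA component per deployment. The function name, the node names, and the `name@node` naming scheme are hypothetical and serve only to illustrate the counting.

```python
from typing import List


def ha_components(application_component: str, nodes: List[str]) -> List[str]:
    """Return one HA component per node on which the application component is deployed."""
    # The "name@node" identifier format is an assumption made for this sketch.
    return [f"{application_component}@{node}" for node in nodes]


# One application component (the database) replicated on three nodes.
components = ha_components("database", ["Node1", "Node2", "Node3"])
print(len(components), components)
```

From the software engineering perspective there is a single application component, while the configuration in this sketch would have to describe three distinct components.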
An example of the services offered by the HA middleware is a checkpoint service. The checkpoint service allows the components at runtime to create checkpoint objects that can store data representing the application state. Once a checkpoint object is created, the checkpoint service ensures that the checkpoint object is properly replicated within the cluster/computing system to avoid losing the state information in case of failure. The checkpoint service offers various modes of synchronization between the replicated checkpoint objects (e.g., synchronous and asynchronous). One main objective of the checkpoint service is to allow an application to maintain service continuity by preserving its state if the application fails.
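The difference between synchronous and asynchronous replication of checkpoint objects can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions: the class `ToyCheckpointService` and its `write`, `flush`, and `read` methods are hypothetical, are not the SA Forum checkpoint API, and treat the first node as the local replica.

```python
from typing import Dict, List, Optional, Tuple


class ToyCheckpointService:
    """Toy model of checkpoint replication; not the SA Forum checkpoint API."""

    def __init__(self, nodes: List[str]) -> None:
        # One replica of every checkpoint object per node.
        self._replicas: Dict[str, Dict[str, bytes]] = {node: {} for node in nodes}
        self._local_node = nodes[0]  # assumption: the writer runs on the first node
        self._pending: List[Tuple[str, bytes]] = []

    def write(self, checkpoint: str, data: bytes, synchronous: bool = True) -> None:
        """Store application state; replicate immediately or defer replication."""
        if synchronous:
            for replica in self._replicas.values():
                replica[checkpoint] = data
        else:
            self._replicas[self._local_node][checkpoint] = data
            self._pending.append((checkpoint, data))

    def flush(self) -> None:
        """Propagate deferred (asynchronous) writes to every replica."""
        for checkpoint, data in self._pending:
            for replica in self._replicas.values():
                replica[checkpoint] = data
        self._pending.clear()

    def read(self, node: str, checkpoint: str) -> Optional[bytes]:
        """Return the state stored on a given node, or None if it is not there yet."""
        return self._replicas[node].get(checkpoint)


service = ToyCheckpointService(["Node1", "Node2"])
service.write("session", b"state-1", synchronous=True)
sync_copy = service.read("Node2", "session")

service.write("cache", b"state-2", synchronous=False)
async_copy_before = service.read("Node2", "cache")
service.flush()
async_copy_after = service.read("Node2", "cache")
print(sync_copy, async_copy_before, async_copy_after)
```

In this model, a synchronous write is visible on the second node as soon as the call returns, whereas an asynchronous write could be lost if the first node failed before the deferred replication took place.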
In order for a software component to interact with AMF, or with other middleware services such as the checkpoint service, it needs to implement the service-specific APIs defined by the SA Forum specifications, i.e., the API implementation is incorporated in the code of the software component. This requires the application developers to have detailed knowledge of the APIs and the AMF architecture.
A number of approaches for managing the high availability of a software application have been proposed. The approaches generally fall into three categories. In the first category, the middleware does not offer a checkpoint service to the application; thus, the application state cannot be preserved by the middleware. In the second category, the application implements the APIs required by the middleware; thus, detailed knowledge of the SA Forum specifications and the APIs is needed. In the third category, the middleware controls and communicates with the application via a proxy; thus, if the proxy fails, the middleware loses its means of communication with the application. All of these approaches have drawbacks in providing high availability for applications that need to preserve their state. Therefore, there is still a need to improve the management of high availability and service continuity of software applications.