Recently, the cloud has become the lifeblood of many telecommunication network services and information technology (IT) software applications. With the development of the cloud market, cloud computing can be seen as an opportunity for information and communications technology (ICT) companies to deliver communication and IT services over any fixed or mobile network with high performance and secure end-to-end quality of service (QoS) for end users. Although cloud computing provides benefits to different players in its ecosystem and makes services available anytime, anywhere and in any context, concerns arise regarding the performance and the quality of services offered by the cloud.
One area of concern is the High Availability (HA) of the applications hosted in the cloud. Since these applications are hosted by virtual machines (VMs) residing on physical servers, their availability depends on that of the hosting servers. When a hosting server fails, its VMs, as well as their applications, become inoperative.
The Service Availability Forum (SAForum), a consortium of telecommunication and IT companies, has created standards for building HA systems on commercial off-the-shelf (COTS) equipment. Enabling HA systems on standard IT platforms of different architectures such as x86, ARM, and ATCA maintains the portability and interoperability of HA applications across various standards-compliant platforms. More specifically, the SAForum defines standards and guidelines for the design of an HA middleware that manages the availability of the services provided by an application. The middleware aims to achieve the application's desired availability by managing redundant components and, upon detecting a failure, seamlessly switching the workload of the faulty component over to a redundant component.
The SAForum middleware provides several services, including the availability management framework (AMF), which is responsible for monitoring the application's components and orchestrating their recoveries, and the software management framework (SMF), which is responsible for carrying out software upgrades. The SMF supports automated rolling upgrades that allow the application's components to be upgraded incrementally, and it minimizes downtime by synchronizing with the AMF. The AMF can leverage the redundant replicas of a given component by dynamically switching the workload over to the upgraded replicas while the old-versioned replica is being upgraded. Applications that integrate with the SAForum middleware can also benefit from other services such as distributed messaging, checkpointing, and logging. The OpenSAF project is an open source HA middleware implementation of the SAForum standards.
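By way of illustration only, the rolling upgrade described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the SAForum or OpenSAF interface: the `Replica` class, the `rolling_upgrade` function, and the rule of preferring an already-upgraded replica as the stand-in are all hypothetical and are introduced solely to show how the workload may be moved off a replica before that replica is upgraded.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    # Hypothetical model of one redundant replica of a component.
    name: str
    version: str
    active: bool = False  # True while this replica carries the workload


def rolling_upgrade(replicas, new_version):
    """Upgrade replicas one at a time; return a log of the steps taken.

    Assumption: exactly one replica is active at any moment, and at
    least one redundant replica exists to take over the workload.
    """
    if len(replicas) < 2:
        raise ValueError("a rolling upgrade needs at least one redundant replica")
    if sum(r.active for r in replicas) != 1:
        raise ValueError("exactly one replica must be active")

    log = []
    for target in replicas:
        if target.version == new_version:
            continue  # already upgraded
        if target.active:
            # Switch the workload away before touching the active replica,
            # preferring a stand-in that already runs the new version.
            others = [r for r in replicas if r is not target]
            standin = next(
                (r for r in others if r.version == new_version), others[0]
            )
            target.active = False
            standin.active = True
            log.append(f"switchover {target.name} -> {standin.name}")
        target.version = new_version
        log.append(f"upgrade {target.name}")
    return log
```

In this sketch one replica carries the workload at every step, which corresponds to the downtime-minimizing coordination between the SMF and the AMF described above.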
The conventional HA middleware was not developed for the cloud environment, but rather for static deployments within a data center.
The promise of having a simplified IT infrastructure and an on-demand provisioning model is a key feature that enabled the adoption of cloud computing by the enterprise. From the perspective of a cloud provider that offers infrastructure as a service (IaaS), elasticity can be considered both a cloud feature and a service. Elasticity is a cloud feature in that it allows the cloud itself to absorb the addition or removal of physical resources in a transparent manner. Elasticity is also a cloud service, offered to the cloud tenants, that allows the virtual resources allocated to their applications to grow and shrink in proportion to the runtime demand. On the other hand, from a cloud tenant perspective, the elasticity service offered by the provider becomes a feature of their cloud-deployed application(s). FIG. 1 illustrates the different perspectives of a cloud tenant vs a cloud provider.
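As a simple illustration of virtual resources growing and shrinking in proportion to runtime demand, the number of VMs may be derived from the current load. The following is a minimal sketch only; the function name `desired_vm_count`, its parameters, and the default bounds are hypothetical and are not taken from any particular cloud provider's interface.

```python
import math


def desired_vm_count(demand, capacity_per_vm, min_vms=2, max_vms=10):
    """Return a VM count proportional to demand, clamped to [min_vms, max_vms].

    Assumptions: demand and capacity_per_vm use the same unit (for example,
    requests per second), and min_vms is kept at two or more so that a
    redundant VM remains available even at low load.
    """
    if capacity_per_vm <= 0:
        raise ValueError("capacity_per_vm must be positive")
    needed = math.ceil(demand / capacity_per_vm)
    return max(min_vms, min(max_vms, needed))
```

For example, with a hypothetical capacity of 100 requests per second per VM, a demand of 450 requests per second yields five VMs, whereas a demand of 50 still yields the minimum of two.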
Another factor that is often neglected in elastic cloud deployments is the dynamic HA-aware scheduling for the addition and removal of the VMs hosting the application's components. Deploying replicated components in different servers, racks, or data centers can protect against larger failure scopes. However, such scheduling should also take into consideration the functional requirements (e.g. colocation dependencies for shared libraries, delay tolerance among dependent components, etc.) and the non-functional requirements such as HA.
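The idea of spreading replicas across failure scopes can be sketched as a greedy placement. This is a minimal sketch under stated assumptions and not a complete HA-aware scheduler: the function `place_replicas` and the host-to-rack mapping are hypothetical, only two failure scopes (host and rack) are modeled, and the functional requirements mentioned above, such as colocation dependencies and delay tolerance, are deliberately ignored.

```python
from collections import Counter


def place_replicas(replicas, host_racks):
    """Assign each replica to a host, spreading across racks first, then hosts.

    replicas:   iterable of replica names
    host_racks: dict mapping host name -> rack name (hypothetical inventory)
    """
    if not host_racks:
        raise ValueError("at least one host is required")

    per_host = Counter()  # replicas already placed on each host
    per_rack = Counter()  # replicas already placed in each rack
    placement = {}
    for replica in replicas:
        # Prefer the least-loaded rack, then the least-loaded host in it;
        # sorting the host names makes tie-breaking deterministic.
        host = min(
            sorted(host_racks),
            key=lambda h: (per_rack[host_racks[h]], per_host[h]),
        )
        placement[replica] = host
        per_host[host] += 1
        per_rack[host_racks[host]] += 1
    return placement
```

With hosts `h1` and `h2` in rack `r1` and host `h3` in rack `r2`, the first two replicas land in different racks, so the failure of a single rack does not take down both.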
A comprehensive elasticity solution should consider the HA-aware scheduling of any added/removed VMs, the dynamic deployment of the middleware managing the availability of the applications, and the runtime addition/removal of the application instances without service interruption.
Therefore, it would be desirable to provide a system and method that obviate or mitigate the above described problems.