The present invention relates generally to the field of cloud computing and, in particular, to high availability for orchestrated cloud services.
A cloud computing environment enables network access to a shared pool of configurable computing resources. The computing resources are typically provided in a virtualized environment where a user can access multiple virtual machines on demand. Under this framework, users do not have to purchase, maintain, and operate infrastructure on site; rather, they purchase from a service provider the right to use such equipment.
Orchestrating cloud services involves automating the various tasks involved in coordinating, organizing, and managing software, services, and/or hardware. Orchestration may define various policies and service levels. For instance, orchestrating cloud services manages cross-system computing functions by arranging and coordinating tasks automatically. Orchestration also manages the network infrastructure, maintained by a service provider, upon which developed web applications can be deployed. For example, a cloud-based computing framework can be designed to manage the logistics and orchestration of the environment by supporting the development, running, and management of applications, allowing application developers to focus on software development rather than on systems infrastructure.
Cloud service platforms often deliver hundreds of services and APIs, which can cause services built on such platforms to become very complex. A common problem arises because the more complex a system is, the higher the risk of an error occurring.
Additionally, cloud services are expected to serve users with minimal to no downtime. If an orchestrated cloud service crashes while processing a request, providers often restart the service in order to maintain service availability and notify the user (often the application developer) to troubleshoot manually, as each service within a chain is independent. Restarting a broken service may not resolve the issue if the crash is caused by a recent change of the service (i.e., a service version update). Likewise, restarting a broken service may not resolve the issue if the crash is caused by a change of other services that come before it in the service chain. In that case, restarting the crashed service would only lead to endless crashes, thereby making the orchestrated services unavailable.
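The restart problem described above can be illustrated with a minimal sketch. All names here (upstream_v2, billing, restart_crashed_service, and the payload fields) are hypothetical and chosen only for illustration; they do not come from any particular orchestration platform. The sketch assumes a two-service chain in which the upstream service was recently updated and now emits a field name that the unchanged downstream service does not expect.

```python
from typing import Callable, Dict, Optional, Tuple

Payload = Dict[str, object]
Service = Callable[[Payload], Payload]


def upstream_v2(payload: Payload) -> Payload:
    # Hypothetical upstream service after a version update: it now emits
    # "total_cents" where the previous version emitted "total".
    return {"order_id": payload["order_id"], "total_cents": 1999}


def billing(payload: Payload) -> Payload:
    # Hypothetical downstream service, unchanged, still expecting "total".
    # It raises KeyError (the simulated crash) on the new upstream output.
    return {"order_id": payload["order_id"], "charged": payload["total"]}


def restart_crashed_service(
    service: Service, payload: Payload, max_restarts: int = 3
) -> Tuple[Optional[Payload], int]:
    # Naive policy: on a crash, restart the service and replay the same input.
    crashes = 0
    for _ in range(max_restarts + 1):
        try:
            return service(payload), crashes
        except KeyError:
            crashes += 1  # the restarted service receives identical input
    return None, crashes


# The upstream service succeeds; only the downstream service crashes.
intermediate = upstream_v2({"order_id": 42})
result, crashes = restart_crashed_service(billing, intermediate)
print(result, crashes)  # prints "None 4"
```

Because the cause of the crash lies in the upstream output rather than in the crashed service itself, every restart replays the same failing input, and the request is never served no matter how many restarts are allowed.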