Cloud hosting has become a cost-effective way for developers to run applications, in particular web-based applications and services. Cloud hosting infrastructures such as Microsoft's Windows Azure Platform, GoGrid, ElasticHosts, Mosso, and Amazon's Elastic Compute Cloud are just a few examples of cloud and utility computing platforms. By leveraging virtualization, economies of scale, resource time-sharing, and on-demand allocation of servers to different services (i.e., dynamic growing and shrinking of hosted application instances), cloud computing infrastructures provide cost-effective, quickly deployable, and flexible alternatives to dedicated IT clusters for hosting services. However, these infrastructures or platforms introduce new challenges for service developers and cloud operators. Developers run their applications on servers and networks they cannot directly observe or control, and operators host black-box applications developed by external entities that might not be trusted. As a result, it is often difficult for both developers and cloud administrators to determine whether application runtime errors are due to software bugs, inadequate resources available to applications, platform outages, or other causes.
To elaborate, it is difficult for users and developers to observe the execution of their applications and to check, at runtime, for safety conditions (i.e., correctness of application state and operations) and liveness conditions (i.e., a concurrent application's ability to execute in a timely manner). For example, with existing cloud platforms it may not be possible for application developers to identify software bugs and vulnerabilities (e.g., memory leaks and zombie processes), to reduce overhead (e.g., CPU and bandwidth), to maintain service availability, or to improve performance without observing the runtime execution of their applications on these platforms. It has also been difficult for developers to safeguard application performance and availability against problems in the hosting platform, such as misconfigured servers, network outages, or a lack of sufficient resources available to applications during their execution on the cloud platform. Cloud operators also have difficulties. Operators may not be able to ensure that hosted applications are allocated sufficient resources to meet their specified Service-Level Agreements (SLAs), that they do not interfere with other applications sharing common resources such as memory bandwidth and network, or that customer applications do not, either inadvertently or maliciously, abuse the hosting infrastructure (e.g., application instances acting as botnets to launch SIP attacks, send spam, or mount distributed denial-of-service (DDoS) attacks against internal or external sites), among other things.
Currently, there is a lack of adequate solutions for these challenges. Developers may build cloud applications by programming the behavior of individual nodes at a low level while attempting to achieve high-level global properties of the application. For debugging, developers may simply print to log files. Log files may allow them to observe local state and behavior at individual nodes, but may not allow them to check global behaviors of the application and the cloud computing infrastructure, such as those relating to load balancing and fault tolerance. Furthermore, such debugging is used for offline analysis and may not provide insight into, or control over, global properties of the application and the cloud computing infrastructure, which may need to be continuously evaluated and enforced as an application executes on the cloud platform.
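As an illustration of the difference between local and global properties, the following is a minimal sketch, not taken from any particular platform. It assumes a hypothetical snapshot of per-node request counts gathered in one place; the function name `is_load_balanced`, the node names, and the 25% tolerance are all assumptions made for this example. Each node's own log would show only its own count, which may look unremarkable in isolation; the imbalance becomes visible only when the counts are compared across nodes.

```python
from statistics import mean


def is_load_balanced(requests_per_node, tolerance=0.25):
    """Check a global property: every node's load is within
    `tolerance` (as a fraction) of the mean load across all nodes.

    `requests_per_node` is a hypothetical mapping of node name to
    request count; no single node's log contains this whole view.
    """
    if not requests_per_node:
        return True  # nothing to compare
    loads = list(requests_per_node.values())
    avg = mean(loads)
    if avg == 0:
        return True  # all nodes idle
    return all(abs(load - avg) <= tolerance * avg for load in loads)


# Hypothetical snapshot: node-c receives far more traffic than its peers.
snapshot = {"node-a": 100, "node-b": 110, "node-c": 400}
balanced = is_load_balanced(snapshot)
print("balanced" if balanced else "imbalanced")
```

A check of this kind would need to run continuously against fresh data to be useful at runtime, which is what printing to per-node log files for offline analysis does not provide.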
Some cloud platforms monitor performance counters at servers and log the counters to a database for post-mortem analysis. Watchdog processes may be installed on internal and external sites to periodically check the availability of individual application instances in a cloud. Management systems may provide automatic scaling of applications based on input workloads but do not provide techniques to protect the infrastructure from misbehaving applications. As a result, these approaches may be prone to errors, may respond to critical events with delay, and may not guarantee the desired performance and availability of hosted applications or of the cloud platform itself.
Techniques related to managing cloud hosted applications are discussed below.