The advancement of computer networking has enabled computer programs to evolve from the early days' monolithic form that is used by one user at a time into distributed applications. A distributed application, running on two or more networked computers, is able to support multiple users at the same time. FIG. 1 shows the basic structure of a distributed application in a client-server architecture. The clients 100 send requests 110 via the network 140 to the server 150, and the server 150 sends responses 120 back to the clients 100 via the network 140. The same server is able to serve multiple concurrent clients.
Today, most applications are distributed. FIG. 2 shows the architecture of a typical web application. The client part of a web application runs inside a web browser 210 that interacts with the user. The server part of a web application runs on one or multiple computers, such as Web Server 250, Application Server 260, and Database Server 280. The server components typically reside in an infrastructure referred to as “host infrastructure” or “application infrastructure” 245.
In order for a web application to be able to serve a large number of clients, its host infrastructure must meet performance, scalability and availability requirements. “Performance” refers to the application's responsiveness to user interactions. “Scalability” refers to an application's capability to perform under increased load demand. “Availability” refers to an application's capability to deliver continuous, uninterrupted service. With the exponential growth of the number of Internet users, access demand can easily overwhelm the capacity of a single server computer.
An effective way to address performance, scalability and availability concerns is to host a web application on multiple servers (server clustering) and load balance client requests among these servers (or sites). Load balancing spreads the load among multiple servers. If one server failed, the load balancing mechanism would direct traffic away from the failed server so that the site is still operational. FIG. 3 is an illustration of using multiple web servers, multiple application servers and multiple database servers to increase the capacity of the web application. Clustering is frequently used today for improving application scalability.
Another way for addressing performance, scalability and availability concerns is to replicate the entire application to two different data centers (site mirroring). Site mirroring is a more advanced approach than server clustering because it replicates an entire application, including documents, code, data, web server software, application server software, database server software, to another geographic location, thereby creating two geographically separated sites mirroring each other. A hardware device called “Global Load Balancing Device” performs load balancing among the multiple sites.
For both server clustering and site mirroring, a variety of load balancing mechanisms have been developed. They all work fine in their specific context.
However, both server clustering and site mirroring have significant limitations. Both approaches provision a “fixed” amount of infrastructure capacity, while the load on a web application is not fixed. In reality, there is no “right” amount of infrastructure capacity to provision for a web application because the load on the application can swing from zero to millions of hits within a short period of time when there is a traffic spike. When under-provisioned, the application may perform poorly or even become unavailable. When over-provisioned, the over-provisioned capacity is wasted. To be conservative, a lot of web operators end up purchasing significantly more capacity than needed. It is common to see server utilization below 20% in a lot of data centers today, resulting in substantial capacity waste. Yet the application still goes under when traffic spikes happen. This is called as a “capacity dilemma” that happens every day. Furthermore, these traditional techniques are time consuming and expensive to set up and are equally time consuming and expensive to make changes. Events like natural disaster can cause an entire site to fail. Comparing to server clustering, site mirroring provides availability even if one site completely failed. However, it is more complex to set up and requires data synchronization between the two sites. Lastly, the set of global load balancing devices is a single point of failure.
A third approach for improving web performance is to use a Content Delivery Network (CDN) service. Companies like Akamai and Limelight Networks operate a global content delivery infrastructure comprising of tens of thousands of servers strategically placed across the globe. These servers cache web content (static documents) produced by their customers (content providers). When a user requests such content, a routing mechanism (typically based on Domain Name Server (DNS) techniques) would find an appropriate caching server to serve the request. By using content delivery service, users receive better content performance because content is delivered from an edge server that is closer to the user.
Though content delivery networks can enhance performance and scalability, they are limited to static content. Web applications are dynamic. Responses dynamically generated from web applications can not be cached. Web application scalability is still limited by its hosting infrastructure capacity. Further, CDN services do not enhance availability for web applications in general. If the hosting infrastructure goes down, the application will not be available. So though CDN services help improve performance and scalability in serving static content, they do not change the fact that the site's scalability and availability are limited by the site's infrastructure capacity.
Over the recent years, cloud computing has emerged as an efficient and more flexible way to do computing, shown in FIG. 4. According to Wikipedia, cloud computing “refers to the use of Internet-based (i.e. Cloud) computer technology for a variety of services. It is a style of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure ‘in the cloud’ that supports them”. The word “cloud” is a metaphor, based on how it is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals. In this document, we use the term “Cloud Computing” to refer to the utilization of a network-based computing infrastructure that includes many inter-connected computing nodes to provide a certain type of service, of which each node may employ technologies like virtualization and web services. The internal works of the cloud itself are concealed from the user point of view.
One of the enablers for cloud computing is virtualization. Wikipedia explains that “virtualization is a broad term that refers to the abstraction of computer resource”. It includes “Platform virtualization, which separates an operating system from the underlying platform resources”, “Resource virtualization, the virtualization of specific system resources, such as storage volumes, name spaces, and network resource” and so on. VMWare is a highly successful company that provides virtualization software to “virtualize” computer operating systems from the underlying hardware resources. Due to virtualization, one can use software to start, stop and manage “virtual machine” (VM) nodes 460, 470 in a computing environment 450, shown in FIG. 4. Each “virtual machine” behaves just like a regular computer from an external point of view. One can install software onto it, delete files from it and run programs on it, though the “virtual machine” itself is just a software program running on a “real” computer.
Another enabler for cloud computing is the availability of commodity hardware as well as the computing power of commodity hardware. For a few hundred dollars, one can acquire a computer that is more powerful than a machine that would have cost ten times more twenty years ago. Though an individual commodity machine itself may not be reliable, putting many of them together can produce an extremely reliable and powerful system. Amazon.com's Elastic Computing Cloud (EC2) is an example of a cloud computing environment that employs thousands of commodity machines with virtualization software to form an extremely powerful computing infrastructure.
By utilizing commodity hardware and virtualization, cloud computing can increase data center efficiency, enhance operational flexibility and reduce costs. Running a web application in a cloud environment has the potential to efficiently meet performance, scalability and availability objectives. For example, when there is a traffic increase that exceeded the current capacity, one can launch new server nodes to handle the increased traffic. If the current capacity exceeds the traffic demand by a certain threshold, one can shut down some of the server nodes to lower resource consumption. If some existing server nodes failed, one can launch new nodes and redirect traffic to the new nodes.
However, running web applications in a cloud computing environment like Amazon EC2 creates new requirements for traffic management and load balancing because of the frequent node stopping and starting. In the cases of server clustering and site mirroring, stopping a server or server failure are exceptions. The corresponding load balancing mechanisms are also designed to handle such occurrences as exceptions. In a cloud computing environment, server reboot and server shutdown are assumed to be common occurrences rather than exceptions. On one side, the assumption that individual nodes are not reliable is at the center of design for a cloud system due to its utilization of commodity hardware. On the other side, there are business reasons to start or stop nodes in order to increase resource utilization and reduce costs. Naturally, the traffic management and load balancing system required for a cloud computing environment must be responsive to node status changes.
Thus it would be advantageous to provide a cloud management system that can automatically scale up and scale down infrastructure capacity in response to an application's load demand, intelligently direct traffic to a plurality of server nodes in response to node status changes and load condition changes, while enhancing an application's performance, scalability and availability.