In the early days of the World Wide Web, there was a one-to-one relationship between a Web site and a computer. For each Web site, a single computer (generally called a “Web server”) hosted the site. The Web site had a single address (called an IP address), and that address was associated with the site's single computer.
The Internet has become ubiquitous, and Web sites are big business. For many Web sites, a single computer cannot serve the volume of activity that currently takes place, and it certainly cannot scale to handle the volumes to come. To accommodate these needs, the concept of “load-balancing clusters” was introduced.
Clustering
Clustering is the connecting of two or more computers together in such a way that they behave like a single computer. Clustering is used for parallel processing, for load balancing, and for fault tolerance.
In general, the goal of a cluster is to make it possible to share a computing load over several systems without either the users or the system administrators needing to know that more than one system is involved. If any component in the system, hardware or software, fails, the user may see degraded performance but will not lose access to the service. Ideally, if more processing power is needed, the system administrator simply “plugs in a new component” and the performance of the clustered system as a whole improves.
Load Balancing
As the name implies, “load balancing” attempts to distribute workload evenly among several computers. For Web sites, load balancing helps solve the “server-busy” problem that arises when servers are overwhelmed by a sudden flood of users. Load balancing prevents the problem by keeping track of which server in a group each user request has been routed to and by knowing roughly how busy each server is. Using that approximation, load balancing determines where to direct the next request.
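The routing decision described above can be illustrated with a minimal sketch. It assumes that the number of outstanding requests per server is an adequate approximation of how busy that server is; the class name LeastLoadedBalancer and its methods are hypothetical and do not correspond to any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class LeastLoadedBalancer:
    """Illustrative only: routes each request to the least busy server."""

    # Assumption: outstanding request count stands in for "how busy" a server is.
    outstanding: dict = field(default_factory=dict)

    def add_server(self, name: str) -> None:
        """Register a server with no outstanding requests."""
        self.outstanding.setdefault(name, 0)

    def route(self) -> str:
        """Pick the server with the fewest outstanding requests and record the routing."""
        if not self.outstanding:
            raise RuntimeError("no servers registered")
        # Ties are broken by registration order (dict insertion order).
        name = min(self.outstanding, key=self.outstanding.get)
        self.outstanding[name] += 1
        return name

    def finish(self, name: str) -> None:
        """Record that a request previously routed to `name` has completed."""
        if self.outstanding.get(name, 0) > 0:
            self.outstanding[name] -= 1
```

A caller would invoke route() when a request arrives and finish() when the response has been sent, so that the counts continue to approximate the current load.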
For example, a company can devote a Web site to the sporting event that it sponsors and use load balancing to handle the crush of hits during the event. Companies find load balancing useful because it is an intelligent and affordable method for apportioning high volumes of requests for server access across multiple machines, be they on the Web or in a data center.
With this technology, server failures are simpler to mask, reducing downtime for the end-user. Previously, managing server failures meant taking hosts out of DNS (Domain Name System) or rebooting them immediately. With load balancing, the failing server can be left in the failed mode for debugging without impacting end-user availability.
Conventional Load-Balancing Clusters
FIG. 1 illustrates a conventional load-balancing cluster 100, which consists of cluster nodes 112a-f. Typically, nodes 112a-f are nearly identical. Members of a cluster are referred to as nodes or servers. These terms are used interchangeably herein.
In this conventional load-balancing cluster 100, a node manager 110 serves as the gatekeeper and the proxy for the nodes of the cluster. In the case of a Web site, the node manager 110 hosts the single IP address for the Web site, but it directs users to any one of the nodes 112a-f for service. Other conventional load-balancing clusters employ a partially or fully distributed scheme for managing load-balancing. An example of a fully distributed architecture is the Microsoft® Network Load-Balancing (NLB) cluster architecture. Those of ordinary skill in the art understand the existing architectures of load-balancing schemes.
Typically, the load-balancing cluster 100 balances the load of TCP or UDP traffic. However, end-users do not care about availability at the protocol layer. Rather, end-users care about application-layer availability. A user (such as one on clients 132-138) sends a request for information to the IP address of the cluster 100 via the Internet 120. For an NLB cluster, all hosts receive this request and, based on the previous “convergence” criteria of the cluster, one host responds. Load is balanced statistically by subdividing the IP:port space of clients among the nodes. In the aggregate, this achieves a balanced load.
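The statistical subdivision of the client IP:port space can be sketched as follows. This is not the actual NLB algorithm; it assumes a generic hash of the client address and port, and the function names owner_index and should_respond are hypothetical. Every node evaluates the same function on the same request, so exactly one node concludes that it owns the request.

```python
import hashlib


def owner_index(client_ip: str, client_port: int, node_count: int) -> int:
    """Map a client IP:port pair to one of `node_count` nodes.

    Assumption: a SHA-256 hash stands in for whatever partitioning the
    cluster actually agreed on during convergence.
    """
    key = f"{client_ip}:{client_port}".encode("ascii")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def should_respond(my_index: int, client_ip: str, client_port: int, node_count: int) -> bool:
    """Each node runs this locally; only the owning node answers the request."""
    return owner_index(client_ip, client_port, node_count) == my_index
```

Because the hash spreads clients roughly uniformly, each node ends up owning approximately the same share of clients in the aggregate, even though no single request is explicitly assigned by a central manager.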
Load-balancing clusters provide seamless fault-tolerance in the case of server or network failures. Load-balancing cluster nodes have several specific cluster-states. For example, in NLB, those states are:
    Suspended: the node is not active in the cluster. It cannot be made active without an explicit “resume” request. Resume places the node in the Stopped state.
    Stopped: the node is not active in the cluster.
    Converging: the node is currently becoming active in the cluster. More precisely, all nodes (even those already active) move to this state any time the membership of the cluster changes.
    Draining: the node is not receiving new load (e.g., user requests), but existing connections are allowed to complete.
    Converged: the node is active in the cluster.
More generally, load-balancing cluster nodes have these activity-related cluster-states:
    Active: the node is active when it is a fully participating member of the cluster upon restart of the node. For example, in NLB, the desired state upon restart of the node is “Converged.”
    Inactive: the node is inactive when it is not a participating member of the cluster upon restart of the node. For example, in NLB, the desired state upon restart of the node is “Stopped.” Other examples of the inactive state include when a node is stopped or draining.
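The NLB cluster-states listed above can be modeled with a short sketch. The names NodeState, resume, accepts_new_load, and serves_existing_connections are hypothetical and are not part of any NLB interface. The sketch assumes that only a Converged node accepts new load; the text does not say whether a Converging node does.

```python
from enum import Enum


class NodeState(Enum):
    SUSPENDED = "Suspended"
    STOPPED = "Stopped"
    CONVERGING = "Converging"
    DRAINING = "Draining"
    CONVERGED = "Converged"


def resume(state: NodeState) -> NodeState:
    """An explicit resume moves a Suspended node to Stopped; other states are unchanged."""
    return NodeState.STOPPED if state is NodeState.SUSPENDED else state


def accepts_new_load(state: NodeState) -> bool:
    """Assumption: only a Converged node receives new user requests."""
    return state is NodeState.CONVERGED


def serves_existing_connections(state: NodeState) -> bool:
    """A Draining node takes no new load but lets existing connections complete."""
    return state in (NodeState.CONVERGED, NodeState.DRAINING)
```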
Those of ordinary skill in the art understand and appreciate the conventional structure and function of a load-balancing cluster like that illustrated in FIG. 1.
Local and Remote Application-Layer Availability Monitoring
The term “application layer” refers to the top layer of the well-known OSI model. Since the application layer is the top layer, any delays at the lower layers ripple up to the application layer. In addition, any errors at the lower layers impact the application layer adversely. Thus, monitoring at the application layer gives the true picture of node availability.
Herein, the focus is upon application-layer monitoring as opposed to other kinds of monitoring. An example of application-layer monitoring is performing an HTTP GET against a Web server. Examples of other types of monitoring include checking whether Microsoft® Internet Information Server (IIS) is running as a service under Microsoft® Windows NT® and collecting performance monitor (perfmon) counters for IIS. Application-layer monitoring is superior for determining the actual availability of the service to an end-user.
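As an illustration of the HTTP GET example, the following sketch performs a single application-layer probe using the Python standard library. The function name probe_http, the timeout value, and the rule that any 2xx status counts as available are assumptions made for this sketch.

```python
import http.client
import urllib.error
import urllib.request


def probe_http(url: str, timeout: float = 5.0) -> bool:
    """Return True if an HTTP GET of `url` completes with a 2xx status.

    Any connection failure, timeout, malformed response, or HTTP error
    status is treated as "not available" from the caller's point of view.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx responses.
        return False
```

A probe of this kind exercises the full request path, so a node whose Web service process is running but not answering requests is reported as unavailable.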
There are two main ways to monitor application-layer availability of the nodes in a cluster: locally and remotely. Local application-layer monitoring is done from within the cluster. It is performed by the node manager and/or the nodes themselves. For example, if node manager 110 monitored the availability of the nodes 112a-f, then this is local monitoring. This type of monitoring may be called “endocluster” application-layer monitoring.
Remote application-layer monitoring is done from outside the cluster. It is not performed by the node manager and/or the nodes themselves. Rather, it is performed by a computer outside of the cluster, but coupled to the cluster via a network connection. For example, if client 132 monitored the availability of the nodes 112a-f, then this is remote monitoring. This type of monitoring may be called “exocluster” application-layer monitoring. Exocluster application-layer monitoring provides a more accurate measurement of the actual availability of the nodes in the cluster than local monitoring. Why? The ultimate measure of the availability of a node is how it appears to a client from outside the cluster, such as client 132. Therefore, exocluster application-layer monitoring is better because it views node availability from the client's perspective. Herein, this form of monitoring may also be called “client-perspective” application-layer monitoring.
Local application-layer monitoring is not sufficient because the systems are monitoring themselves from their own point of view. The monitoring does not follow the full path through all of the layers of the OSI model to reach the top layer, the application layer. Herein, this form of monitoring (i.e., local application-layer monitoring) may also be called “cluster-perspective” application-layer monitoring.
Those of ordinary skill in the art are familiar with local and remote monitoring of node availability at the application-layer and are familiar with the advantages and disadvantages of both. Examples of conventional remote application-layer monitoring products include SiteScope® by Freshwater Software®.
Limitations of Conventional Exocluster Application-Layer Monitors
Passive Monitors. Conventional exocluster application-layer monitors are purely passive monitors. They are unable to actively control the nodes that they are monitoring. They cannot stop a problem node. Moreover, they are unable to start an inactive node once its problems have been resolved.
Protocol Specific. Conventional exocluster application-layer monitors are protocol specific. They monitor defined protocols (such as HTTP and SMTP) and are incapable of monitoring other protocols without being reprogrammed.
Static Cluster Membership. Conventional exocluster application-layer monitors monitor a static set of hosts; there is no notion of a cluster. That is, they are not cluster-aware. They are not dynamic. In other words, they cannot dynamically monitor all of the members of the cluster as members are added and removed. They can monitor new members (or stop monitoring old members) once the membership is statically defined specifically for the monitor. However, the conventional exocluster application-layer monitors cannot dynamically begin monitoring new members as they are added to the cluster or dynamically stop monitoring old members as they are removed.