1. Field of Invention
The present invention relates generally to the field of network server clusters and pertains more particularly to a system for and a method of empirically measuring the capacity of multiple servers in a cluster and forwarding the relative weights for each server to a load balancer for the cluster.
2. Discussion of the Prior Art
Since at least as early as the invention of the first computer system, people have endeavored to link computers together. Linking computers together in a network allows direct communication between the various points of the network and sharing of resources, such as servers, among the points of the network. This was especially desirable early on because computers were very large and very expensive. Today, computers are far more prevalent, but the desire to link them together is still strong. This is most readily demonstrated by the explosive growth of the Internet.
Servers are commonly employed for sharing of information among large numbers of computer systems or similar devices. A computer system that communicates with a server is usually referred to as a client of the server and the server is often part of a host system. A client and a host typically exchange messages via a communication network using a predetermined protocol. Such protocols are usually arranged in a client/host model in which a requesting client transfers a request message to a host and the host in turn takes an appropriate action depending on the content of the request message. Typically, the appropriate action for the request message includes the transfer of a response message to the requesting client.
Current protocols typically do not allow for the establishment of a persistent session between the client and the host in the traditional sense in which a local terminal establishes a session on a computer system. Instead, any session-like information is usually implied in the content of the messages exchanged between the client and the host. Such a communication protocol may be referred to as a "stateless" protocol. Such stateless protocols include protocols associated with Internet communication including the Internet Protocol (IP), the User Datagram Protocol (UDP), the Simple Mail Transfer Protocol (SMTP), and the Hypertext Transfer Protocol (HTTP), as well as the Network File System (NFS) Protocol.
A client that accesses a host commonly engages in an extended transaction with the host. Such an extended transaction typically involves the exchange of multiple messages between the client and the host. For example, an NFS client typically issues multiple request messages to an NFS server while retrieving a file from the NFS server. Similarly, an HTTP client typically issues multiple request messages to an HTTP server while browsing through web pages contained on the HTTP server. Such transactions that involve the exchange of multiple messages between a client and a server are hereinafter referred to as sessions.
Servers commonly have a large pool of potential clients which may issue request messages. For example, an HTTP server connected to the world-wide-web has potentially millions of clients from which it may receive request messages. Current servers that are adapted for stateless protocols typically respond to each request message in the order in which it is received, that is, on a first-come-first-served basis regardless of the source of the request message.
In the present context, the term "quality of service" refers both to a host's ability to provide a quick response to a message and to its ability to complete an entire session. As a particular host becomes more popular, and due to that popularity receives more messages, the processing resources of the host can become stretched. For example, due to heavy traffic, a host may not be able to respond to a message at all, or the host may not provide a timely response, which can cause a client to "time out" and generate an error. Poor quality of service can have significant results, as users may become frustrated and simply give up trying to reach a particular host, or the sponsor of the host may lose sales or fail to communicate needed information to any or all clients.
One technique that is generally used to alleviate quality of service problems is to add more processing capacity to the host. This can typically be done either by replacing the host with another, more powerful computer, or by providing multiple computers in parallel as a server cluster and delegating new messages to different ones of the multiple servers. When multiple servers are used in a cluster, a load balancer is used to allocate the demand to the various servers. Demand is allocated by assigning each server a value called a relative weight. The relative weight determines what proportion of the traffic each server in the cluster carries. Generally, the higher the relative weight, the higher the load that the server carries. When the cluster is made up of identical servers, the relative weights are typically equal to one another, because the servers should theoretically be able to handle equal loads.
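By way of illustration only, the relationship between relative weights and traffic proportions may be sketched as follows. The sketch assumes integer weights and uses a smooth weighted round-robin scheme, which is one common way of honoring such weights and is not asserted to be the scheme used by any particular load balancer. The function names `traffic_shares` and `assign_requests` and the server names are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def traffic_shares(weights):
    """Return each server's share of traffic as weight / sum of all weights.

    Assumes integer relative weights (an assumption of this sketch).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one server needs a positive weight")
    return {name: Fraction(w, total) for name, w in weights.items()}


def assign_requests(weights, n_requests):
    """Assign n_requests to servers in proportion to their weights.

    Uses smooth weighted round-robin: every server's running score grows by
    its weight on each request, the highest score wins the request, and the
    winner's score is reduced by the sum of all weights.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one server needs a positive weight")
    score = {name: 0 for name in weights}
    order = []
    for _ in range(n_requests):
        for name, w in weights.items():
            score[name] += w
        chosen = max(score, key=score.get)
        score[chosen] -= total
        order.append(chosen)
    return order


# Hypothetical heterogeneous cluster: server_c is weighted twice as heavily.
weights = {"server_a": 1, "server_b": 1, "server_c": 2}
shares = traffic_shares(weights)
counts = Counter(assign_requests(weights, 8))
```

In this example, a server whose relative weight is twice that of its peers receives twice as many of the requests over any full cycle of assignments.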
When the cluster is made up of different servers, then the situation becomes more complicated. The use of a heterogeneous cluster of servers is common because demand on the host grows over time. The problem is that, when one goes to add a server to the cluster, the state of the art in computers has changed. Either one can no longer get the same computer as before or one chooses not to. Few can afford to replace the entire cluster just to add one new computer. The result is that different servers are used. The complication lies in determining what relative weights to assign the servers in the cluster. Conventionally, two techniques have been used. First, the servers are theoretically modeled based on design parameters and the relative weights are calculated based on the models. Second, the relative weights are determined based on ad hoc cluster operation. This means that the relative weights are set to an initial value and the cluster is put into operation. Later, if and when the operator of the cluster notices that a problem exists, then the relative weights are adjusted based on an educated guess of what would help to alleviate the problem. Neither technique is ideal.
A definite need exists for a system having an ability to empirically measure the capacity of multiple servers and forward the relative weights to the load balancer. In particular, a need exists for a system which is capable of performing a capacity test on each server in the cluster. Ideally, such a system would operate by measuring a server's load capacity and assigning a relative weight accordingly. With a system of this type, load balancing would provide a reliable means of fairly sharing network resources. A primary purpose of the present invention is to meet this need and provide further, related advantages.
A system for and a method of empirically measuring the capacity of multiple servers in a cluster is disclosed including at least one capacity prober for measuring the load capacity of each server and for forwarding the relative weights to a load balancer for the cluster. During off peak operating hours, one at a time each server in the cluster is taken off line and stress tested to measure the capacity of the server. The remaining servers in the cluster remain on line to service customers. The relative weights for each of the servers are collected and updated in the load balancer. In this way, the operation of the cluster is better optimized.
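By way of illustration only, the procedure summarized above may be sketched as follows. The sketch rests on several assumptions that are not taken from this description: the `FakeBalancer` class, its method names, the `stress_test` callable, and the rule that a server's relative weight is its measured capacity divided by the smallest measured capacity are all hypothetical stand-ins. A real capacity prober and load balancer would expose their own interfaces.

```python
class FakeBalancer:
    """Hypothetical stand-in for a load balancer's control interface."""

    def __init__(self, servers):
        self.online = set(servers)
        self.weights = {}
        self.max_offline = 0  # largest number of servers offline at once

    def take_offline(self, server):
        self.online.discard(server)

    def bring_online(self, server):
        self.online.add(server)

    def set_weights(self, weights):
        self.weights = dict(weights)


def relative_weights(capacities):
    """Scale capacities so the weakest server has weight 1.0 (an assumption)."""
    smallest = min(capacities.values())
    if smallest <= 0:
        raise ValueError("measured capacities must be positive")
    return {s: round(c / smallest, 2) for s, c in capacities.items()}


def measure_cluster(servers, stress_test, balancer):
    """Stress test one server at a time while the others stay on line."""
    capacities = {}
    for server in servers:
        balancer.take_offline(server)
        offline = len(servers) - len(balancer.online)
        balancer.max_offline = max(balancer.max_offline, offline)
        try:
            capacities[server] = stress_test(server)
        finally:
            # Return the server to service even if the test fails.
            balancer.bring_online(server)
    weights = relative_weights(capacities)
    balancer.set_weights(weights)
    return weights


# Hypothetical measured capacities, e.g. requests per second sustained.
measured = {"old_box": 100, "new_box": 200, "mid_box": 150}
servers = list(measured)
balancer = FakeBalancer(servers)
weights = measure_cluster(servers, measured.get, balancer)
```

In this sketch, at most one server is off line at any time, so the remaining servers continue to service customers while the measurement proceeds.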