Enterprises depend on the availability of the systems supporting their day-to-day operation. A system is called available if it is up and running and producing correct results. In a narrow sense, the availability of a system is thus the fraction of time during which it is available. Ideally, the availability of a system is 1.
The availability of a system or application has at least two aspects: in a first, narrow sense it concerns whether the system is active at all and providing its services; in a second, wider sense it concerns whether the service is provided in a timely fashion, with sufficient responsiveness.
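As a rough illustration of availability as a fraction of time, the following sketch computes it from uptime and downtime and derives the downtime a given target leaves room for. The function names and the choice of hours as the unit are assumptions made for this example only; they do not come from the text.

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of the observed period during which the system was available."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observed period must be positive")
    return uptime_hours / total


def allowed_downtime_hours(target_availability, period_hours=365 * 24):
    """Downtime budget (in hours) that a target availability leaves over a period.

    The default period of one non-leap year is an assumption of this sketch.
    """
    return (1.0 - target_availability) * period_hours
```

For instance, under these assumptions a target availability of 0.999 leaves a budget of roughly 8.76 hours of downtime per year.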
One fundamental mechanism to improve availability is based on "redundancy": the availability of hardware is improved by building clusters of machines, and the availability of software is improved by running the same software in multiple address spaces. With the advent of distributed systems, techniques have been invented which use two or more address spaces on different machines running the same software to improve availability (often called active replication). Further details on these aspects may be found in S. Mullender, "Distributed Systems", ACM Press, 1993. By using two or more address spaces on the same machine running the same software, which get their requests from a shared input queue, the technique of warm backups is generalized by the hot pool technique. To improve the availability, in the above sense, of such a multitude of application servers, effective workload balancing mechanisms are essential.
Workload management is an area of technology within Transaction Processing monitors (TP monitors). TP monitors were invented more than three decades ago to make effective use of expensive system resources (J. Gray and A. Reuter, "Transaction Processing: Concepts and Techniques", San Mateo, Calif.: Morgan Kaufmann, 1993): ever increasing numbers of users had to be supported by a system, and it turned out that native operating system functionality did not suffice for this. A TP monitor, as a layer on top of the operating system, manages system resources at a much finer granularity and assigns them with care, only if needed and only for the duration needed. As a result, for one and the same machine and operating system, a given application can support orders of magnitude more users when implemented in a TP monitor than when implemented on native operating system features. The very complex and sophisticated TP monitor technology is, however, largely confined to a single server and thus does not solve the availability problem of a distributed network of application servers.
With the advent of distributed systems such as commodity cluster environments, i.e. environments which are composed of relatively cheap hardware (refer for instance to G. F. Pfister, "In Search of Clusters", 2nd edition, Prentice Hall PTR, 1998), the problem of workload management arose on a larger, distributed scale. In such environments, the service-providing software components are simply replicated on multiple machines to ensure scalability. This, however, requires a mechanism that assigns service requests to the various service providers so that the cluster resources are exploited effectively. As a consequence, the implementation of the distributed system has to deal with problems similar to those that traditional TP monitors dealt with before (this is one of the reasons why such systems are considered "TP monitor like systems" today).
One approach to the problem of workload balancing within a cluster of multiple servers is based on the idea that all application servers on the different machines share the same input queue. In general, this requires that (a) all application servers run the same message queuing system and (b) the message queuing system supports remote access to queues. Actually, this approach does not represent any workload management decision at all. The common input queue is exploited only for guaranteeing "atomic" access to the individual messages (which comprise the application requests); i.e. the first application server which successfully retrieves an application request will be responsible for processing it. The common message queuing system has to spend a significant amount of processing effort on synchronizing the accesses of the various application servers. The larger the network of connected application servers, the larger this synchronization effort, which eventually becomes the processing bottleneck.
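The shared input queue approach can be sketched as follows. This is a minimal single-process illustration, assuming threads stand in for application servers and Python's thread-safe `queue.Queue` stands in for the common message queuing system; the function name `run_shared_queue` is hypothetical. Note that no balancing decision is made anywhere: whichever server retrieves a request first processes it, and the queue's internal locking is the synchronization cost the text refers to.

```python
import queue
import threading


def run_shared_queue(num_servers, requests):
    """Let num_servers workers compete for requests on one shared queue.

    Returns a list of (server_id, request) pairs recording who processed what.
    """
    shared = queue.Queue()
    for request in requests:
        shared.put(request)

    processed = []
    processed_lock = threading.Lock()

    def server(server_id):
        while True:
            try:
                # Atomic retrieval: exactly one server obtains each request.
                request = shared.get_nowait()
            except queue.Empty:
                return
            with processed_lock:
                processed.append((server_id, request))

    threads = [threading.Thread(target=server, args=(i,)) for i in range(num_servers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return processed
```

Each request is handled exactly once, but which server handles it depends purely on timing, not on any knowledge of server workload.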
In another approach to the problem, the application client, when connecting to the cluster, first connects to a particular application server. That application server knows the current workload on the other servers and indicates to the client which application server to use; the client then connects to the indicated server. This type of workload balancing is rather rudimentary, since it balances only on a user level and not on a request level; moreover, it is a static type of workload balancing, as it depends only on the workload situation at a single point in time, namely logon time. One example of this type of approach is taught by C. R. Gehr et al., "Dynamic Server Switching for Maximum Server Availability and Load Balancing", U.S. Pat. No. 5,828,847, according to which the application server to be used by the application client is stored in a profile. Gehr teaches a dynamic server switching system relating to availability in the narrow sense defined above. The dynamic server switching system maintains a static and predefined list (a kind of profile) in each client, which identifies the primary server for that client and the preferred communication method as well as a hierarchy of successive secondary server and communication method pairs. In the event that the client does not have its requests served by the designated primary server or the designated communication method, the system traverses the list to ascertain the identity of the first available alternate server-communication method pair. This system enables a client to redirect requests from an unresponsive server to a predefined alternate server. In this manner, the system provides reactive server switching for service availability.
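The list traversal described above can be sketched as follows. This is only an illustration of the general idea of a static, ordered failover list, not the implementation of the cited patent; the function name `first_available`, the representation of the profile as a list of (server, method) pairs, and the `is_reachable` callback are all assumptions made for this example.

```python
def first_available(profile, is_reachable):
    """Return the first (server, method) pair in the static profile that responds.

    profile      -- ordered list of (server, method) pairs, primary first
    is_reachable -- hypothetical probe: is_reachable(server, method) -> bool
    Returns None if no pair in the list is reachable.
    """
    for server, method in profile:
        if is_reachable(server, method):
            return server, method
    return None
```

Because the list is fixed in advance, the client may have to probe several unresponsive entries before finding a working one, and it learns nothing about how heavily loaded the chosen server is.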
In spite of improving availability in the narrow sense defined above, this teaching suffers from several shortcomings. Gehr's teaching provides a reactive response only in case a primary server cannot be reached at all. There are no proactive elements which prevent a client from requesting service from a non-responsive server. As the list of primary and alternate servers is statically predefined, there may be situations in which no server can be found at all or in which a server is found only after several non-responsive alternate servers have been tested. Moreover, Gehr's teaching does not allow for dynamic workload balancing that improves availability in the wider sense, i.e. responsiveness. According to Gehr, different clients might be controlled by different lists of servers, which allows for a rudimentary and static workload balancing, as different clients might send their requests to different servers. In a highly dynamic, worldwide operating network situation, where clients and servers permanently enter or leave the network and where the access pattern to the servers may change from one moment to the next, Gehr's teaching is not adequate for improving responsiveness.
Despite all of this progress, further improvements are urgently required to support enterprises in increasing the availability of their applications and to allow, for instance, for electronic business on a 7 (days) * 24 (hours) basis; due to the ubiquity of worldwide computer networks, at any point in time somebody might be interested in accessing a certain application server.
The invention is based on the object of providing an improved method and means for workload management within a multitude of application servers providing services to a multitude of application clients.
It is a further object of the invention to increase availability by providing a technology which is highly responsive to dynamic changes in the workload of individual application servers within the network.
The objects of the invention are solved by the independent claims. Further advantageous arrangements and embodiments of the invention are set forth in the respective subclaims.
The invention relates to a method and system of workload balancing for a multitude of application servers, which comprises a first step, wherein an application client sends an application request to a request queue of a dispatcher. In a second step the dispatcher extracts an application request from the request queue. Within the second step the dispatcher selects, based on a table, a certain one of the application servers to which the extracted application request is to be sent. The table is administered by the dispatcher and comprises an indication of the workload of the application servers. Also within the second step the dispatcher administers the table according to the selection and sends the extracted application request to the selected application server. The method comprises a third step, wherein an application server, after processing an application request, returns to the dispatcher a response comprising an indication of its current workload. In a fourth step the dispatcher administers the table according to the response.
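The four steps can be sketched as follows. This is a minimal in-memory illustration under several assumptions that the text does not specify: the workload indication is modeled as an integer, the dispatcher selects the server with the lowest table entry, and "administering the table according to the selection" is modeled as incrementing that entry by one. The class and method names are hypothetical.

```python
from collections import deque


class Dispatcher:
    """Toy dispatcher holding a request queue and a workload table."""

    def __init__(self, server_names):
        self.request_queue = deque()
        # Workload table: server name -> last known workload indication.
        self.workload = {name: 0 for name in server_names}

    def submit(self, request):
        """Step 1: a client places an application request on the request queue."""
        self.request_queue.append(request)

    def dispatch_next(self):
        """Step 2: extract a request, select a server from the table, update the table.

        Returns (server, request), or None if the queue is empty. Sending the
        request over a network is left out of this sketch.
        """
        if not self.request_queue:
            return None
        request = self.request_queue.popleft()
        # Assumed policy: pick the server with the lowest recorded workload.
        server = min(self.workload, key=self.workload.get)
        # Assumed bookkeeping: count the newly assigned request immediately.
        self.workload[server] += 1
        return server, request

    def on_response(self, server, reported_workload):
        """Steps 3 and 4: the server's response carries its current workload,
        and the dispatcher records it in the table."""
        self.workload[server] = reported_workload
```

A possible usage: create `Dispatcher(["A", "B"])`, call `submit` for each incoming request, call `dispatch_next` to obtain the chosen server, and call `on_response` whenever a server reports back. Because every response refreshes the table, later selections follow the servers' actual load rather than a snapshot taken at logon time.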
The proposed technology improves the workload balancing within a multitude of application servers providing services to a multitude of application clients. At the same time the availability of the application servers is improved. Through administration of a table storing indications of the workload of the application servers, a dynamic technique and ongoing process is suggested which is highly responsive to dynamic network situations where clients permanently enter or leave the network and where the access pattern to the servers may change from one moment to the next. Administration efforts to associate application clients with application servers, which are complicated or, due to their sheer complexity, monumental, are completely avoided. By introducing a dispatcher and teaching that the dispatcher executes the load-balancing decisions, a significant processing burden is removed from the servers, where, according to the state of the art, the workload balancing decisions would be performed and which typically form the primary bottleneck for processing resources. Moreover, the application clients, which in most cases are short on resources, need not be involved in the workload balancing processing.