1. Field of the Invention
The present invention is directed to the optimization of dynamic placement of computing applications on servers to satisfy all of the applications' demand while changing the assignment of applications as little as possible.
2. Background Description
With the proliferation of the World Wide Web (WWW or simply the “Web”) and outsourcing of data services, computing service centers have increased in both size and complexity. Such centers provide a variety of services; for example, Web content hosting, e-commerce, Web applications, and business applications. Managing such centers is challenging since a service provider must manage the quality of service provided to competing applications in the face of unpredictable load intensity and distribution among the various offered services and applications. Several management software packages which deal with these operational management issues have been introduced. These software systems provide functions including monitoring, demand estimation, load balancing, dynamic provisioning, service differentiation, optimized resource allocation, and dynamic application placement. The last function, namely dynamic application placement, is the subject of this invention.
Service requests are satisfied through the execution of one or more instances of each of a set of applications. Applications include access to static and dynamic Web content, enterprise applications, and access to database servers. Applications may be provided by HTTP (Hypertext Transfer Protocol) Web servers, servlets, Enterprise Java Beans (EJB), or database queries. When the number of service requests for a particular application increases, the application placement management software deploys additional instances of the application in order to accommodate the increased load. It is imperative to have an on-demand management environment allowing instances of applications to be dynamically deployed and removed. The problem is to dynamically change the number of application instances so as to satisfy the dynamic load while minimizing the overhead of starting and stopping application instances.
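The overhead mentioned above can be made concrete by counting how many instances must be started and stopped to move from one placement to another. The following is a minimal sketch only. The representation of a placement as a set of (application, server) pairs, the function name `placement_changes`, and the example names are assumptions made for illustration and are not part of the described system.

```python
from typing import Set, Tuple

# Hypothetical representation: one (application, server) pair per running instance.
Placement = Set[Tuple[str, str]]


def placement_changes(old: Placement, new: Placement) -> Tuple[int, int]:
    """Return (starts, stops) needed to move from placement `old` to `new`."""
    starts = len(new - old)  # instances present only in the new placement
    stops = len(old - new)   # instances present only in the old placement
    return starts, stops


# Illustrative data only.
old = {("shop", "s1"), ("shop", "s2"), ("wiki", "s2")}
new = {("shop", "s1"), ("shop", "s2"), ("shop", "s3"), ("wiki", "s3")}
starts, stops = placement_changes(old, new)
print(starts, stops)
```

Under this sketch, a placement algorithm that satisfies the same demand with a smaller total of starts and stops would be preferred.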
We characterize an application by two types of parameters: (1) load-independent requirements for the resources needed to run an application, and (2) load-dependent requirements which are a function of the external load or demand placed on the application. Examples of load-independent requirements are memory, communication channels, and storage. Examples of load-dependent requirements are current or projected request rate, CPU (Central Processing Unit) cycles, disk activity, and number of execution threads.
We also characterize a server by two parameters: (1) a load-independent capacity which represents the amount of resources available to host applications on the server, and (2) a load-dependent capacity which represents the available capacity to process requests for the applications' services.
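The two characterizations above can be sketched as a small data model with a feasibility check. This is an illustrative sketch, not the claimed method. It assumes that each load-independent quantity is reduced to a single number (here called `memory`) and each load-dependent quantity to a single number (`demand` for an application, `capacity` for a server). The class names, field names, and the `load` mapping are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class Application:
    name: str
    memory: float  # load-independent requirement (assumed one-dimensional)
    demand: float  # load-dependent requirement, e.g. CPU cycles per second


@dataclass(frozen=True)
class Server:
    name: str
    memory: float    # load-independent capacity
    capacity: float  # load-dependent capacity


def is_feasible(
    apps: Sequence[Application],
    servers: Sequence[Server],
    load: Dict[Tuple[str, str], float],
    eps: float = 1e-9,
) -> bool:
    """Check a placement given as load[(app name, server name)] = demand served there.

    Every key in `load` is treated as one running instance, so it consumes the
    application's memory on that server even if the served amount is zero.
    """
    app_by_name = {a.name: a for a in apps}
    mem_used = {s.name: 0.0 for s in servers}
    load_used = {s.name: 0.0 for s in servers}
    served = {a.name: 0.0 for a in apps}
    for (app_name, server_name), amount in load.items():
        if amount < 0:
            return False
        mem_used[server_name] += app_by_name[app_name].memory
        load_used[server_name] += amount
        served[app_name] += amount
    servers_ok = all(
        mem_used[s.name] <= s.memory + eps and load_used[s.name] <= s.capacity + eps
        for s in servers
    )
    demand_ok = all(served[a.name] >= a.demand - eps for a in apps)
    return servers_ok and demand_ok


# Illustrative data only.
apps = [Application("shop", memory=2, demand=150), Application("wiki", memory=1, demand=40)]
servers = [Server("s1", memory=3, capacity=100), Server("s2", memory=3, capacity=100)]
good = {("shop", "s1"): 100, ("shop", "s2"): 50, ("wiki", "s2"): 40}
bad = {("shop", "s1"): 100, ("shop", "s2"): 50, ("wiki", "s1"): 40}
print(is_feasible(apps, servers, good), is_feasible(apps, servers, bad))
```

In the second placement, server `s1` would have to serve 140 units of load against a capacity of 100, so the check fails even though the memory constraint is met.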
This invention addresses the problem of automatic instantiation of application processes in a server farm to allow the server farm to dynamically adjust the number of application processes as the load for the server processes fluctuates. Each server machine can run some number of application processes. These application processes are used through request messages, to which there may be replies. The collection of servers is known as a cluster. A server machine can run only a limited number of application processes. Request messages for a particular application are split among all instances of that application. Therefore, when application instances use different servers, the size of a cluster directly impacts the amount of load that the cluster can sustain without performance degradation.
When the size of a cluster is insufficient, the application users experience performance degradation or failures, resulting in the violation of Service Level Agreements (SLA). Today, to avoid SLA violations, application providers must overprovision the number of application instances to handle peak load. This results in poor resource utilization under normal operating conditions. Dynamic allocation alleviates the problem of wasted capacity by automatically reallocating servers among applications based on their current load and SLA objectives.
Dynamic allocation techniques available today (e.g., IBM Tivoli Intelligent ThinkDynamics Orchestrator) assign applications to server clusters. Then, servers are reallocated among clusters based on the offered load.
These techniques have several limitations:
(1) When only one application can be assigned to a cluster at any given time, the granularity of resource allocation is coarse. The approach is wasteful when an application's demand is not sufficient to utilize an entire server.
(2) When more than one application can be assigned to a cluster, all applications in the cluster must be running concurrently. This limits the number of applications assigned to a cluster by the memory capacity of the smallest server in the cluster. This results in wasted server capacity, as an application must execute on all servers in the cluster even if its workload could be satisfied by a subset of the servers in the cluster. Typically, only a limited number of applications can be executed on a server at a time and therefore regardless of the number of servers in the cluster, only a few applications can be served by the cluster.
(3) In the process of server reallocation from one application to another, the old application has to be uninstalled, the server reconfigured, and the new application has to be installed. Usually, network configuration also needs to change. This reconfiguration process may be time-consuming and therefore cannot be performed frequently, which results in lower responsiveness to workload changes.
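Limitation (2) can be illustrated numerically. The sketch below assumes that memory is the only load-independent resource and that every application assigned to a cluster must run on every server in it, as stated in that limitation. The function name `max_apps_in_cluster` and the figures are hypothetical.

```python
from typing import Sequence


def max_apps_in_cluster(server_memory: Sequence[float], app_memory: Sequence[float]) -> int:
    """Largest number of applications that can share a cluster when each one
    must run on every server, so the smallest server's memory is the budget."""
    budget = min(server_memory)
    count = 0
    # Taking the smallest applications first maximizes how many fit.
    for need in sorted(app_memory):
        if need > budget:
            break
        budget -= need
        count += 1
    return count


# Illustrative data only: memory in arbitrary units.
small_cluster = [8, 16, 4]
large_cluster = [8, 16, 4, 32, 32]
app_sizes = [3, 2, 2, 1]
print(max_apps_in_cluster(small_cluster, app_sizes))
print(max_apps_in_cluster(large_cluster, app_sizes))
```

Both calls give the same count, because adding larger servers does not raise the bound set by the smallest server in the cluster.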
The problem of optimally placing replicas of objects on servers, constrained by object and server sizes as well as capacity to satisfy a fluctuating demand for objects, has appeared in a number of fields related to distributed computing. In managing video-on-demand systems, replicas of movies are placed on storage devices and streamed by video servers to a dynamic set of clients with a highly skewed movie selection distribution. The goal is to maximize the number of admitted video stream requests. Several movie placement and video stream migration policies have been studied. A disk load balancing criterion which combines a static component and a dynamic component is described by J. L. Wolf, P. S. Yu, and H. Shachnai in “Disk load balancing for video-on-demand systems”, ACM/Springer Multimedia Systems Journal, 5(6):358-370, 1997. The static component decides the number of copies needed for each movie by first solving an apportionment problem and then solving the problem of heuristically assigning the copies onto storage groups to limit the number of assignment changes. The dynamic component solves a discrete class-constrained resource allocation problem for optimal load balancing, and then introduces an algorithm for dynamically shifting the load among servers (i.e., migrating existing video streams).
Similar problems have been studied in the theoretical optimization literature. The special case of our problem with uniform memory requirements was studied by H. Shachnai and T. Tamir in “On two class-constrained versions of the multiple knapsack problem”, Algorithmica 29 (2001), 442-467, and by H. Shachnai and T. Tamir in “Noah's Bagels: Some Combinatorial Aspects”, International Conference on FUN with Algorithms (FUN), Isola d'Elba, June 1998, where some approximation algorithms were suggested. Related optimization problems include bin packing, multiple knapsack and multi-dimensional knapsack.