An optimization problem models the dynamic placement of remotely-hosted operations (RHO) of software applications on servers under conditions of heavy simultaneous resource requirements. Some resource requirements are dependent on the loads associated with the software applications and their users' actions while other resource requirements are independent of moment-to-moment loads. The demand (load) for applications changes over time and the goal is to satisfy all the demand while changing the solution (assignment of applications to servers) as little as possible from period to period. Dynamically partitioning a set of resources among a plurality of client organization licensees and their respective users is desirable and valuable, to provide economies-of-scale through resource-sharing in such a manner that the total cost of ownership (TCO) is as low as possible for the client licensees whose business activity is hosted on such RHO systems, communications switches, and server farms.
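The two competing goals stated above, meeting all demand and changing the assignment as little as possible between periods, can be made concrete with a minimal Python sketch. The representation of a placement as a set of (application, server) pairs, the single per-instance capacity figure, and all names and numbers are assumptions introduced for illustration only.

```python
def placement_changes(previous, current):
    """Count the (application, server) pairs started or stopped between periods."""
    return len(previous ^ current)  # symmetric difference of the two placements


def demand_satisfied(placement, capacity_per_instance, demand):
    """True if every application has enough instances to carry its demand."""
    instances = {}
    for app, _server in placement:
        instances[app] = instances.get(app, 0) + 1
    return all(
        instances.get(app, 0) * capacity_per_instance >= load
        for app, load in demand.items()
    )


# Hypothetical placements for two consecutive periods.
previous = {("crm", "s1"), ("mail", "s1"), ("mail", "s2")}
current = {("crm", "s1"), ("crm", "s2"), ("mail", "s2")}

# Hypothetical demand for the new period, in requests per second.
demand = {"crm": 150, "mail": 80}

changes = placement_changes(previous, current)
old_ok = demand_satisfied(previous, 100, demand)
new_ok = demand_satisfied(current, 100, demand)
```

In this example the earlier placement can no longer carry the "crm" demand, and the new placement restores feasibility at the cost of two changes (one instance stopped, one started).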
Selection of optimal client-sets such that the sets optimally utilize the available capacity of the RHO server resources allocated to each set is related to the problems of data-compression and digital steganography. In the former problem, the aim is to losslessly encode the information into the minimum possible number of digital bits. In the latter problem, the aim is to reversibly obscure one or more confidential digital signatures or patterns of information by encoding their admixture with other host information or a digital image as an intermediate-complexity composite image. The identification of optimal client-sets that yield efficient, balanced load assignments on the minimum server resources is an analogous process of discovering admixtures of load that effectively obscure each individual client's load timeseries by immersion in the pool of activity of all clients in the client-set.
Such long timescale load-balancing is desirable not only for improving server farm availability, scalability, manageability, and security, but also for improving cost-efficiency and affordability, by minimizing TCO through large-scale resource-sharing. While state-less load-balancing is simple, ‘state-aware’ load-balancing is far more powerful for preventing overloads while at the same time maximizing the utilization of allocated resources.
Large-scale ‘grid’ computing is capable of delivering reduced costs through sharing of resources, spreading the use of large-capacity resources among many client organizations. Those skilled in the art currently identify candidate aggregate sets whose loads are ‘complementary’ to each other and which thereby, in aggregate, present modest loads to the computing resources that are assigned to the sets in an ad hoc fashion. Despite the fact that there are a variety of algorithms and systems for short timescale load-balancing of generic application loads presented to ASP server farms, to date there have been no methods for consistently, predictably, and reliably identifying aggregate load-sets that have optimal resource utilization and performance properties on a longer timescale of days to months; nor for rank-ordering alternative aggregate sets of clients and applications that are nonfungible (nongeneric, on account of differing business-rules requirements) according to a numerical figure-of-merit valid for long timescales, for the purpose of making optimal assignments to dedicated resource-sets.
Combinations or aggregate loads whose peaks in CPU or memory or I/O read-write rates are additive and occur at the same times and days-of-week should be avoided, insofar as inter-client contention for the finite resources will make the amount of resources needed to deliver a given level of performance according to contract worse-than-linearly greater (as an aggregated computational load) than the amount of resources for each of them to be hosted individually. Furthermore, combinations whose valleys or troughs in CPU or memory or I/O occur at the same times should also be avoided, or the system-idle unused capacity and cost-effectiveness (and total cost of ownership, TCO) for the aggregate will likewise be worse for such clients combined together than had they been hosted individually on separate (smaller-capacity) equipment.
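The effect of coincident versus complementary peaks can be illustrated with a short Python sketch. The hourly load figures, the client names, and the use of a peak-to-mean ratio as the comparison measure are all assumptions made for this example; they are not a figure-of-merit taken from the text.

```python
from statistics import mean


def aggregate(loads):
    """Element-wise sum of equally long load time series."""
    return [sum(samples) for samples in zip(*loads)]


def peak_to_mean(series):
    """Ratio of peak load to mean load; 1.0 means a perfectly flat profile."""
    return max(series) / mean(series)


# Hypothetical CPU demand per time slot (arbitrary units) for three clients.
day_shift = [1, 1, 8, 9, 8, 1]
day_shift_2 = [1, 2, 9, 8, 9, 1]
night_shift = [8, 9, 1, 1, 1, 8]

# Two clients whose peaks and troughs coincide.
coincident = aggregate([day_shift, day_shift_2])

# Two clients whose peaks fall in each other's troughs.
complementary = aggregate([day_shift, night_shift])

print(max(coincident), round(peak_to_mean(coincident), 2))
print(max(complementary), round(peak_to_mean(complementary), 2))
```

Under these assumed figures, the coincident pair must be provisioned for a peak of 17 units and sits nearly idle in its troughs, whereas the complementary pair peaks at 10 units and presents an almost flat aggregate load.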
The rate of dataflow in computer networks between hosts and clients in RHO and ASP systems depends upon many parameters. Some of these parameters can be tied to the provision of telco, router, firewall, network, CPU, memory, disk, and other resources. These provisioned resources can be measured and system performance reports can be generated to determine whether the parameters are in compliance with a negotiated Service Level Agreement (SLA), or whether existing allocations present risks of near-term violation of the terms of the SLA. An SLA between a service provider and a client organization defines the expected and acceptable properties of the services, typically in the context of providing Internet-based application services that are hosted by the remote-hosting organization. The SLA sets forth the means and metric whereby specified performance goals can be measured, by defining the performance metrics and the corresponding goals and level-of-service guarantees. By monitoring compliance with SLA limits, an RHO service provider can avoid the costly problems that result from disappointing users or hosted client organizations.
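As a minimal sketch of the compliance monitoring described above, the following Python fragment compares measured values with SLA limits and flags both violations and near-term risk. The metric names, the limits, and the 10% warning margin are hypothetical values chosen for illustration; an actual SLA would define its own metrics and thresholds.

```python
from dataclasses import dataclass


@dataclass
class SlaLimit:
    metric: str
    limit: float
    higher_is_better: bool
    warn_margin: float = 0.10  # assumed: within 10% of the limit counts as "at risk"


def classify(sla, observed):
    """Return 'violation', 'at risk', or 'compliant' for one measurement."""
    if sla.higher_is_better:
        if observed < sla.limit:
            return "violation"
        if observed < sla.limit * (1 + sla.warn_margin):
            return "at risk"
    else:
        if observed > sla.limit:
            return "violation"
        if observed > sla.limit * (1 - sla.warn_margin):
            return "at risk"
    return "compliant"


# Hypothetical SLA terms and measurements.
latency = SlaLimit("round_trip_seconds", 2.0, higher_is_better=False)
throughput = SlaLimit("transactions_per_second", 100.0, higher_is_better=True)

latency_status = classify(latency, 1.9)         # close to the 2.0 s ceiling
throughput_status = classify(throughput, 150)   # comfortably above the floor
overload_status = classify(throughput, 80)      # below the guaranteed floor
```

Separating "at risk" from "violation" mirrors the distinction drawn above between actual non-compliance and allocations that present a risk of near-term violation.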
RHO operations can be monitored and measured using standard techniques such as Remote Network Monitoring (RMON) and IBM WEBSPHERE, TIVOLI, or other available monitoring software. Application-layer parameters such as transactions-per-second throughput, latency (waiting times), and end-to-end round-trip time are influenced by conditions such as CPU availability (periods of processing overload) and secondary resource availability (e.g., database I/O bandwidth). Furthermore, some network monitors monitor the number of concurrent network connections that can be opened on each server and the number of concurrent users who have sessions active at each point in time, logging such information to a monitoring database.
It is generally known that an SLA can be defined to guarantee the dataflow rates and system availability in remotely-hosted RHO systems. Resource capacity (bandwidth) is allocated or assigned to the flows by the managers of the systems so as best to satisfy the SLA parameters. SLA-based allocations are intended to guarantee the requested bandwidth from the client to the server and back.
It is further known by those skilled in the art that individual host computers can create logs of each client request and each moment of system resources' utilization. These log files are stored on disk in the host computers. The log files contain ‘raw,’ unformatted information about each transaction or client request, and may be provided in diverse, incompatible formats.
One major disadvantage of the prior art is that existing monitoring/logging mechanisms are necessarily tied to particular machines, even though a user transaction may be serviced by any of several different machines. Similarly, reporting on the performance related to some particular software application or service is difficult when the same content can be served by any one of several different machines.
One example of prior art is the SLA implementation disclosed in U.S. Pat. No. 5,893,905, issued Apr. 13, 1999. In that system, as applied to a scheduled computer processing job environment, a monitoring center automatically retrieves job exception data, job run data, and clocktime data from multiple computer subsystems, each of which is running a monitoring data collection program module. The retrieved data are stored in appropriate databases for each type of data collected. A jobflow table, according to the daily SLAs, is also stored in the system, corresponding to a set of application programs and scripts executed. A ‘periodic data analysis process’ determines whether jobs were run in a timely fashion, or if errors have occurred. If delayed processing is detected, the system determines whether the result will negatively affect contractual conformity with the SLA. If a problem is detected, then system management personnel are alerted with reports identifying the variances that may impact an SLA, and which SLA is in jeopardy, so that operations personnel can take additional manual steps, such as reallocating one or more clients or applications to a different set of resources.
Each server machine can run some finite number of application processes, depending on the resource requirements of those software applications and the particular amounts of resources available on the server. The use of these applications processes is through request messages, to which there may be replies. The collection of servers is known as a cluster. Request messages for a particular application are split among all instances of that application on a resource-set that is allocated and accessible to an authorized client-set. Therefore, when application instances use different servers, the size of a cluster directly impacts the amount of load that the cluster can sustain without performance degradation.
When the size of a cluster is insufficient, the application users experience performance degradation or failures, resulting in the violation of Service Level Agreements (SLA). Today, to avoid SLA violations, application providers must overprovision the number of application instances to handle peak loads, resulting in a non-optimal total cost of ownership (TCO) and in cost-inefficient (low) resource utilization during normal operating conditions.
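The cost of provisioning for peak load can be shown with a small worked example in Python. The load profile and the per-instance capacity are hypothetical numbers, and the model assumes that load splits evenly across identical instances.

```python
import math


def instances_for_peak(load, capacity_per_instance):
    """Smallest number of identical instances that can carry the peak load."""
    return math.ceil(max(load) / capacity_per_instance)


def mean_utilization(load, capacity_per_instance):
    """Average fraction of provisioned capacity in use when sized for the peak."""
    provisioned = instances_for_peak(load, capacity_per_instance) * capacity_per_instance
    return (sum(load) / len(load)) / provisioned


# Hypothetical request rate (requests per second) over six periods, with one spike.
load = [20, 30, 40, 250, 60, 20]
capacity = 50  # assumed requests per second that one instance can sustain

needed = instances_for_peak(load, capacity)
utilization = mean_utilization(load, capacity)
print(needed, round(utilization, 2))
```

With these assumed figures, five instances are needed to survive the single spike, yet on average only 28% of the provisioned capacity is in use.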
Dynamic allocation techniques available today (e.g., IBM TIVOLI INTELLIGENT THINKDYNAMICS ORCHESTRATOR) assign applications to server clusters. Then, servers are reallocated among clusters based on the offered load.
The prior art has several limitations. (1) When only one application can be assigned to a cluster at any given time, the granularity of resource allocation is coarse. The approach is wasteful when an application's demand is not sufficient to utilize an entire server. (2) Existing techniques are practicable only for generic applications (and generic business-rules that do not differ by user or by client). (3) Existing techniques are valid and effective only for load-balancing on short timescales, ranging from milliseconds to a few days, and are unable to optimize allocations for load-balancing, cost-effectiveness (TCO), or SLA compliance assurance on longer timescales (weeks to months). (4) In the process of server reallocation from one application to another, the old application has to be uninstalled, the server reconfigured, and the new application installed. Usually, the network configuration also needs to change. This reconfiguration process may be time-consuming and therefore cannot be performed frequently, which results in lower responsiveness to load changes, or greater incidence of SLA noncompliance, or both. (5) Source traffic streams as well as aggregated traffic flows often exhibit long-range-dependent (LRD) properties that are time-dependent. The superposition of a finite number of multiplicative multifractal traffic streams results in another multifractal stream, but the prior art does not provide a practicable method or system for predicting the properties of the resulting multifractal stream and the feasibility of supporting the workload associated with the aggregated stream on the available computing resources while conforming to the agreed SLA performance parameters with a high degree of statistical confidence.
Similar problems have been studied in the theoretical optimization literature. The special case of problems with uniform memory requirements was studied by H. Shachnai and T. Tamir (2001), where some approximation algorithms were suggested. Related optimization problems include bin packing, multiple knapsack, and multi-dimensional knapsack.
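To illustrate the relation to bin packing, the following Python sketch applies the textbook first-fit-decreasing heuristic to a one-dimensional placement of applications on identical servers. This is a generic heuristic shown for orientation only; it is not the approximation algorithm of the cited work, and the application names, demands, and server capacity are hypothetical.

```python
def first_fit_decreasing(demands, capacity):
    """Pack applications onto identical servers using first-fit decreasing.

    demands maps an application name to its resource demand; capacity is the
    amount of that resource available on each server.
    """
    servers = []  # each entry: {"free": remaining capacity, "apps": [names]}
    # Consider the largest demands first.
    for name, demand in sorted(demands.items(), key=lambda kv: kv[1], reverse=True):
        if demand > capacity:
            raise ValueError(f"{name} does not fit on any single server")
        for server in servers:
            if server["free"] >= demand:
                server["apps"].append(name)
                server["free"] -= demand
                break
        else:
            # No existing server has room, so open a new one.
            servers.append({"free": capacity - demand, "apps": [name]})
    return [server["apps"] for server in servers]


# Hypothetical memory demands (in GB) and a 10 GB server.
demands = {"a": 6, "b": 5, "c": 4, "d": 3, "e": 2}
placement = first_fit_decreasing(demands, 10)
print(placement)
```

For these assumed demands the heuristic fills two servers exactly. A realistic placement would also have to respect several resource dimensions at once and limit changes between periods, which is what relates the problem to the multi-dimensional knapsack variants mentioned above.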