With the increasing adoption of Service-oriented Architecture (SOA) and cloud computing technologies, in which Information Technology (including infrastructure, platforms and applications) is delivered as a service, the shared resource model is increasingly used. In such a model, computing and IT resources are shared across multiple applications, so there is a growing need for solutions that optimize resource allocation. Power, cooling and real estate costs represent a significant portion of the overall cost of operating a cloud computing platform, service or datacenter. Reducing or optimizing the resources associated with these costs lowers total operating cost, reduces the need for expensive infrastructure and provides an opportunity to expand the platform. The challenge in consolidating such workloads is to minimize the number of physical servers while taking into account resource needs across multiple dimensions. In this space, the dimensions include, but are not limited to, CPU, memory, data storage, I/O, network bandwidth, network topology and router utilization, all of which may change in real time depending on user needs and workloads.
Server consolidation methods aim to use computer server resources efficiently in order to reduce the total number of servers required for a particular software implementation or collection of software implementations. That is, server consolidation addresses the problem of “server sprawl”, which is understood in the art to refer to multiple under-utilized servers consuming more resources than are necessary to provide the functionality required by the software packages loaded on them.
Server consolidation may generally be classified into three stages: centralization, physical consolidation, and data and application integration. Centralization involves moving servers to a common location. Physical consolidation involves moving a large number of existing servers onto a small number of high-performance servers. Storage consolidation is also a kind of physical consolidation, in which disparate or related data and applications are integrated into a common database and common application. Each of these stages serves to reduce server under-utilization; in a non-consolidated environment, typical under-utilization may range from 15% to 20% of an individual physical server's capacity going unused.
A technique for physical consolidation that is well known in the art is server virtualization. Virtualization enables multiple existing servers to be moved onto, and share the resources of, a single computer or a dynamically selected group of computers. That is, software is used to divide one physical server into multiple isolated virtual environments or instances. Multiple methods of virtualization are known to those skilled in the art, e.g., hardware emulation, para-virtualization, OS-level virtualization and application-level virtualization. Regardless of the particular virtualization method, the goal is to minimize the number of physical servers. This goal competes directly with the goal of ensuring that sufficient resources are available to avoid performance degradation. Put another way, the sum of the resource utilizations of the virtual machines on a physical server (the destination server) must not exceed the threshold limits prescribed for that destination server, so that performance does not degrade, while the number of destination servers must be kept as small as possible so that the consolidation process yields a cost benefit.
The optimization of destination servers may be viewed as a bin packing or vector packing problem, in which items of different sizes must be packed into a minimum number of bins of defined capacity. The basic bin packing problem is as follows: given N objects, each with a value $v_i$ ($i = 1, \ldots, N$), pack the objects into as few bins as possible such that the sum $\sum v_i$ of the values of the objects packed into the same bin does not exceed the bin's capacity. In the server consolidation context, the objects are the existing servers, the object sizes are their resource utilizations, the bins are the destination servers, and the bin capacity is the utilization threshold of the destination servers. Resource utilizations may include the CPU, disk and memory requirements of the existing servers. Where multiple resources (CPU, disk, memory, etc.) are considered, each resource forms a dimension of the packing problem. In the one-dimensional case, bin packing and vector packing are the same problem; in the multi-dimensional case, the problem is treated as a vector packing problem.
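To make the mapping concrete, the Python sketch below treats each existing server as a vector of resource utilizations (one entry per dimension) and checks whether a proposed assignment of servers to bins respects every capacity. The function names `is_valid_packing` and `bins_used` and the data layout are hypothetical illustrations, not part of any cited method; with a single dimension the check reduces to ordinary bin packing.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def is_valid_packing(
    sizes: Sequence[Sequence[float]],  # sizes[i][k]: utilization of object i in dimension k
    assignment: Sequence[int],         # assignment[i]: bin (destination server) holding object i
    capacity: Sequence[float],         # capacity[k]: bin threshold in dimension k
) -> bool:
    """True if, in every bin, the summed sizes stay within capacity in every dimension."""
    loads: Dict[int, List[float]] = defaultdict(lambda: [0.0] * len(capacity))
    for size, b in zip(sizes, assignment):
        for k, s in enumerate(size):
            loads[b][k] += s
    return all(
        load[k] <= capacity[k] for load in loads.values() for k in range(len(capacity))
    )


def bins_used(assignment: Sequence[int]) -> int:
    """Number of distinct bins used, i.e. the quantity the packing problem minimizes."""
    return len(set(assignment))
```

For example, with (CPU, disk) sizes [(0.5, 0.25), (0.5, 0.5), (0.25, 0.125)] and capacity (0.75, 0.75), the assignment [0, 1, 0] is valid and uses two bins, whereas [0, 0, 1] overloads the CPU dimension of bin 0.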
A two-dimensional server packing problem may be formally stated as follows. Let $\rho_{c_i}$ and $\rho_{d_i}$ be the CPU and disk utilizations of an existing server $s_i$ ($i = 1, \ldots, n$), let $X_j$ be the set of existing servers consolidated into a destination server $s'_j$ ($j = 1, \ldots, m$), and let $R_c$ and $R_d$ be the CPU and disk utilization thresholds prescribed for the destination servers. Thus, the n existing servers are all consolidated into m destination servers. The problem is then to minimize m subject to the constraints

$$\sum_{s_i \in X_j} \rho_{c_i} \le R_c \quad \text{and} \quad \sum_{s_i \in X_j} \rho_{d_i} \le R_d \qquad (j = 1, \ldots, m).$$

Note, however, that the performance characteristics (CPU, disk, etc.) of a destination server may be higher than those of an existing server. In that case, if a destination server performs h times better than an existing server, a CPU utilization $\rho$ measured on the existing server is converted into $\rho / h$ on the destination server. Further, virtualization overhead increases the utilizations of servers running on virtual machines, so the threshold values for the destination servers must be adjusted accordingly.
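The unit conversion and the per-destination constraint check can be sketched in Python as follows. The helper names are hypothetical, and so is the overhead model: the text states only that virtualization overhead raises utilizations and that thresholds must be adjusted, not how. Here, as an assumption, overhead is modeled by scaling each converted utilization by (1 + alpha), which is equivalent to dividing both thresholds by (1 + alpha). Applying a separate ratio `h_d` to disk is also an assumption; the text describes the ρ/h conversion for CPU only.

```python
from typing import Iterable, Tuple


def to_destination(rho: float, h: float = 1.0, alpha: float = 0.0) -> float:
    """Convert a utilization measured on an existing server to destination-server units.

    h:     the destination server performs h times better for this resource (rho -> rho / h).
    alpha: HYPOTHETICAL multiplicative virtualization overhead; (1 + alpha) scaling
           is an assumed model, not taken from the text.
    """
    return (rho / h) * (1.0 + alpha)


def accommodation_ok(
    x_j: Iterable[Tuple[float, float]],  # (rho_c, rho_d) of each existing server s_i in X_j
    r_c: float,                          # CPU threshold R_c of the destination server
    r_d: float,                          # disk threshold R_d of the destination server
    h_c: float = 1.0,                    # CPU performance ratio (destination / existing)
    h_d: float = 1.0,                    # disk performance ratio (assumed, see lead-in)
    alpha: float = 0.0,                  # hypothetical overhead factor
) -> bool:
    """Check sum(rho_c) <= R_c and sum(rho_d) <= R_d for one destination server s'_j."""
    cpu = disk = 0.0
    for rho_c, rho_d in x_j:
        cpu += to_destination(rho_c, h_c, alpha)
        disk += to_destination(rho_d, h_d, alpha)
    return cpu <= r_c and disk <= r_d
```

For instance, two servers at (0.5, 0.25) do not fit under thresholds (0.75, 0.75) when measured on equal hardware, but they do fit on a destination server whose CPU is twice as fast (h_c = 2), since each CPU utilization becomes 0.25.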
Several methods well known in the art address such multi-dimensional vector packing problems, for example, the First Fit Decreasing (FFD) algorithm, which may be understood by the following pseudocode.
    sort existing servers to {s1, ..., sn} in descending order;
    m ← 1; X1 ← { };
    for i ← 1 to n do
        for j ← 1 to m do
            if packable(Xj, si) then
                Xj ← Xj ∪ {si};
                break
            fi
        end for;
        if j = m + 1 then      /* if fail to pack si, */
            m ← m + 1;         /* a new server is added */
            Xm ← {si}          /* to have si */
        fi
    end for
The FFD algorithm addresses the server packing problem by first receiving the n existing servers and sorting them in descending order of their utilization of a certain resource. When the algorithm finishes, it produces the server accommodations Xj (j = 1, ..., m), where m is the number of destination servers. The function packable(Xj, si) returns true if packing existing server si into destination server s′j satisfies the constraints (i.e., the utilization of s′j does not exceed its threshold for any resource); otherwise, it returns false. FFD examines the existing servers s1, ..., sn in sequence and packs each si into the first of the m current destination servers found able to accommodate it. If si cannot be packed into any current destination server, an (m+1)-th destination server is added to accommodate it. The complexity of this FFD algorithm is O(n²), because m is roughly proportional to n.
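The pseudocode can be translated into the Python sketch below. It assumes each existing server is a tuple of utilizations (for example, (CPU, disk)) already converted to destination-server units, and that servers are sorted on the resource at index `sort_dim`; both are illustrative choices, since the text says only "a certain resource". Unlike the pseudocode, the sketch starts with no destination servers rather than one empty X1; the two agree whenever every existing server fits within the thresholds on its own.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def packable(load: Sequence[float], item: Vector, thresholds: Vector) -> bool:
    """True if adding `item` keeps every resource of a destination server within its threshold."""
    return all(used + x <= r for used, x, r in zip(load, item, thresholds))


def first_fit_decreasing(
    servers: Sequence[Vector], thresholds: Vector, sort_dim: int = 0
) -> List[List[int]]:
    """Pack existing servers into destination servers with First Fit Decreasing.

    Returns, for each destination server X_j, the indices of the existing
    servers it accommodates; len(result) is m.
    """
    # Sort existing servers in descending order of one resource (index sort_dim).
    order = sorted(range(len(servers)), key=lambda i: servers[i][sort_dim], reverse=True)
    bins: List[List[int]] = []      # X_1, ..., X_m
    loads: List[List[float]] = []   # per-resource utilization of each X_j
    for i in order:
        item = servers[i]
        for j, load in enumerate(loads):
            if packable(load, item, thresholds):  # first destination server that fits
                bins[j].append(i)
                for k, x in enumerate(item):
                    load[k] += x
                break
        else:
            # s_i fits in no current destination server: add the (m+1)-th one for it.
            bins.append([i])
            loads.append(list(item))
    return bins
```

For example, with thresholds (0.75, 0.75) and servers [(0.5, 0.25), (0.25, 0.5), (0.5, 0.5), (0.25, 0.125)], the sketch returns [[0, 1], [2, 3]], i.e., two destination servers. Note that `packable` checks every dimension, but the packing order is driven by a single resource.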
A second vector packing algorithm known in the art is the Least Loaded (LL) algorithm, which may be understood by the following pseudocode.
    sort existing servers to {s1, ..., sn} in descending order;
    m ← LB({s1, ..., sn});
    while true do
        for j ← 1 to m do
            Xj ← { }                /* initialization */
        end for;
        for i ← 1 to n do
            sort destination servers to {X1, ..., Xm} in ascending order;
            for j ← 1 to m do
                if packable(Xj, si) then
                    Xj ← Xj ∪ {si};
                    break
                fi
            end for;
            if j = m + 1 then       /* if fail to pack si, a new server is added */
                m ← m + 1;
                break
            fi
        end for;
        if i = n + 1 then           /* all packed */
            break
        fi
    end while
The LL algorithm attempts to balance the load between servers by assigning incoming jobs to the least-loaded server. In server packing, this means that an existing server with high utilization is packed into a destination server with low utilization. The function LB({s1, ..., sn}) returns the theoretical lower bound on the number of destination servers needed to accommodate the existing servers {s1, ..., sn}. The lower bound is the smallest integer greater than or equal to the sum of the utilizations divided by the threshold. The lower bound for the CPU is

$$LB_c = \left\lceil \sum_{i=1}^{n} \rho_{c_i} \Big/ R_c \right\rceil,$$

while that for the disk is

$$LB_d = \left\lceil \sum_{i=1}^{n} \rho_{d_i} \Big/ R_d \right\rceil.$$

The function LB({s1, ..., sn}) returns the larger of the two lower bounds.
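The lower-bound function can be sketched in Python as below. The name `lower_bound` and the tuple layout are illustrative assumptions; the sketch generalizes the two-resource case to any number of resources by taking the largest per-resource ceiling.

```python
import math
from typing import Sequence, Tuple


def lower_bound(servers: Sequence[Tuple[float, ...]], thresholds: Tuple[float, ...]) -> int:
    """Theoretical lower bound on the number of destination servers.

    For each resource k, LB_k = ceil(sum_i rho_k_i / R_k); the result is the
    largest of these (with two resources: max(LB_c, LB_d)).
    """
    return max(
        math.ceil(sum(s[k] for s in servers) / r) for k, r in enumerate(thresholds)
    )
```

For the servers [(0.5, 0.25), (0.25, 0.5), (0.5, 0.5), (0.25, 0.125)] with thresholds (0.75, 0.75), the CPU bound is ceil(1.5 / 0.75) = 2 and the disk bound is ceil(1.375 / 0.75) = 2, so the result is 2. With real-valued utilizations, floating-point rounding can push a quotient just above an integer; a small tolerance before taking the ceiling may be worth considering.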
There are two differences between LL and FFD. First, when LL fails to pack an existing server into the current m destination servers, it adds a new destination server and then repacks all existing servers from the beginning. This is aimed at balancing the load between the newly added destination server and the others. In contrast, FFD packs the existing server in question into a new destination server and continues packing the remaining existing servers. LL initializes m to the lower bound to save time, although it could also start with m = 1. Second, before packing each existing server, LL sorts the destination servers (which accommodate X1, ..., Xm) in ascending order of utilization so that the existing server is packed into a less-loaded destination server. The complexity of LL is O(d·n² log n), where d is the difference between the lower bound and the final number m of destination servers.
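The LL pseudocode can be sketched in Python as follows, under the same assumptions as an FFD translation would use: servers are tuples of utilizations in destination-server units, and `sort_dim` selects the resource used for sorting. The pseudocode says only that destination servers are sorted "in ascending order", so ordering them by their load on `sort_dim` is an assumption. The up-front guard is also an addition: if a single existing server exceeds a threshold on its own, the repacking loop in the pseudocode would never terminate.

```python
import math
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def least_loaded(
    servers: Sequence[Vector], thresholds: Vector, sort_dim: int = 0
) -> List[List[int]]:
    """Pack existing servers into destination servers with the LL heuristic.

    Returns, for each destination server X_j, the indices of the existing
    servers it accommodates.
    """
    # Guard (not in the pseudocode): such a server could never be packed.
    for s in servers:
        if any(x > r for x, r in zip(s, thresholds)):
            raise ValueError("an existing server exceeds a threshold on its own")

    # Existing servers in descending order of the chosen resource.
    order = sorted(range(len(servers)), key=lambda i: servers[i][sort_dim], reverse=True)

    # m <- LB({s1, ..., sn}): largest per-resource ceil(sum / threshold), at least 1.
    m = max(
        [1] + [math.ceil(sum(s[k] for s in servers) / r) for k, r in enumerate(thresholds)]
    )

    while True:
        bins: List[List[int]] = [[] for _ in range(m)]        # X_j <- { }
        loads = [[0.0] * len(thresholds) for _ in range(m)]   # per-resource load of X_j
        all_packed = True
        for i in order:
            item = servers[i]
            # Try destination servers from least to most loaded on sort_dim.
            for j in sorted(range(m), key=lambda b: loads[b][sort_dim]):
                if all(used + x <= r for used, x, r in zip(loads[j], item, thresholds)):
                    bins[j].append(i)
                    for k, x in enumerate(item):
                        loads[j][k] += x
                    break
            else:
                # Failed to pack s_i: add one destination server and repack everything.
                m += 1
                all_packed = False
                break
        if all_packed:
            return bins
```

With thresholds (0.75, 0.75) and servers [(0.5, 0.25), (0.25, 0.5), (0.5, 0.5), (0.25, 0.125)], the sketch starts at m = 2 and returns [[0, 1], [2, 3]] without repacking. With three single-resource servers of 0.5 and a threshold of 0.75, it starts at m = 2, fails on the third server, and repacks into three destination servers, illustrating the restart behavior described above.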
The LL and FFD algorithms are limited in that only a single dimension is optimized at a time; that is, neither LL nor FFD optimizes multiple resources simultaneously. Further, because each dimension must first be considered independently of the other dimensions, the optimization process carries an inherent performance (time) cost.