The computational infrastructure of a distributed cloud may consist of hundreds of thousands of servers spread across a set of geographically dispersed datacenters. Such a cloud exhibits a high degree of heterogeneity, both in the size of each datacenter and in the capacities of the available resources. Resource heterogeneity also arises within a single datacenter, which can contain a variety of different types of resources that may be provided by different vendors.
Large-scale cloud infrastructures require a service placement solution that can efficiently determine, for a given service request, a set of available resources that matches the requirements of the service request (e.g., a number of virtual CPUs, an amount of memory and/or storage, an amount of specialized processing units, a network capacity). Above all, if such a set exists, it must be found in a timely fashion. However, this task is extremely challenging, in part due to the potentially massive search space involved in large-scale, dynamic, distributed, and/or extremely heterogeneous contemporary cloud infrastructures. It becomes even more challenging when the heterogeneity of the workload and of the service request requirements is taken into account.
For example, the workload of the cloud comes from various services ranging from delay-sensitive services such as gaming, video-conferencing, and multi-tier web services, to delay-tolerant batch services such as high performance computing and MapReduce-type applications. Such services usually operate through the use of one or more software components. Each component can have certain requirements, ranging from the physical characteristics of the host platform (such as the availability of certain hardware accelerators, or resources in terms of CPU, memory, storage, and network) to placement and affinity requirements defined via location and co-location constraints.
Placement and affinity constraints can be specified by clients for purposes such as fault-tolerance or compliance with organizational or legal regulations. For example, such placement and/or affinity requirements may limit the location of certain service components to a specific geographic location or a specific datacenter. Furthermore, they can impose a set of rules declaring which components of a requested service should be co-located, and/or which components of the service are not to be co-located. Thus, when a service placement request arrives, a cloud platform has to quickly find—and reserve—a set of resource units (e.g., containers, virtual machines, bare metal machines, etc.) to run the service components and form their network connectivity.
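The location, co-location, and anti-co-location rules described above can be illustrated with a minimal Python sketch. The `Host` type, the `satisfies_constraints` function, and all field and parameter names are hypothetical and chosen only for illustration; they are not taken from any existing cloud platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    """Hypothetical description of a resource unit's location."""
    name: str
    datacenter: str
    region: str


def satisfies_constraints(placement, allowed_regions, colocate, separate):
    """Check a candidate placement against location and affinity rules.

    placement:       dict mapping component name -> Host
    allowed_regions: dict mapping component name -> set of permitted regions
    colocate:        pairs of components that must share a host
    separate:        pairs of components that must not share a host
    """
    # Location constraint: a component may be limited to certain regions.
    for component, regions in allowed_regions.items():
        if placement[component].region not in regions:
            return False
    # Affinity: both components must be placed on the same host.
    for a, b in colocate:
        if placement[a].name != placement[b].name:
            return False
    # Anti-affinity: the components must be placed on different hosts.
    for a, b in separate:
        if placement[a].name == placement[b].name:
            return False
    return True


h1 = Host("h1", "dc-east", "eu")
h2 = Host("h2", "dc-east", "eu")
h3 = Host("h3", "dc-west", "us")

good = {"web": h1, "cache": h1, "db": h2}
bad = {"web": h1, "cache": h1, "db": h3}
rules = dict(
    allowed_regions={"db": {"eu"}},   # e.g., a legal data-residency rule
    colocate=[("web", "cache")],
    separate=[("web", "db")],         # e.g., for fault tolerance
)
print(satisfies_constraints(good, **rules))
print(satisfies_constraints(bad, **rules))
```

In this sketch the second placement is rejected only because the `db` component falls outside its permitted region; checking a placement is cheap, whereas finding one that passes is the hard part.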
Some large-scale cloud resource scheduling solutions have been developed in the academic and the industrial domains, such as projects referred to as Apollo, Borg, Mesos, Omega, and YARN. However, these solutions are primarily designed for single-datacenter Platform-as-a-Service (PaaS) clouds, and focus mainly on hosting applications that only target data-parallel tasks (e.g., MapReduce tasks). Thus, these solutions utilize various techniques to perform per-task scheduling such that the desired objectives are fulfilled while preserving the order in which the tasks are run. For example, Borg and Omega also handle workload heterogeneity by utilizing various categorization and scoring techniques that prioritize tasks in order to make better placement decisions.
Additionally, existing Infrastructure-as-a-Service (IaaS) solutions, such as OpenStack, primarily perform scheduling for a single datacenter. In such solutions, a service placement request is processed one component at a time. For instance, the scheduler used by OpenStack's Nova schedules one computational unit (e.g., a virtual machine (VM)) at a time, considering the affinity rules associated with the request. The scheduler first filters out the compute resources that do not meet the requirements. It then chooses the best remaining candidate based on a linear weighted average cost model. OpenStack's orchestration service, named Heat, handles multi-component service requests across several computational units in terms of affinities and network connectivity, but each component is still managed one by one.
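The filter-then-weigh approach described above can be sketched in a few lines of Python. This is an illustrative sketch only and is not Nova's actual code or API: the function names, the dictionary layout of hosts, and the choice of a weighted sum of free resources as the score are all assumptions made for the example.

```python
def filter_hosts(hosts, request):
    """Keep only hosts whose free resources cover every requested amount."""
    return [
        h for h in hosts
        if all(h["free"].get(res, 0) >= amount for res, amount in request.items())
    ]


def weigh(host, weights):
    """Linear weighted score of a host's free resources (assumed cost model)."""
    return sum(w * host["free"].get(res, 0) for res, w in weights.items())


def schedule_one(hosts, request, weights):
    """Place a single computational unit: filter first, then pick the best score."""
    candidates = filter_hosts(hosts, request)
    if not candidates:
        return None  # no host can satisfy this one component
    return max(candidates, key=lambda h: weigh(h, weights))


hosts = [
    {"name": "a", "free": {"vcpus": 4, "memory_gb": 8}},
    {"name": "b", "free": {"vcpus": 8, "memory_gb": 16}},
    {"name": "c", "free": {"vcpus": 1, "memory_gb": 64}},
]
request = {"vcpus": 2, "memory_gb": 4}
weights = {"vcpus": 1.0, "memory_gb": 0.5}

chosen = schedule_one(hosts, request, weights)
print(chosen["name"] if chosen else None)
```

Note that `schedule_one` sees only one component at a time, so nothing in it accounts for where the other components of the same service will end up.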
There are also a number of studies in the academic literature proposing optimization-based solutions for cloud workload placement. These solutions are typically designed to place cloud workload such that a cloud utility cost model (e.g., Service-Level Agreement (SLA) revenue, energy consumption, electricity cost) is optimized without compromising the delay requirements of the workload. These placement algorithms, however, also typically determine a placement for a standalone software component, rather than a placement for a service that includes multiple components.
Additionally, a complete service request typically includes requirements for multiple resources (computational, memory, storage, network, etc.) needed for the service. Most of the existing solutions do not take the request as a whole into account; instead, they decompose the request into its pieces and try to match the requirements for each entity in isolation. Consequently, if the system fails to allocate any one of the resource entities, it must perform a rollback, which involves undoing all successful allocations of resource entities in the service request. Furthermore, existing solutions are typically centralized or rely on centralized data storage that holds information on all available free resources in the system. This is feasible in smaller datacenters; however, it does not scale well in large datacenters or in distributed clouds.
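The cost of matching each entity in isolation can be seen in a minimal sketch of per-entity allocation with rollback. The first-fit host selection, the `allocate_all` function, and the data layout are assumptions for illustration and do not describe any particular existing system.

```python
def fits(capacity, need):
    """True if the free capacity covers every requested resource amount."""
    return all(capacity.get(res, 0) >= amount for res, amount in need.items())


def allocate_all(free, demands):
    """Allocate each entity in isolation (first fit); undo everything on failure.

    free:    dict host -> dict resource -> free amount (updated in place)
    demands: list of (component, dict resource -> amount)
    Returns a dict component -> host, or None if any entity cannot be placed.
    """
    done = []        # successful allocations, kept so they can be undone
    placement = {}
    for component, need in demands:
        host = next((h for h, cap in free.items() if fits(cap, need)), None)
        if host is None:
            # Rollback: release every allocation made so far for this request.
            for prev_host, prev_need in reversed(done):
                for res, amount in prev_need.items():
                    free[prev_host][res] += amount
            return None
        for res, amount in need.items():
            free[host][res] -= amount
        done.append((host, need))
        placement[component] = host
    return placement


free = {"h1": {"cpu": 4}, "h2": {"cpu": 2}}
demands = [("web", {"cpu": 3}), ("db", {"cpu": 2}), ("cache", {"cpu": 2})]

result = allocate_all(free, demands)
print(result)  # the third entity does not fit, so the first two are undone
print(free)
```

In this example the work spent placing `web` and `db` is discarded once `cache` fails, which is the wasted effort that per-service scheduling aims to avoid.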
Moreover, a more fundamental problem that also needs to be addressed is the combinatorial explosion in the number of possible service placement solutions that is likely to occur as datacenters grow larger. This problem is particularly acute when many datacenters are connected into a distributed cloud and more requirements are taken into account.
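As a rough illustration of this growth, consider a simplified counting model in which each of k service components may be assigned independently to any of n candidate hosts. The symbols n and k and the example values are assumptions chosen for illustration, not figures from any measured system.

```latex
\[
  N_{\text{placements}} = n^{k},
  \qquad
  n = 10^{5},\; k = 5
  \;\Rightarrow\;
  N_{\text{placements}} = \left(10^{5}\right)^{5} = 10^{25}.
\]
```

Capacity limits and affinity rules rule out many of these assignments, but the unconstrained count indicates how quickly the raw search space grows with both the number of hosts and the number of components.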
The applicability of the existing solutions (such as Apollo and Borg, for example) to distributed cloud service placement, where service requests come with different affinity policies, is extremely restricted. In particular, existing solutions simplify the problem either by assuming that a set of feasible solutions is already given or is very easy to find, or by focusing primarily on data-parallel types of applications without dealing with affinities associated with the various components of the applications.
However, given the huge search space of a large cloud infrastructure, finding a feasible set of resources for placing a service is not an easy task. This is especially problematic where the service components are subject to complex affinity policies in terms of co-location, and where most of the physical servers have few available resources. In such a case, unless a search is performed through all of the physical resources, the feasible set is very unlikely to be found, resulting in false negative answers. Further, scheduling each component without considering its relation to the placement of the other components in the same placement query will often result in poor performance, because such a solution needs some form of rollback scheme to ensure that the affinities across components are satisfied.
Further, some implementations rely on system-wide resource availability information being held in a centralized location, and thus require a global view of the cloud's available resources. However, these systems suffer from limited scalability, particularly in modern, large, often-dynamic, distributed infrastructures with heterogeneous hardware and software resources.
Finally, the underlying assumption of the existing theoretical works is that it is possible to obtain complete global knowledge of the available resources and, accordingly, of the feasible set of placements. Based on this assumption, such solutions are designed to find the placement that optimizes the cost. However, for a large-scale distributed cloud infrastructure, searching for a feasible placement is not always an easy task due to the various requirements of services and the dynamic nature of the cloud's available resources. More importantly, these solutions typically perform per-component scheduling rather than per-service scheduling.
Accordingly, there is a strong need for rapid, scalable, efficient, and accurate multi-component service placement systems in heterogeneous, dynamic, large, and/or distributed environments (e.g., in large datacenters, distributed cloud environments) that can accommodate imposed affinity and/or location-based constraints.