This application relates to Constraint-Conscious Optimal Scheduling for Cloud Infrastructures.
Cloud computing has emerged as a promising computing platform with its on-demand scaling capabilities. Typically, a cloud service delivery infrastructure is used to deliver services to a diverse set of clients sharing the computing resources. Because it provides on-demand scaling without any large upfront investment or long-term commitment, cloud computing is attracting a wide range of users, from web applications to Business Intelligence applications. The database community has also shown great interest in exploiting this new platform for scalable and cost-efficient data management. Arguably, the success of cloud-based services depends on two main factors: quality of service, which is specified through Service Level Agreements (SLAs), and operating cost management.
Users of cloud computing services can not only significantly reduce their IT costs and turn capital expenditures into operational expenditures, but also speed up innovation thanks to on-demand access to vast IT resources in the cloud. While cloud computing offers clients these advantages, it creates a number of challenges for cloud service providers trying to build successful businesses: they have to handle diverse and dynamic workloads in a highly price-competitive way in order to convince potential clients to use the service delivery model instead of hosting IT functions in-house. In addition, the quality of service should be comparable in all aspects to what can be delivered from an IT infrastructure under the clients' full control. Thus, the success of cloud-based services arguably depends on two major factors: quality of service, which is captured in Service Level Agreements (SLAs), and operational cost management.
The consistent delivery of services within SLAs is crucial for sustained revenue for the service provider. Delivering those services incurs operational costs, and the difference between the revenue and the operational costs is the service provider's profit, which is required for any commercially viable business.
The total profit, P, of the cloud service provider is defined as
$$P = \sum_{i} r_i - C,$$

where r_i is the revenue that can be generated by delivering the service for a particular job i, and C is the operational cost of running the service delivery infrastructure. The revenue, R, is defined for each job class in the system. Each client may have multiple job classes based on the contract. A stepwise function is used to characterize the revenue, as shown in FIG. 1. Intuitively, clients agree to pay varying fee levels for the corresponding service levels delivered for a particular class of requests, i.e., the job classes in their contracts. For example, a client may be willing to pay a higher rate for lower response times. As shown in FIG. 1, the client pays R0 as long as the response time is between 0 and X1, pays R1 when it is between X1 and X2, and so on. This characterization allows a more intuitive interpretation of SLAs with respect to revenue generation.

The revenue function in turn defines a cost function, called the SLA cost function. If the level of service changes, the amount that the provider can charge the client also changes according to the contract. Due to limits on the availability of infrastructure resources, the cloud service provider may be unable, or may choose not, to attend to all client requests at the highest possible service levels. Lowering service levels reduces revenue, and raising them increases it. The loss of potential revenue corresponds to the SLA cost. For example, there is no revenue loss, and hence no SLA penalty cost, as long as the response time is between 0 and X1 in FIG. 1. On the other hand, adding infrastructure resources to raise service levels increases operational cost. As a result, the key problem for the provider is to determine the service levels that maximize its profit under the agreed-upon SLAs.
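The stepwise revenue, SLA cost, and profit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the numeric boundaries and revenue levels, and the choice to count a response time exactly equal to a boundary as belonging to the earlier (better) step are all hypothetical and do not come from FIG. 1.

```python
from bisect import bisect_left

def stepwise_revenue(response_time, boundaries, revenues):
    """Revenue r_i for one job under a stepwise revenue function.

    boundaries: ascending thresholds [X1, X2, ...]
    revenues:   levels [R0, R1, ...], one more entry than boundaries
    """
    if len(revenues) != len(boundaries) + 1:
        raise ValueError("need exactly one more revenue level than boundaries")
    # Assumption: a response time equal to X_k still earns the earlier level.
    step = bisect_left(boundaries, response_time)
    return revenues[step]

def sla_cost(response_time, boundaries, revenues):
    """Lost potential revenue relative to the best level R0."""
    return revenues[0] - stepwise_revenue(response_time, boundaries, revenues)

def total_profit(response_times, boundaries, revenues, operational_cost):
    """P = sum_i r_i - C for a batch of jobs in one job class."""
    earned = sum(stepwise_revenue(t, boundaries, revenues) for t in response_times)
    return earned - operational_cost

# Hypothetical contract: X1 = 100 ms, X2 = 200 ms; R0 = 10, R1 = 6, R2 = 0.
boundaries = [100, 200]
revenues = [10, 6, 0]
print(total_profit([50, 150, 250], boundaries, revenues, operational_cost=5))  # 11
```

In this sketch the job finishing in 150 ms earns R1 = 6 and therefore carries an SLA cost of 4, while the job finishing in 50 ms carries none.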
SLAs in general may be defined in terms of various criteria, such as service latency, throughput, consistency, and security. One embodiment focuses on service latency, or response time. Even with latency alone, there can be multiple specification methods:

Mean-value-based SLA (MV-SLA): For each job class, quality of service is measured based on mean response time. This is the least robust type of SLA from the customers' perspective.

Tail-distribution-based SLA (TD-SLA): For each job class, quality of service is measured in terms of the portion of jobs finished by a given deadline. For instance, a user may want 99% of jobs to be finished within 100 ms.

Individual-job-based SLA (IJ-SLA): Quality of service is measured using the response time of individual jobs. Unlike MV-SLA or TD-SLA above, in IJ-SLA any single job with poor service quality immediately affects the measured quality of service and incurs some SLA penalty cost.
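To make the difference between the three specification methods concrete, the following Python sketch evaluates one set of response times under each of them. The function names, the sample response times, and the thresholds are hypothetical and chosen only for illustration; treating a job that finishes exactly at the deadline as on time is likewise an assumption.

```python
def mv_sla_met(response_times, target_mean):
    """MV-SLA: compare the mean response time of a job class to a target."""
    return sum(response_times) / len(response_times) <= target_mean

def td_sla_met(response_times, deadline, required_fraction):
    """TD-SLA: check the fraction of jobs finished by the deadline."""
    on_time = sum(1 for t in response_times if t <= deadline)
    return on_time / len(response_times) >= required_fraction

def ij_sla_violations(response_times, deadline):
    """IJ-SLA: every late job counts individually; return their indices."""
    return [i for i, t in enumerate(response_times) if t > deadline]

# Hypothetical response times in ms: three fast jobs and one very slow job.
times = [20, 30, 40, 500]

print(mv_sla_met(times, target_mean=150))                     # True (mean is 147.5)
print(td_sla_met(times, deadline=100, required_fraction=0.99))  # False (only 75% on time)
print(ij_sla_violations(times, deadline=100))                  # [3]
```

The single 500 ms job is hidden by the mean but is visible under the tail-distribution and individual-job measures, which is one way to see why MV-SLA is described as the least robust from the customers' perspective.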
For each specification method, the SLA can be classified either as a hard SLA or a soft SLA as follows.

Hard SLA: A hard SLA has a single hard deadline to meet, and if the deadline is missed, it is counted as a violation. The definition of this type of SLA, or constraint, may come from the client or the cloud service provider. There are cases where a cloud provider needs to use hard SLAs as a tool to control various business objectives, e.g., controlling the worst-case user experience. Therefore, the violation of a hard SLA may not correspond to financial terms in the client contracts.

Soft SLA: A soft SLA corresponds to agreed levels of service in the contract. This is different from the hard SLA in that even after the violation, the SLA penalty cost may continue to increase as response time further increases. Although the SLA penalty cost may have various shapes, a stepwise function is a natural choice and is used in real-world contracts.
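The distinction between the two classes can be illustrated with a minimal Python sketch: a hard SLA only counts deadline misses, whereas a soft SLA assigns a penalty that keeps growing in steps as the response time increases. The function names, thresholds, and penalty amounts below are hypothetical, and a job that finishes exactly at a threshold is assumed not to have crossed it.

```python
def hard_sla_violations(response_times, deadline):
    """Hard SLA: count the jobs that miss the single hard deadline."""
    return sum(1 for t in response_times if t > deadline)

def soft_sla_penalty(response_time, steps):
    """Soft SLA: stepwise penalty that grows with response time.

    steps: ascending list of (threshold, penalty) pairs; the penalty of the
    last threshold exceeded applies, and 0 applies if none is exceeded.
    """
    penalty = 0
    for threshold, step_penalty in steps:
        if response_time > threshold:
            penalty = step_penalty
    return penalty

# Hypothetical values: hard deadline of 200 ms; soft penalties of 2 beyond
# 100 ms and 5 beyond 200 ms.
times = [50, 150, 250]
steps = [(100, 2), (200, 5)]

print(hard_sla_violations(times, deadline=200))      # 1
print([soft_sla_penalty(t, steps) for t in times])   # [0, 2, 5]
```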
Operational cost is measured in units of server cost per hour. Consequently, the total operational cost, C, is the sum of the individual server costs for a given period of time. The individual server cost is the aggregation of all specific cost items involved in operating a server, such as energy, administration, and software, among others. Conventional scheduling systems typically rely on techniques that do not primarily consider profit maximization. These techniques mainly focus on optimizing metrics such as average response time.
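The operational cost C described above, built up from per-server hourly cost items, can be sketched as follows. The function names, the cost items, and all figures are hypothetical; the sketch also assumes every server runs for the whole period.

```python
def server_hourly_cost(cost_items):
    """Aggregate the hourly cost items of one server (energy, admin, ...)."""
    return sum(cost_items.values())

def operational_cost(servers, hours):
    """Total operational cost C: sum of per-server hourly costs over a period."""
    return sum(server_hourly_cost(items) for items in servers) * hours

# Hypothetical hourly cost items, identical for three servers.
items = {"energy": 3, "administration": 2, "software": 1}
servers = [items, items, items]

print(operational_cost(servers, hours=10))  # 180
```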