The present invention relates to service level agreement-aware scheduling for cloud services.
In the cloud computing environment, service providers offer vast IT resources to a large set of customers with diverse service requirements. The service providers leverage customer volume to achieve the economies of scale needed to become profitable, which is the key to success. The profit of the service provider is essentially the difference between the revenue and the operational cost. The revenue is the price the customer pays to the provider for the service, and the operational cost is the total cost of all resources that the service provider needs in order to operate. The price is defined in the contract between the service provider and the customer and is typically tied to specific service level agreements (SLAs).
Typically, a cloud service delivery infrastructure is used to deliver services to a diverse set of clients by sharing the infrastructure resources among them. Each client signs a specific contract with the cloud service provider. The contract defines the agreed-upon service fees for the client: the client agrees to pay a certain price for specific levels of service delivered by the provider. It is very common for such service contracts, or SLAs, to take the form of a piecewise linear function. Given the SLAs, the cloud service provider may arrange the order of job execution based on the priorities of the jobs so as to gain the maximal expected profit. Such priorities must be computed quickly in order to meet real-time requirements in practice.
An SLA is a contract between a service provider and its customers and indicates the level of service agreed upon as well as the associated revenue gain or loss. SLAs can cover many aspects of a cloud computing service, such as availability, security, and response time, among others. FIG. 1 shows two examples of SLAs in terms of query response time vs. profit of the service provider. FIG. 1(a) shows a staircase-shaped SLA where the service provider has a profit gain of g1 if the query response time is less than t1, a profit gain of g2 if the query response time is between t1 and t2, and has to pay a penalty of p1 if the query response time is greater than t2. Such a staircase-shaped SLA is commonly used in business because it can be easily described by a few “if-clauses” in a contract. FIG. 1(b) shows another SLA, which is commonly used for job scheduling in grid computing: a profit gain g1 is obtained if the query response time is less than t1; if the response time is between t1 and t2, the profit decreases in a linear fashion and becomes negative after a certain point; however, the penalty remains fixed at p1 after t2 because, for example, the customer may use t2 to trigger a time-out and resubmit the query.
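As an illustration only, the two profit curves described for FIG. 1 could be sketched as follows. The function names, the treatment of a response time exactly equal to t1 or t2, and the assumption that the linear segment of the second curve runs from g1 at t1 down to -p1 at t2 are choices made for this sketch and are not specified by the figure description.

```python
def staircase_profit(t: float, t1: float, t2: float,
                     g1: float, g2: float, p1: float) -> float:
    """Profit for a staircase-shaped SLA in the style of FIG. 1(a)."""
    if t < t1:
        return g1
    if t <= t2:  # assumption: a response exactly at t2 still earns g2
        return g2
    return -p1   # the penalty is modeled as a negative profit


def linear_decay_profit(t: float, t1: float, t2: float,
                        g1: float, p1: float) -> float:
    """Profit for an SLA in the style of FIG. 1(b)."""
    if t < t1:
        return g1
    if t <= t2:
        # assumption: profit falls on a straight line from g1 (at t1)
        # to -p1 (at t2), so it crosses zero somewhere in between
        fraction = (t - t1) / (t2 - t1)
        return g1 + fraction * (-p1 - g1)
    return -p1   # the penalty stays fixed after the time-out t2
```

With t1=2, t2=5, g1=10 and p1=5, for instance, the second curve is positive shortly after t1 and reaches the fixed penalty at t2.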
Although there is a mapping between system-level metrics and costs through SLAs, the system-level performance can be dramatically different from the cost performance. For example, assume a service provider offers services to two customers, a gold customer and a silver customer, by using a shared database server. Because the customers have different SLAs, the distribution of query response times over all the queries does not by itself tell us how well the system performs in terms of cost; the latter depends on other factors such as the shape of the SLAs of the gold and silver customers, the workload ratio between the two customers, how the scheduling priority is determined, and so on.
The SLA can also be expressed as a function ƒ(t) where t is the response time of a query and ƒ(t) is the corresponding cost if the response time is t. In a piecewise linear SLA, the function ƒ(t) can be divided into finite segments along the time line and in each segment, ƒ(t) is a function linear in t.
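One possible way to represent such a function in code is as a sorted list of segments, each with a start time, a cost at that start time, and a slope. The class and field names below are hypothetical, and the convention that each segment includes its own start time is an assumption of this sketch.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Segment:
    start: float      # response time at which the segment begins
    base_cost: float  # cost at `start` (may jump relative to the previous segment)
    slope: float      # cost increase per unit of response time inside the segment


class PiecewiseLinearSLA:
    """Cost function f(t) made of finitely many linear segments."""

    def __init__(self, segments: List[Segment]):
        self.segments = sorted(segments, key=lambda s: s.start)
        if not self.segments or self.segments[0].start != 0:
            raise ValueError("the first segment must start at time 0")
        self._starts = [s.start for s in self.segments]

    def cost(self, t: float) -> float:
        if t < 0:
            raise ValueError("response time must be non-negative")
        # Locate the last segment whose start time is <= t.
        seg = self.segments[bisect_right(self._starts, t) - 1]
        return seg.base_cost + seg.slope * (t - seg.start)


# A staircase: cost 0 before t=10, 1 from t=10, 5 from t=20 onward.
staircase = PiecewiseLinearSLA(
    [Segment(0, 0, 0), Segment(10, 1, 0), Segment(20, 5, 0)])

# A grace period up to t=10, then a jump to 2 followed by linear growth.
grace = PiecewiseLinearSLA([Segment(0, 0, 0), Segment(10, 2, 0.5)])
```

Evaluating a cost takes a binary search over the segment boundaries, so the lookup time grows only logarithmically with the number of segments.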
FIG. 2 shows several examples of piecewise linear SLAs. In the figure, the x-axis represents query response time, i.e., the time at which the query is answered relative to the time at which the query arrives at the system (for ease of illustration, we assume the query arrives at the system at time 0); the y-axis represents the cost corresponding to each response time t. We discuss these cases in detail.
FIG. 2(a) describes an SLA with a linearly increasing cost versus the response time. That is, the cost is proportional to the response time. Such an SLA reflects a target of weighted mean response time. Under such SLAs, minimizing the total cost over all the queries is equivalent to minimizing the weighted response time of all queries, where the weight for each query is proportional to the slope of its linearly increasing SLA.
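The equivalence can be written out explicitly. As a sketch, let query i (of n queries) have response time t_i and a linear SLA with slope s_i > 0; these symbols are introduced here for illustration and do not appear in the figure.

```latex
\[
  f_i(t) = s_i\, t
  \quad\Longrightarrow\quad
  \sum_{i=1}^{n} f_i(t_i)
  = \sum_{i=1}^{n} s_i\, t_i
  = \Bigl(\sum_{j=1}^{n} s_j\Bigr)\cdot
    \frac{\sum_{i=1}^{n} s_i\, t_i}{\sum_{j=1}^{n} s_j}.
\]
```

The slopes are fixed by the SLAs and do not depend on the schedule, so the leading factor is a constant. Minimizing the total cost and minimizing the weighted mean response time therefore select the same schedule.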
FIG. 2(b) describes an SLA in the form of a step function, where the cost is c0 if the response time is less than t1 and c1 otherwise. As a special case, when c0=0 and c1=1 for all queries, the average cost under such an SLA is equivalent to the fraction of queries that miss their (sole) deadlines. It is worth noting that in this case, and in all other cases in the figure, the parameters such as c0, c1, and t1 can be different for each individual query. That is, each query can name its own deadline and corresponding cost for missing that deadline.
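The special case c0=0 and c1=1 can be checked with a few lines of code. The sketch below uses hypothetical function names and made-up response times and deadlines; it assumes, as in the description above, that a response time exactly equal to the deadline counts as a miss.

```python
from typing import Sequence, Tuple


def step_cost(response_time: float, deadline: float,
              c0: float = 0.0, c1: float = 1.0) -> float:
    """Step-function SLA: c0 before the deadline, c1 otherwise."""
    return c0 if response_time < deadline else c1


def average_cost(queries: Sequence[Tuple[float, float]]) -> float:
    """Average cost over (response_time, deadline) pairs with c0=0, c1=1."""
    return sum(step_cost(rt, d) for rt, d in queries) / len(queries)


# Each query names its own deadline; two of the four queries miss theirs.
queries = [(3, 5), (7, 5), (2, 10), (12, 10)]
miss_fraction = sum(rt >= d for rt, d in queries) / len(queries)
```

Here average_cost(queries) and miss_fraction both evaluate to 0.5, matching the statement that the average cost equals the fraction of missed deadlines.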
FIG. 2(c) describes a staircase-shaped SLA. The cost is c0 if the response time is less than t1, c1 if the response time is between t1 and t2, and so on. Finally, the cost becomes a fixed value (c2 in this example) after the last deadline (t2 in this example). Such an SLA can be used to capture multiple semantics simultaneously. For example, when checking out a shopping cart at an online ecommerce site, a response time less than t1 may result in good user experience whereas a response time longer than t2 may reflect certain unacceptable performance.
FIG. 2(d) describes the mixture of a step function and a linear function. That is, the cost remains constant up to a response time t1 and then grows linearly afterward. This SLA allows a grace period, up to t1, after which the cost grows linearly over time. This example also illustrates that in general, SLAs may contain cost “jumps” (at time t1 in this example) in the cost functions.
FIG. 2(e) describes another way of mixing the step and linear functions. The cost initially grows linearly, but after time t1, it becomes a constant. This SLA captures, for example, the concept of a proportional cost with a time-out, i.e., after t1, the damage has been done and so the cost has reached its maximal penalty value.
FIG. 2(f) describes a piecewise linear SLA in the most general form. That is, the slope in each time segment can be different and there can be cost jumps between consecutive segments in the SLA. It is worth mentioning that, although not strictly required by Cost-Based Scheduling (CBS), in practice the SLA functions are usually monotonic, i.e., a longer response time is never preferred over a shorter response time.
In a database service provisioning setting, data management capabilities are offered as a service in the cloud. In this setting, the database systems may receive a very large volume of queries from a diverse set of customers who share the system resources and have varying price levels captured in the SLAs. It is therefore critical for the service provider to prioritize and schedule the queries in the system in a way that maximizes the total profit.
Conventional solutions in this area mainly rely on techniques that do not primarily consider profit maximization. These techniques mainly focus on optimizing more primitive metrics such as average response time. Therefore they cannot optimize profits. Also, almost all prior art considers continuous cost functions rather than discrete levels of costs corresponding to varying levels of service, which is not realistic in many real-life systems.
Cost-Based Scheduling (CBS) does consider cost, but its computational complexity is too high for it to be feasible in high-volume, large infrastructures. In CBS, each query has its own cost function, which determines the query cost given the query response time. The query response time is the time between when a query arrives at the system and when the query answer is returned by the system.
U.S. application Ser. No. 12/818,528, filed by the inventors of the present application and commonly owned, discussed techniques that can efficiently handle a special case of piecewise linear SLAs: SLAs with staircase shapes.