A server or a hosting environment can consist of a very large number of nodes that are used to process computational requests of different customers. So that the processing of the various customer requests can be as independent as possible from the actual hardware resources to which the requests are assigned, such systems are beginning to use “virtual machines,” which are collections of individual physical resources, to process the users' requests. With virtualization of the physical resources, the virtual machines executing the different customer requests can be migrated to other hardware within the system without influencing the outcome of the corresponding user requests. In addition, the applications that are used to process the user requests do not need to be reconfigured to adapt to new hardware.
Currently, computational jobs can be allocated by a scheduler to pre-defined virtual machines running on specific hardware within the hosting environment. Alternatively, a load balancer running on the hosting environment can distribute currently running tasks in such a way that all resources within the system have a similar utilization. Technically, the load balancer moves the virtual machines from one hardware resource to another, which can be done easily with existing virtualization techniques. The idea behind load balancing is to utilize the resources of the system as efficiently as possible to maximize the overall throughput of computational jobs within the system. To this end, management software like the Virtual Machine Manager from IBM or the ProLiant Essential Workload Management Pack from HP can be used.
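The load balancing described above can be illustrated with a minimal sketch. It is not the method of any particular product. It assumes that each virtual machine has a known load, expressed here as an integer percentage of one host's capacity, and that a migration is free of cost. The names `rebalance`, `hosts`, and `tolerance` are hypothetical and chosen only for this illustration.

```python
def rebalance(hosts, tolerance=10):
    """Greedily migrate virtual machines until host loads are similar.

    hosts: dict mapping host name -> dict mapping VM name -> load
           (hypothetical unit: integer percent of one host's capacity).
    tolerance: largest accepted gap between the busiest and idlest host.
    Returns the list of migrations as (vm, source_host, target_host).
    The hosts dict is modified in place.
    """
    migrations = []
    while True:
        load = {name: sum(vms.values()) for name, vms in hosts.items()}
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        gap = load[busiest] - load[idlest]
        if gap <= tolerance:
            break
        # Only a VM with 0 < load < gap narrows the gap when moved;
        # this also guarantees that the loop terminates.
        candidates = {vm: l for vm, l in hosts[busiest].items() if 0 < l < gap}
        if not candidates:
            break
        # Prefer the VM whose load is closest to half the gap.
        vm = min(candidates, key=lambda v: abs(candidates[v] - gap / 2))
        hosts[idlest][vm] = hosts[busiest].pop(vm)
        migrations.append((vm, busiest, idlest))
    return migrations


if __name__ == "__main__":
    example = {
        "host_a": {"vm1": 40, "vm2": 30, "vm3": 20},
        "host_b": {"vm4": 10},
        "host_c": {},
    }
    for vm, source, target in rebalance(example):
        print(f"migrate {vm}: {source} -> {target}")
    print({name: sum(vms.values()) for name, vms in example.items()})
```

Such a balancer treats all virtual machines alike: it equalizes utilization without regard to which customer a virtual machine belongs to.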
From the customer's point of view, high availability of the hardware resources and short response times to their requests are desirable. From the provider's point of view, high customer satisfaction is desirable, as this ensures future business from the customers. However, providers have different commercial relationships with different customers. For example, some customers may be willing to pay more for higher availability and shorter response times than other customers. Thus, providers would like to offer different service levels (e.g., resource availability and response time) to customers, so that they can cater more to high-priority customers while still maintaining adequate service for lower-priority customers.
To date, this kind of emphasis on certain customers has been implemented by dividing the hosting environment used to process the customers' requests into several partitions, with different partitions having different processing bandwidth, and assigning jobs from different customers to different partitions. Unfortunately, this decreases the overall utilization of the hosting environment, because idle resources in one partition cannot be used to process jobs for customers that are assigned to other partitions. Furthermore, a customer request assigned to a partition that is already fully utilized (e.g., to process other requests assigned to the partition) cannot be processed at all, even if enough idle resources are available in other partitions. In other implementations, customer quotas have been used to implement a kind of ranking between different customers, such that each customer has a limited amount of processing time per time period on the system. Then, if the customer's allocated time within a time period is already used, no other request from the customer can be processed in that time period. This again leads to a low overall utilization, because some requests are not processed even if enough resources are idle.
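The utilization loss described above can be made concrete with a small sketch. It assumes a simplified model that is not taken from any existing system: capacity and request sizes are counted in abstract integer units, requests arrive in order within one time period, and a request is either admitted in full or rejected. All function names, customer names, and figures are hypothetical. The undivided pool serves only as a baseline for comparison; it provides no differentiation between customers.

```python
def admit_partitioned(partition_capacity, customer_partition, requests):
    """Admit a request only if its customer's own partition has room."""
    free = dict(partition_capacity)
    admitted = []
    for customer, units in requests:
        partition = customer_partition[customer]
        if free[partition] >= units:
            free[partition] -= units
            admitted.append((customer, units))
    return admitted


def admit_with_quota(total_capacity, customer_quota, requests):
    """Admit a request only while the customer's quota for the period lasts."""
    free = total_capacity
    remaining = dict(customer_quota)
    admitted = []
    for customer, units in requests:
        if remaining[customer] >= units and free >= units:
            remaining[customer] -= units
            free -= units
            admitted.append((customer, units))
    return admitted


def admit_pooled(total_capacity, requests):
    """Baseline: one undivided pool, no preference between customers."""
    free = total_capacity
    admitted = []
    for customer, units in requests:
        if free >= units:
            free -= units
            admitted.append((customer, units))
    return admitted


def utilization(admitted, total_capacity):
    """Fraction of the total capacity used by the admitted requests."""
    return sum(units for _, units in admitted) / total_capacity


if __name__ == "__main__":
    # Hypothetical workload: 10 capacity units, two customers.
    requests = [("bolt", 3), ("bolt", 3), ("acme", 2)]
    by_partition = admit_partitioned(
        {"premium": 6, "basic": 4}, {"acme": "premium", "bolt": "basic"}, requests
    )
    by_quota = admit_with_quota(10, {"acme": 4, "bolt": 3}, requests)
    pooled = admit_pooled(10, requests)
    print("partitioned:", utilization(by_partition, 10))
    print("quota:      ", utilization(by_quota, 10))
    print("pooled:     ", utilization(pooled, 10))
```

In this example the second request of the lower-priority customer is rejected under both partitioning and quotas, although half of the capacity remains idle in each case.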