The present invention relates to techniques for providing the capability for computing systems to meet tail latency targets using workload redundancy and resource redundancy.
Cloud computing is a type of network-based computing that provides shared processing resources and data to computers and other devices on demand. Computing and storage resources located in the cloud provide users with the capability to store and process their data in data centers that are typically owned, operated, and maintained by third parties. One common service provided by cloud computing is hardware virtualization. With hardware virtualization, virtual computing resources, such as complete computers or portions of computers, can be provided in the cloud using what are known as virtual machines.
One issue with hardware virtualization is the provisioning of sufficient resources to provide adequate performance. Typically, performance targets are specified in contracts known as cloud service level agreements (CSLAs). One important performance target that is often specified in CSLAs is the tail latency. A tail latency target may specify, for example, that the latency, or time delay experienced in using the system, should be less than a certain target value 95 percent of the time. To meet such a target, it is typical to provision a large number of resources, such as virtual machines, to service each user. However, this solution can be very costly, and it leads to low resource efficiency because cluster utilization is low. Further, increasing the number of virtual machines may not be sufficient, as even the increased number of virtual machines may not always be able to meet the target.
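By way of illustration only, the tail latency target described above can be expressed as a simple check over a set of measured latencies. The following is a minimal sketch, not part of any CSLA or of the techniques described herein; the function names, the millisecond unit, and the sample values are all assumptions chosen for the example.

```python
def count_within_target(latencies_ms, target_ms):
    """Count the measurements strictly below the target latency."""
    return sum(1 for latency in latencies_ms if latency < target_ms)


def meets_tail_latency_target(latencies_ms, target_ms, required_percent=95):
    """Return True if at least required_percent of measurements are below target_ms.

    Hypothetical helper: required_percent is an integer percentage, so the
    comparison uses integer arithmetic and avoids floating-point rounding.
    """
    if not latencies_ms:
        raise ValueError("at least one latency measurement is required")
    within = count_within_target(latencies_ms, target_ms)
    return within * 100 >= required_percent * len(latencies_ms)


# Assumed sample data: 19 fast requests and 1 slow request (values in ms).
mostly_fast = [10] * 19 + [500]
# Assumed sample data: 18 fast requests and 2 slow requests.
too_many_slow = [10] * 18 + [500, 500]

print(meets_tail_latency_target(mostly_fast, target_ms=100))
print(meets_tail_latency_target(too_many_slow, target_ms=100))
```

In this sketch the first data set satisfies the target because 19 of 20 requests (95 percent) complete in under 100 ms, while the second does not because only 18 of 20 (90 percent) do. The check depends on the fraction of slow requests and not on the average latency, which is why a small number of slow requests can cause a target to be missed.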
Accordingly, a need arises for techniques by which tail latency targets may be met with improved performance and reduced cost.