Performance issues in web service applications are notoriously hard to detect and debug. In many cases, they stem from misconfiguration or from bugs in the application itself. Web servers running in virtualized environments also suffer from problems specific to the cloud, such as interference and incorrect resource provisioning. Among these, performance interference and its more visible symptom, performance variability, are a major concern for IT administrators. Interference also threatens the usability of Internet-enabled devices that depend on tight bounds on server response latency (imagine the suspense if Siri took minutes to answer your questions!). Prior studies report that interference occurs frequently in large-scale data centers. Web services hosted in the cloud must therefore be able to detect interference and adapt to it when it occurs.
Interference arises because co-located VMs share low-level hardware resources such as caches, memory bandwidth, and the network. Partitioning these resources is impractical without incurring substantial overhead (in compute, in memory, or in reduced utilization). Existing solutions primarily approach the problem from the point of view of a cloud operator. They rely on one or more of the following core techniques: (a) scheduling, (b) live migration, and (c) resource containment. Research on scheduling policies addresses the problem at two levels of abstraction. Cluster schedulers (consolidation managers) try to place VMs on physical machines so as to minimize resource contention among VMs sharing the same machine. Hypervisor schedulers try to schedule VM threads so that only non-contending threads run in parallel. Live migration moves a VM from a busy physical machine to a less loaded one when interference is detected. Resource containment generally applies to containers such as LXC, where the CPU cycles allocated to batch jobs are reduced during interference. All of these approaches require access to the hypervisor (or to the kernel, in the case of LXC), which a cloud consumer does not have. A cloud consumer therefore needs ways to cope with interference from within its own VMs, without relying on the operator.