Organizations increasingly rely on cloud-based computing systems to perform large-scale computational tasks. Such cloud-based computing systems are typically operated by hosting companies that maintain a sizable computational infrastructure, often comprising thousands of servers sited in geographically distributed data centers. Customers typically buy or lease computational resources from these hosting companies. The hosting companies in turn provision computational resources according to each customer's requirements and then enable the customer to access these resources.
In many cases, cloud-based computing systems provide a virtualized computing environment, wherein tasks run on “virtual machines” that execute on underlying physical host systems. Such virtualized computing environments enable computational tasks to be easily moved among host systems to facilitate load balancing and fault tolerance. However, they also complicate the process of diagnosing and resolving performance problems because bottlenecks can arise at both the virtual-machine level and the host-system level.
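Existing performance-monitoring tools do not provide an easy way to diagnose performance problems in such virtualized computing systems, where a bottleneck may originate at either the virtual-machine level or the host-system level.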