In recent years, virtual machines (“VMs”) have become increasingly common in datacenter operations and in other large-scale computing environments. A VM is a software-implemented abstraction of a physical machine, such as a computer, that is presented to the application layer of the system. A VM may be based on a specification of a hypothetical computer and may be designed to recreate the architecture and function of a physical computer. In datacenters, VMs are often used for server consolidation. For example, a typical non-virtualized application server may achieve between 5% and 10% utilization, whereas a virtualized application server that hosts multiple VMs can achieve between 50% and 80% utilization. As a result, virtual clusters composed of multiple VMs can be hosted on fewer servers, translating into lower costs for hardware acquisition, maintenance, energy consumption, and cooling system usage. The VMs in a virtual cluster may be interconnected logically by a virtual network across several physical networks.
In order to monitor the performance of VMs, datacenters generate streams of telemetry data. Each stream is composed of metrics that represent different aspects of the behavior of an application, a VM, or a physical machine. For example, virtual machine monitors can be used to produce a stream of telemetry data composed of hundreds of real and synthesized metrics associated with a VM. The telemetry streams may be sampled at very high rates. As a result, the telemetry datasets can be very large: each VM contributes hundreds of metrics, so the aggregate data volume scales with the number of VMs monitored. The size of the telemetry data and the high sample rates strain efforts to store, process, and analyze the telemetry data streams.
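
The scaling described above can be illustrated with a rough back-of-the-envelope estimate. The following is a minimal sketch only: the function name `telemetry_bytes_per_day`, the figure of 300 metrics per VM, the one-sample-per-second rate, and the 8-byte sample size are all hypothetical values chosen for illustration, and the estimate ignores timestamps, metadata, and compression.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def telemetry_bytes_per_day(num_vms, metrics_per_vm, samples_per_second,
                            bytes_per_sample=8):
    """Estimate the raw telemetry volume generated per day, in bytes.

    Assumes every metric of every VM is sampled at the same rate and that
    each sample is stored as a fixed-size value (8 bytes by default, an
    illustrative assumption).
    """
    samples_per_day = samples_per_second * SECONDS_PER_DAY
    return num_vms * metrics_per_vm * samples_per_day * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical fleet sizes; the volume grows linearly with the VM count.
    for num_vms in (100, 1_000, 10_000):
        daily = telemetry_bytes_per_day(num_vms, metrics_per_vm=300,
                                        samples_per_second=1)
        print(f"{num_vms:>6} VMs: {daily / 1e9:.1f} GB/day")
```

Under these assumed values, the estimate grows in direct proportion to both the number of VMs monitored and the sample rate, which is why storage and analysis become difficult at datacenter scale.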