Data services and applications offered on the Internet today are increasingly hosted in a virtualized, multi-tenant infrastructure environment rather than on a centralized set of infrastructure owned and operated by a single entity. For example, whereas a photo sharing service might formerly have been hosted on servers and databases operated by the owner or operator of that service, today the same service might be hosted on “virtual” infrastructure operated by third-party providers, such as Amazon Web Services, Google Compute Engine, OpenStack, or Rackspace Cloud. In other words, data services and applications might now be hosted “in the cloud.” Such “virtual” infrastructure or resources can include virtual web servers, load balancers, and databases hosted as software instances running on shared physical hardware.
There are several commonly cited reasons for building applications for the cloud. First, using cloud resources reduces the time required to provision new virtual infrastructure to nearly zero. Traditionally, it took weeks or even months to acquire, install, configure, network, and image new hardware, whereas cloud users can launch new instances from an infrastructure provider in a matter of minutes. Another key reason for building for the cloud is its elasticity. When customers need more virtual resources, they request that more be provisioned; when they are done with the resources, they return them to the provider. The provider charges customers only for resources while they are in use (typically on an hourly basis). Elasticity therefore allows customers to adjust the number of resources they use (and pay for) to match the load on the application. That load may vary according to short-term trends (hourly or daily cycles) or long-term trends (growth of the business over months).
Accordingly, there is a need for a system that can collect data from virtual infrastructure, monitor and process the data to identify anomalies and potential areas of concern, and report the results to operators of a data service and/or application hosted in the cloud. However, the same properties that make the cloud attractive also make it hard to monitor. First, with virtualized resources, a description of a resource and its behavior is not available from a single source. For example, the infrastructure provider can use Application Program Interfaces (APIs) to provide metadata about a virtual resource within the virtual environment (e.g., where it is located, what type of resource it is, the capacity allocated to it, etc.). However, information about what is running inside the virtual container is available only from within that container; it cannot be obtained by querying the infrastructure provider's APIs. Second, because of the elastic nature of the cloud, the configuration of a customer application can change very quickly. It is not uncommon for the number (and therefore the aggregate capacity) of resources used by a customer to fluctuate by hundreds per day to accommodate diurnal patterns, or by thousands in a matter of weeks to track business growth. These changes can be driven by demand for the customer's application (e.g., more resources are provisioned when more people are using the application), by the supply of resources (e.g., a customer may request that additional resources be provisioned only if the price of using them falls below a certain threshold), and/or by scheduled patterns (e.g., additional resources are provisioned during expected daily peak demand). A monitoring tool must therefore be able to operate within a dynamic environment that changes faster than human operators can track.
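To make the three drivers of resource changes concrete, the following is a minimal, hypothetical sketch of a scaling decision that combines demand, price, and schedule. It is an illustration only, not part of the system described here: the `ScalingInputs` fields, the `PEAK_SCHEDULE` table, and the precedence of the rules (the price threshold caps demand-driven growth, and the schedule floor is applied last) are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingInputs:
    """Hypothetical inputs to a scaling decision; all field names are illustrative."""
    requests_per_sec: float       # demand: current load on the application
    capacity_per_instance: float  # requests/sec a single instance can serve (assumed)
    price_per_hour: float         # supply: current price of one instance-hour
    max_price_per_hour: float     # customer's price threshold for additional resources
    hour_of_day: int              # 0-23, used by the scheduled rule
    baseline_instances: int = 1   # capacity kept regardless of price (assumed)


# Hypothetical schedule: minimum instance count during expected peak hours (9:00-17:59).
PEAK_SCHEDULE = {hour: 10 for hour in range(9, 18)}


def desired_instances(s: ScalingInputs) -> int:
    # Demand-driven: enough instances to carry the current load, never below baseline.
    needed = max(s.baseline_instances,
                 math.ceil(s.requests_per_sec / s.capacity_per_instance))
    # Supply-driven: provision additional resources only if the price is under the threshold.
    if s.price_per_hour > s.max_price_per_hour:
        needed = s.baseline_instances
    # Scheduled: keep at least the floor configured for expected peak hours.
    return max(needed, PEAK_SCHEDULE.get(s.hour_of_day, 0))


if __name__ == "__main__":
    # 450 req/s at 100 req/s per instance, price under threshold, off-peak hour.
    print(desired_instances(ScalingInputs(450, 100, 0.05, 0.10, 3)))
```

Because a rule like this can be re-evaluated every few minutes for every application, the resulting instance count can change far more often than an operator could follow by hand, which is the condition the monitoring tool has to accommodate.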