Organizations invest in technologies that provide customers with access to computing resources through services. Such services give customers access to computing and/or storage resources (e.g., storage devices providing either a block-level device interface or a web service interface). Within multi-tier ecommerce systems, combinations of different types of resources, such as whole physical or virtual machines, CPUs, memory, network bandwidth, or I/O capacity, may be allocated to customers and/or their applications. Block-level storage devices implemented by a storage service may be made accessible, for example, from one or more physical or virtual machines implemented by another service. To facilitate the utilization of data center resources, virtualization technologies may allow a single physical computing machine to host one or more instances of virtual machines that appear and operate as independent computing machines to a connected user. With virtualization, the single physical computing machine can create, maintain, or delete virtual machines in a dynamic manner.
In a large distributed computing system (e.g., multiple distributed data centers) of a computing resource service provider, various services and resources of the computing resource service provider are frequently shared among customers and users. In addition, these computing resources are often leveraged in large-scale networks of computers, servers, and storage drives to enable clients, including content providers, online retailers, customers, and the like, to host and execute a variety of applications and web services. The use of network computing allows content providers and customers, among others, to efficiently and adaptively satisfy their computing needs. However, with the growing use of virtual resources, customers are encountering situations in which the large number of virtual computing resources makes it difficult to troubleshoot and diagnose issues. For example, a single customer's virtual computing resources may produce millions of lines of log data in a single day or even a single hour. This log data may contain useful information for troubleshooting, diagnosing, and detecting issues and anomalies within these large distributed computing systems. However, the sheer volume of data to process makes it difficult for customers and/or service providers to discover that useful information.