In a typical cloud data center environment, there is a large collection of interconnected servers that provide computing and/or storage capacity to run various applications. For example, a data center may comprise a facility that hosts applications and services for subscribers, i.e., customers of the data center. The data center may, for example, host all of the infrastructure equipment, such as networking and storage systems, redundant power supplies, and environmental controls. In a typical data center, clusters of storage systems and application servers are interconnected via a high-speed switch fabric provided by one or more tiers of physical network switches and routers. More sophisticated data centers provide infrastructure spread throughout the world, with subscriber support equipment located in various physical hosting facilities.
For troubleshooting purposes, a data center may include a logging device, or “collector,” that communicates with devices within the data center to obtain information that describes the devices' operation, such as errors, events, and operations performed. Devices that generate logging information may alternatively be referred to herein as “generators.” Such information is commonly referred to as log information, and the collector may record the log information to a database or a collection of log files that serves as a central repository of logging information for devices in the system. To the extent possible, system administrators often prefer to diagnose conditions in distributed systems using logging information rather than by reproducing the problems, which is generally undesirable. In distributed systems, such as the aforementioned data center, a large number of devices may generate logging information.
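The collector and generator relationship described above can be illustrated with a minimal in-memory sketch. The class names (`LogRecord`, `Generator`, `Collector`) and their methods are hypothetical and chosen only for illustration; an actual collector would typically obtain log information over a network and record it to a database or log files.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogRecord:
    """One item of log information (hypothetical layout)."""
    device_id: str
    severity: str
    message: str


class Generator:
    """A device that produces log information and buffers it locally."""

    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self._pending: List[LogRecord] = []

    def log(self, severity: str, message: str) -> None:
        self._pending.append(LogRecord(self.device_id, severity, message))

    def drain(self) -> List[LogRecord]:
        # Hand over all buffered records and start a fresh buffer.
        records, self._pending = self._pending, []
        return records


@dataclass
class Collector:
    """Central repository of log information for many generators."""
    repository: List[LogRecord] = field(default_factory=list)

    def collect(self, generator: Generator) -> int:
        # Obtain the generator's buffered records and store them centrally.
        records = generator.drain()
        self.repository.extend(records)
        return len(records)

    def query(self, device_id: str) -> List[LogRecord]:
        return [r for r in self.repository if r.device_id == device_id]


switch = Generator("switch-1")
server = Generator("server-7")
switch.log("ERROR", "link down on port 3")
server.log("INFO", "service started")
server.log("ERROR", "disk latency high")

collector = Collector()
for device in (switch, server):
    collector.collect(device)
```

With all records held in one repository, an administrator can call `collector.query("server-7")` to inspect a single device's history without reproducing the problem on that device.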
The generation, storage, and use of logging information, however, require balancing a number of tradeoffs. For example, while generating as much logging information as possible is often desirable for troubleshooting, logging copious amounts of information can produce large log files and also consumes bandwidth and computational resources, which may slow the application generating the logging information as well as other applications concurrently executing on the device. Conversely, logging too little information risks rendering the logging information of little value in troubleshooting.
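The volume side of this tradeoff can be seen in a small sketch that uses Python's standard `logging` module. The function `logged_bytes`, the logger names, and the simulated request loop are assumptions made purely for illustration; the sketch measures only the size of the log output and not the bandwidth or processing cost.

```python
import io
import logging


def logged_bytes(level: int) -> int:
    """Run a fixed workload and return how many characters were logged."""
    buffer = io.StringIO()
    handler = logging.StreamHandler(buffer)
    logger = logging.getLogger(f"demo.{level}")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # keep output confined to our buffer
    logger.addHandler(handler)

    for i in range(100):
        logger.debug("request %d: parsed headers", i)
        logger.info("request %d: completed", i)
        if i % 50 == 0:
            logger.error("request %d: upstream timeout", i)

    logger.removeHandler(handler)
    return len(buffer.getvalue())


verbose = logged_bytes(logging.DEBUG)  # records every message
terse = logged_bytes(logging.ERROR)    # records only the errors
print(f"DEBUG level: {verbose} characters, ERROR level: {terse} characters")
```

The same workload yields far more output at the `DEBUG` level than at the `ERROR` level, while the `ERROR`-level log omits the per-request context that might be needed to explain a timeout.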