Cluster-based computing continues to become more common as the need to process large data sets increases. Additionally, computing clusters may be employed to provide the computing resources for popular network and cloud-based applications, such as search engines, social networks, online media, or the like. For many common applications, the number of nodes in a cluster may increase as the size of the data sets and the number of simultaneous users increase.
In some cases, computing clusters may comprise hundreds of heterogeneous nodes, including data nodes, various control nodes, load balancers, or the like. Also, computing clusters may be distributed across multiple physical locations. The large number of nodes, node heterogeneity, and node decentralization contribute to system complexity, which may increase the difficulty of monitoring and/or troubleshooting computing clusters.
The log files and other machine data generated by the nodes of a computing cluster may overwhelm standard monitoring and troubleshooting techniques. This machine data may result in large, unwieldy datasets that are difficult to search, monitor, or review. Furthermore, even if errors and failures are detected using standard practices such as reviewing log files, the complexity of computing cluster systems, coupled with the large amount of machine data, may make it difficult to discover the causes of failures and to troubleshoot them. Thus, it is with respect to at least these considerations that the following subject matter is directed.