The growing complexity of large infrastructures, such as data centers, often makes the behavior of the system hard to understand. System administrators routinely analyze metrics extracted from individual components, the relationships between components, and the system as a whole.
Data centers range in size from hundreds to thousands of components, and it is common to store tens or even hundreds of different metrics per component at each timestamp. The period at which each metric is read is typically a compromise between keeping the information sufficiently up to date and limiting the resources required for reading, processing, and storing it. Whatever period is selected, the volume of data to be managed grows rapidly over time, in proportion to the number of components, the number of metrics per component, and the sampling frequency. Building management tools that can deal effectively with these volumes of data becomes more challenging as the systems grow in complexity. More processing power is needed to analyze the data in a feasible amount of time. In addition, a growing volume of data must be understood in limited space and time, because system administrators need to react as quickly as possible to any anomaly in the system.
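To give a sense of scale, the following back-of-the-envelope sketch counts the metric values collected per day. The function name `samples_per_day` and all figures (component counts, metrics per component, sampling periods) are hypothetical values chosen within the ranges mentioned above, not measurements from any real data center.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(components: int, metrics_per_component: int,
                    period_seconds: int) -> int:
    """Return how many metric values are collected in 24 hours.

    Assumes every component reports every metric once per sampling period.
    """
    readings_per_day = SECONDS_PER_DAY // period_seconds
    return components * metrics_per_component * readings_per_day


# Hypothetical small deployment: 100 components, 10 metrics, read every 60 s.
small = samples_per_day(components=100, metrics_per_component=10,
                        period_seconds=60)

# Hypothetical large deployment: 1,000 components, 100 metrics, read every 10 s.
large = samples_per_day(components=1_000, metrics_per_component=100,
                        period_seconds=10)

print(small)  # 1440000
print(large)  # 864000000
```

Under these assumed figures, the larger deployment produces several hundred million values per day, far more than an administrator could inspect by hand.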
Thus, as the number of components in a data center increases, it becomes increasingly difficult to manually compare all the values of, or connections among, components in order to uncover small changes in the metrics that, while not necessarily relevant individually, may be relevant in combination with other small changes.