To manage large-scale data centers and utility clouds, continuous monitoring, along with analysis of the data captured by the monitoring, is often performed. Monitoring constitutes a critical component of a closed-loop automation solution in data centers. Next-generation data centers, such as those for emerging cloud infrastructures, are expected to be characterized by large scale, complexity, and dynamism. Increased core counts, increased blade densities, and virtualization would result in numbers of end systems and a degree of heterogeneity that would substantially benefit from an automated, online monitoring and management system requiring minimal administrator intervention. However, performing continuous and on-demand monitoring to detect, correlate, and analyze data for a fast reaction to system issues is difficult, especially when a huge volume of monitoring data is produced across multiple management domains and nodes.
Most current monitoring approaches are centralized, ad hoc, and siloed, leading to scalability, visibility, and accuracy limitations. Typically, analysis of monitoring data is done offline, which hinders automated solutions. Distributed monitoring and aggregation systems have been proposed. However, they impose high overhead due to their use of expensive peer-to-peer mechanisms that are not optimized for management needs in data centers. In addition, although distributed monitoring systems such as Ganglia are popular, they use a static hierarchy and have limited support for advanced analysis functions and for runtime changes to the monitoring hierarchy. The scalability, visibility, and accuracy limitations of the existing centralized, ad hoc, and siloed approaches to monitoring may translate to high costs and unsatisfied service level agreements (SLAs).