Modern datacenter infrastructure is a distributed system with a large number of physical and logical objects. The behavior of these objects is managed automatically, governed by service level agreement (SLA)-driven and business-driven policies. The common goals are to reduce maintenance costs, to improve efficiency and flexibility, and to scale resource pools dynamically.
To enable accurate workload placement and overall management of the datacenter based on this physical hierarchy, multiple parameters of each server node are monitored. This information is used to categorize and rank servers for workload placement and movement. Typical solutions for obtaining this data require server node management software to set up counters that record or monitor the desired parameters, and then to periodically interrupt the workload running on each node to read the set of performance and event monitor counters. However, this approach has drawbacks, including the effect that each interruption has on the workload. Further, as the number of cores per node and the number of nodes increase, the overhead of periodically reading and processing these counters becomes significant at the datacenter level. As such, the current counter read model for monitoring is not scalable. In addition, the counters used for monitoring are typically also used by an operating system (OS) or application for performance monitoring, tuning, and profiling. Since the OS or application has priority, the counters are often unavailable to the datacenter management software. Because of the delays caused by counter unavailability and the amount of data to be processed, inaccurate information may be used for decision making.
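The scaling concern described above can be illustrated with a rough back-of-the-envelope model. The sketch below is not taken from any particular management product. The class name PollingModel, its fields, and every numeric value are hypothetical and chosen only for illustration. It assumes that each poll interrupts every core once and reads a fixed number of counters per core.

```python
from dataclasses import dataclass


@dataclass
class PollingModel:
    """Hypothetical parameters for a periodic counter-read monitoring scheme."""
    nodes: int                # server nodes in the datacenter
    cores_per_node: int       # cores on each node
    counters_per_core: int    # counters read on each core per poll
    read_cost_us: float       # assumed cost to read one counter (microseconds)
    interrupt_cost_us: float  # assumed cost of interrupting a core once per poll
    poll_interval_s: float    # time between polls (seconds)

    def total_cores(self) -> int:
        return self.nodes * self.cores_per_node

    def workload_overhead_fraction(self) -> float:
        """Fraction of each core's time taken away from the workload."""
        stolen_us = self.interrupt_cost_us + self.counters_per_core * self.read_cost_us
        return stolen_us / (self.poll_interval_s * 1e6)

    def samples_per_second(self) -> float:
        """Counter values the management software must process per second."""
        return self.total_cores() * self.counters_per_core / self.poll_interval_s


# Illustrative values only (assumptions, not measurements).
small = PollingModel(nodes=100, cores_per_node=8, counters_per_core=4,
                     read_cost_us=1.0, interrupt_cost_us=5.0, poll_interval_s=1.0)
large = PollingModel(nodes=10_000, cores_per_node=64, counters_per_core=4,
                     read_cost_us=1.0, interrupt_cost_us=5.0, poll_interval_s=1.0)

for name, model in (("small", small), ("large", large)):
    print(name, model.total_cores(), model.samples_per_second(),
          model.workload_overhead_fraction())
```

Under these assumptions the per-core interruption cost stays constant, while the volume of data to be collected and processed grows with the product of node count and cores per node. That product is the term that makes the read model costly at the datacenter level.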