Information Technology personnel (IT personnel) responsible for managing data centers constantly have to perform a number of management tasks, such as capacity planning, resource allocation, license management, and patch management. Most of these tasks require careful examination of the current status of all the machines or a subset of the machines in the data center. Given these tasks, and especially for large data centers, it is important to have a scalable monitoring solution that provides insights into system-wide performance information instantly, for example, by capturing data center level metrics. Data center level metrics include metrics computed for multiple machines in a data center. For example, a data center level metric may be an aggregation or average of individual-machine metrics (node level metrics). Examples of data center level metrics include the number of licensed copies of software running in the data center, the number of servers for each type of server deployed in the data center, locations of computer resources, cooling and temperature information, power information, etc. Data center level metrics may also include traditional performance metrics for hardware and software, such as CPU utilization, memory and hard disk utilization, web server response times, etc., for multiple machines. These data center level metrics may be used by a system administrator for managing the resources in a data center.
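As a purely illustrative sketch of how a data center level metric may be derived from node level metrics, the following Python fragment averages one node level metric across machines and counts servers by type. The node names, metric names (cpu_utilization, server_type), and sample values are hypothetical and are not taken from any particular monitoring tool.

```python
from collections import Counter
from statistics import mean

# Hypothetical node level metrics, keyed by machine name (illustrative values only).
node_metrics = {
    "node-a": {"cpu_utilization": 40.0, "server_type": "web"},
    "node-b": {"cpu_utilization": 60.0, "server_type": "web"},
    "node-c": {"cpu_utilization": 50.0, "server_type": "database"},
}

def average_metric(nodes, name):
    """Average a numeric node level metric over every node that reports it."""
    values = [metrics[name] for metrics in nodes.values() if name in metrics]
    return mean(values) if values else None

def count_by_type(nodes, key="server_type"):
    """Count the number of servers of each type across all nodes."""
    return Counter(metrics[key] for metrics in nodes.values() if key in metrics)

avg_cpu = average_metric(node_metrics, "cpu_utilization")
servers_per_type = count_by_type(node_metrics)
```

Here avg_cpu and servers_per_type each summarize multiple machines, which is what distinguishes a data center level metric from a node level metric.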
Current approaches to collecting data center level metrics are primarily focused on using centralized databases where such information is collected and aggregated. For example, FIG. 12 shows a conventional system where metrics are received and stored at a central database 1200. Computers 1210 may be in a single data center or multiple data centers. Each of the computers sends captured node level metrics to the central database 1200 for storage, and data center level metrics are computed at the central database 1200. A system administrator 1220 accesses the central database 1200 to view the collected data center level metrics.
The centralized database solution shown in FIG. 12 does not scale well for large or multiple data centers. For example, consolidated data centers might have 14,000 physical servers, 2,600 applications and 8,400 application instances. Using a typical performance agent, such as Hewlett Packard's™ OpenView Performance Agent (HP OVR), to collect 68 metrics regarding systems and applications from each server every five minutes would result in 952,000 data points (14,000 servers × 68 metrics) being reported every five minutes and stored in the central database 1200. Sending all this data to a single location may not be feasible depending on bandwidth limitations and time constraints. It may take several hours to gather and produce reports on the captured metrics. However, in today's adaptive enterprise systems, such information may be required in much shorter time periods.
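The reporting volume in the example above can be checked with a short calculation. The following Python sketch assumes that each of the 14,000 physical servers reports all 68 metrics once per five-minute interval; the per-day figure is derived here for illustration only and is not a measured value.

```python
# Figures from the example: 14,000 servers, 68 metrics each, reported every 5 minutes.
SERVERS = 14_000
METRICS_PER_SERVER = 68
INTERVAL_MINUTES = 5

def data_points_per_interval(servers, metrics_per_server):
    """Data points arriving at the central database in one reporting interval."""
    return servers * metrics_per_server

points_per_interval = data_points_per_interval(SERVERS, METRICS_PER_SERVER)

# Number of reporting intervals in a 24-hour day (assumes continuous reporting).
intervals_per_day = 24 * 60 // INTERVAL_MINUTES
points_per_day = points_per_interval * intervals_per_day

print(points_per_interval)  # 952000
print(points_per_day)       # 274176000
```

Under these assumptions, a single central database would have to receive and store roughly 274 million data points per day.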
Complexity is a second issue, considering the variety of tools that gather different types of data. For example, HP OVR collects performance data, the Domain Controller collects data related to Microsoft™ Windows™, and HP Asset collects asset data. Thus, a user needs to interface with multiple tools to collect different types of data, which makes gathering data center level metrics an even more difficult task. Many automated, centralized systems are not able to interface automatically with multiple types of tools in order to capture and store the different types of data that each tool collects.