Datacenters are indispensable in the modern information technology (IT) landscape. They are deployed all over the world to host computing services and data storage. The energy consumption of datacenters is increasing dramatically due to their rapid expansion in both number and scale. Energy expense is one of the most significant operating costs of datacenters. Companies like Amazon, Google, IBM, Microsoft, and Facebook pay millions of dollars every year for electricity. To minimize energy cost, power management has become an important consideration in building and sustaining the operation of every datacenter. One key to effective power management is fine-grained power monitoring.
In datacenter operation, fine-grained power monitoring refers to power monitoring at the server level. It facilitates the implementation of various power management strategies, such as power capping and accounting, idle power elimination, cooling control, and load balancing. A fine-grained power monitoring platform can help audit the total energy use of the datacenter and continuously show the real-time server-level power consumption. Such a platform greatly helps datacenter operators to adjust power management policies and explore potential benefits. In cooling control, for example, real-time feedback on the server-level power distribution can provide early information for locating thermal “hot spots” (i.e., locations where the server inlet air is too hot) that hamper the efficiency of the datacenter, and for defining appropriate corrective action to optimize the air flow in the datacenter. Moreover, fine-grained power monitoring is also critical to the safe operation of datacenters. For example, the maximum power capacity of the datacenter may be quickly reached upon continuous scaling-out (i.e., adding computing resources) and scaling-up (i.e., upgrading IT facilities). According to one survey, approximately 30% of enterprise datacenters could run out of power capacity within 12 months. Accordingly, datacenter operators face the dilemma of limited power capacity and increased power demand. That dilemma can be further magnified by the so-called “overbooking” practice, wherein datacenter operators tend to overbook the power infrastructure for a high percentile of their needs. This practice of overbooking is based on the general knowledge that the nameplate power rating of a server is overprovisioned, and is therefore higher than its actual peak power, giving certain confidence that an extra number of servers can be added and supported within the power capacity of the datacenter.
Unfortunately, overbooking can cause power deficits at some levels of the IT facilities and, in the worst case, an overrun or a system crash at a higher level can occur when power usage exceeds power capacity. Fine-grained power monitoring can help prevent the consequences of this unsafe practice of overbooking. However, one major challenge in fine-grained power monitoring is that not all types of servers in the datacenter are equipped with power sensors. This holds true especially when a datacenter uses a diverse set of legacy servers, high-density blade servers, and enclosures. The DELL POWEREDGE M100e, the IBM BLADECENTER H, and the HPE PROLIANT DL380 series are examples of widely used servers not equipped with power sensors. To monitor their power usage, power meters are typically installed at power distribution units (PDUs) or at the rack level. Power monitoring in this case, however, is not fine-grained.
In general, power monitoring solutions can be organized into two categories: hardware-based power monitoring and software-based power monitoring. Metered rack PDUs, intelligent power strips, and power clamps are examples of hardware-based power monitoring solutions. Metered rack PDUs can provide rack-level power monitoring (i.e., not server-level power monitoring), wherein the aggregate load on the circuit is monitored. Some intelligent power strips can provide indications of the electrical load or power drawn by every outlet connected to a computing device. Power clamps can facilitate the manual measurement of power drawn by an individual server, but this manual method cannot provide real-time power monitoring when large numbers of servers are involved. In addition, these hardware-based solutions incur additional costs associated with purchasing, installation, and maintenance. If a large number of servers is involved, integrating hardware-based solutions can also cause space constraints within the datacenter facility.
On the other hand, software-based power monitoring solutions are typically more cost-effective than their hardware-based counterparts. In a software-based solution, power models can be used to estimate the power consumption of a server using information collected at the server level, the component level, and/or the application level. Power models can be trained based on a correlation between the state or utilization of a hardware component and the power consumption of that hardware component.
For example, Gatts and Yellick, U.S. Pat. No. 9,020,770 (Gatts) teaches a computer-usable program product and data processing system that uses a power estimation model which correlates one type of factor at a time with the power consumption of a particular server in a datacenter. Such a factor can be processor utilization, memory utilization, network throughput, I/O rate, temperature or heat output, or fan noise or speed. For clarity, Gatts shows that processor utilization alone can be correlated with the power consumption of a first server, memory utilization alone with the power consumption of a second server, I/O rate alone with the power consumption of a third server, and so on. FIG. 1 depicts a schematic diagram of the method of power estimation described in Gatts.
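By way of illustration only, a single-factor power model of the general kind discussed above can be sketched as an ordinary least-squares line relating one utilization factor to measured power. The sketch below is not taken from Gatts; the function names `fit_linear_power_model` and `estimate_power`, the linear form of the model, and the sample values are all assumptions made for illustration.

```python
from statistics import mean


def fit_linear_power_model(utilization, power_watts):
    """Least-squares fit of power = idle + slope * utilization (one factor)."""
    u_bar = mean(utilization)
    p_bar = mean(power_watts)
    sxx = sum((u - u_bar) ** 2 for u in utilization)
    sxy = sum((u - u_bar) * (p - p_bar)
              for u, p in zip(utilization, power_watts))
    slope = sxy / sxx
    idle = p_bar - slope * u_bar
    return idle, slope


def estimate_power(model, utilization):
    """Estimate server power in watts from a single utilization value."""
    idle, slope = model
    return idle + slope * utilization


# Hypothetical training samples: CPU utilization (0..1) and measured watts.
cpu_util = [0.0, 0.25, 0.5, 0.75, 1.0]
watts = [100.0, 125.0, 150.0, 175.0, 200.0]

model = fit_linear_power_model(cpu_util, watts)
estimated = estimate_power(model, 0.6)
```

In this sketch the fitted intercept plays the role of idle power and the slope plays the role of the power added per unit of utilization. A model of this form can only be as accurate as the chosen factor is representative of the server's total power draw.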
While it can work well in certain cases of datacenter operation, the approach in Gatts can provide sub-optimal estimation of server-level power consumption in cases where multiple components within a server simultaneously consume significant power to support various tasks or workloads of the datacenter. To illustrate, a first server may draw significant power for both its central processing unit (CPU) and graphics processing unit (GPU) to undertake a single task, while a second server may draw significant power for its CPU, memory, and storage disk to undertake a different kind of task.
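The limitation described above can be illustrated with a small numerical sketch. It assumes, purely for illustration, a server whose power is a linear function of both CPU and GPU utilization, and compares a CPU-only model with a two-factor model fitted to the same data. The coefficients, the sample values, and all function names are hypothetical and do not come from Gatts or from any measured server.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_power_model(features, power_watts):
    """Least-squares fit of power = w0 + w1*f1 + ... via normal equations."""
    rows = [[1.0] + list(f) for f in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * p for r, p in zip(rows, power_watts))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, feature_row):
    """Estimate power in watts for one row of utilization features."""
    return weights[0] + sum(w * f for w, f in zip(weights[1:], feature_row))


def max_abs_error(weights, features, power_watts):
    """Largest absolute estimation error over the given samples."""
    return max(abs(predict(weights, f) - p)
               for f, p in zip(features, power_watts))


# Hypothetical samples: (CPU utilization, GPU utilization), each in 0..1.
samples = [(0.1, 0.0), (0.9, 0.0), (0.1, 0.8),
           (0.9, 0.8), (0.5, 0.4), (0.3, 0.9)]
# Assumed ground truth: 80 W idle, 100 W per unit CPU, 150 W per unit GPU.
power = [80.0 + 100.0 * cpu + 150.0 * gpu for cpu, gpu in samples]

cpu_only = [(cpu,) for cpu, _ in samples]
single = fit_power_model(cpu_only, power)   # one factor at a time
multi = fit_power_model(samples, power)     # both factors together

single_err = max_abs_error(single, cpu_only, power)
multi_err = max_abs_error(multi, samples, power)
```

Under these assumptions, the CPU-only model cannot distinguish two samples that share the same CPU utilization but differ in GPU utilization, so its worst-case error remains large, whereas the two-factor model recovers the assumed coefficients up to floating-point rounding.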
Lastly, current software-based solutions require power model training, and certain methods in this category require power measurement at the server level or a lower level during an initial training phase, even if no hardware-based power measurement is needed afterwards. Hardware-based power measurement during the initial training phase makes such methods intrusive.
Therefore, given the limitations and challenges associated with previous hardware-based and software-based solutions, there exists a need for a better approach that is low-cost and non-intrusive to facilitate real-time, fine-grained power monitoring of datacenters.