Systems that provide services, such as cloud services, often employ hundreds of thousands of servers. Many of these servers are dedicated to specific types of workloads or tasks and, depending on the task, power-performance tradeoffs may exist. Such systems include High Performance Computing (HPC) systems.
Server node density is increasing dramatically and is expected to keep increasing for the foreseeable future. In many designs, multiple nodes are placed on one blade and share common power supplies. Managing power for such nodes and blade servers is a key factor affecting node density and cost.
Because power management has become essential for server systems, the power of every node in the system must be obtained for system and job power monitoring. On most server nodes, a master controller or co-processor is responsible for servicing power queries. For example, a Baseboard Management Controller (BMC) is widely used by many Original Equipment Manufacturers (OEMs). Because power sampling and averaging are slow, a power query via a BMC is slow, with latency ranging from several milliseconds to hundreds of milliseconds. Until the BMC finishes its current transaction, it cannot service a new requester. Because the latency is long, the BMC on today's HPC systems is often flooded with requests. Requests that arrive while the BMC is busy are returned with a “BMC being busy” error. This type of problem is referred to herein as a “denial of service”.
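The busy-rejection behavior described above can be illustrated with a minimal sketch. It assumes a controller that handles exactly one transaction at a time with a fixed latency and rejects any request that arrives before the current transaction completes. The function name `simulate_bmc`, the 50 ms latency, and the polling schedule are hypothetical values chosen for illustration and do not describe any particular BMC.

```python
def simulate_bmc(arrival_times_ms, latency_ms):
    """Model a controller that services one transaction at a time.

    Returns (served, denied): the arrival times that were serviced and
    those that were rejected because the controller was still busy.
    """
    served, denied = [], []
    busy_until = 0.0  # time at which the current transaction completes
    for t in sorted(arrival_times_ms):
        if t >= busy_until:
            served.append(t)
            busy_until = t + latency_ms
        else:
            denied.append(t)  # corresponds to a "BMC being busy" error
    return served, denied


# Hypothetical scenario: two requesters poll every 100 ms, offset by 20 ms,
# against a controller whose query latency is assumed to be 50 ms.
requester_a = [i * 100 for i in range(5)]
requester_b = [i * 100 + 20 for i in range(5)]
served, denied = simulate_bmc(requester_a + requester_b, latency_ms=50)
print(len(served), len(denied))  # prints "5 5"
```

Under these assumed timings, every request from the second requester arrives while the controller is still servicing the first, so the second requester never obtains a reading.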
Too many such errors result in a loss of node power samples. For server systems running applications whose power consumption varies over time, this loss may make power monitoring inaccurate. Power control actions (e.g., setting new power caps per node, per rack, or per job) usually depend on current and historical power consumption, so this inaccuracy could lead to an incorrect power control action.
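As a rough illustration of how lost samples can distort monitoring, the following sketch compares the average power computed from a complete set of samples with the average computed after some samples are dropped. The wattage values and the indices of the lost samples are invented for illustration only. The sketch also assumes the monitoring software simply averages whatever samples it receives.

```python
def mean_power(samples_w):
    """Average of a list of power samples, in watts."""
    return sum(samples_w) / len(samples_w)


# Hypothetical node power trace (watts) with a short high-power phase.
true_samples_w = [100, 100, 300, 300, 100, 100]

# Assume the two samples taken during the high-power phase were lost
# because the queries were rejected while the controller was busy.
lost_indices = {2, 3}
observed_samples_w = [
    p for i, p in enumerate(true_samples_w) if i not in lost_indices
]

true_mean_w = mean_power(true_samples_w)          # 1000 / 6, about 166.7 W
observed_mean_w = mean_power(observed_samples_w)  # 400 / 4 = 100.0 W
print(round(true_mean_w, 1), observed_mean_w)     # prints "166.7 100.0"
```

In this invented case the monitor underestimates average power by about 40 percent, and a power cap chosen from the observed value could be set too low for the workload.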
Some management software, such as Data Center Manager (DCM), uses the BMC's return messages not only to obtain power readings but also to detect whether the node is online and responsive. If this software frequently receives denials of service, it will conclude that the node cannot be monitored and/or power controlled and will wait a fairly long time before retrying. This results in performance loss.
A denial of service can also cause performance throttling. For example, if power management software attempts to set a high frequency (e.g., P-state P0 or P1) on a particular node, but that node's BMC declines to service the request because it is busy servicing other requests, then the node might be forced to run at a lower P-state (e.g., P8 or P9) and suffer performance loss.
Reliability, Availability, and Serviceability (RAS) is also a common capability on large groups of servers. A denial of service could result in incomplete data in a RAS database. During Open Resilient Cluster Manager (ORCM) RAS validation, two co-existing software instances that both needed to access Intelligent Platform Management Interface (IPMI) sensors via a BMC were found to experience this problem.