A distributed computer system may perform parallel computing by the simultaneous use of multiple nodes to execute a computational assignment referred to as a job. Each node may include one or more processors, memory, an operating system, and one or more input/output (I/O) components. The nodes may communicate with each other through a high-speed network fabric, e.g., Ethernet, Omni-Path, InfiniBand, or another network, and may use shared file systems or storage. The job may be divided into thousands of parallel tasks distributed over thousands of nodes. These nodes may synchronize with each other hundreds of times per second.
Future distributed computer systems are projected to require tens of megawatts of power, making their power management a foremost concern in the industry. These distributed computer systems will be expected to deliver exascale performance with limited power and energy budgets. Current distributed computer systems may apply power capping to adhere to the limited power and energy budgets. However, current approaches to power capping negatively impact the performance of the distributed computer systems because the power capping is typically inaccurate.
Current approaches estimate the power needed by one or more nodes of a distributed computer system to run a job based upon the thermal dissipation power (TDP) values of the components comprising each node. Because a job rarely uses the aggregate TDP value of every component of every node on which it is run, an estimate based on the aggregate TDP value is inaccurate. For example, by over-estimating the power needed to start up and run a job, current approaches may delay the start of the job and reduce the efficiency of the distributed computer system by preventing other jobs from running.
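The aggregate-TDP estimate described above can be illustrated with a minimal sketch. The component names, TDP values, and node count are hypothetical and chosen only for illustration; they do not come from any particular system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    tdp_watts: float  # publicized TDP value (hypothetical number)


def aggregate_tdp_estimate(components, node_count):
    """Estimate job power as (sum of component TDP values) x (number of nodes)."""
    per_node_watts = sum(c.tdp_watts for c in components)
    return per_node_watts * node_count


# Hypothetical node composition: two processors and memory.
node_components = [
    Component("cpu0", 85.0),
    Component("cpu1", 85.0),
    Component("memory", 30.0),
]

# Estimate for a job spanning 1000 such nodes.
estimate_watts = aggregate_tdp_estimate(node_components, node_count=1000)
```

Because the estimate assumes every component of every node draws its full TDP value for the whole job, it will generally differ from the power the job actually consumes.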
The start of the job is delayed because the distributed computer system waits until the over-estimated startup power is available before starting the job. By contrast, a more accurate estimation of the startup power would avoid this delay. In addition, the over-estimation of the power required to run the job results in an over-allocation of power for the job. The over-allocation takes away power that could be allocated to other jobs requesting to be run by the distributed computer system.
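The effect of over-estimation on job start can be sketched as a simple admission check. The function name `can_start` and all wattage figures are assumptions made for illustration, not a description of any actual job manager.

```python
def can_start(estimated_watts, budget_watts, allocated_watts):
    """Admit a job only if its estimated power fits in the unallocated budget."""
    available_watts = budget_watts - allocated_watts
    return estimated_watts <= available_watts


# Hypothetical system state: 1 MW budget, 850 kW already allocated to running jobs.
budget_watts = 1_000_000
allocated_watts = 850_000

over_estimate_watts = 200_000      # e.g., derived from an aggregate TDP value
accurate_estimate_watts = 140_000  # hypothetical figure closer to real consumption

# The over-estimate does not fit in the 150 kW that remains, so the job waits.
delayed = not can_start(over_estimate_watts, budget_watts, allocated_watts)

# The more accurate estimate fits, so the same job could start immediately.
starts_now = can_start(accurate_estimate_watts, budget_watts, allocated_watts)

# Power reserved by the over-estimate that other jobs could otherwise use.
over_allocation_watts = over_estimate_watts - accurate_estimate_watts
```

In this sketch the same job is either delayed or started immediately depending only on the quality of the estimate, and the over-estimate would also withhold 60 kW from other jobs once the job runs.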
In addition, the aggregate TDP value is not the maximum power that may be consumed by a node. For example, the aggregate TDP value measures thermal dissipation and does not accurately measure the electrical power consumed when every component of the node is in use. Therefore, a job may consume more power than the estimate derived from the aggregate TDP value, which may lead to the distributed computer system attempting to consume more power than it has been allocated by a utility facility.
The TDP value of each component, and hence the aggregate TDP value, is also prone to inaccuracies due to a lack of uniformity between the compositions of each component on a node-to-node basis. For example, the actual thermal dissipation for a specific component in each node may vary between, for example, 70 W and 100 W. The publicized TDP value for the specific component may be, for example, 85 W. Therefore, an aggregate TDP value may be inaccurate by up to, for example, 15 W (roughly 18% of the publicized value) for each component on each node, resulting in a highly inaccurate power consumption estimate.
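The arithmetic behind the example figures can be checked with a short sketch. The helper name `worst_case_deviation` is hypothetical; the 70 W, 100 W, and 85 W values are the example values given above.

```python
def worst_case_deviation(published_watts, min_watts, max_watts):
    """Return the largest absolute and relative gap between the publicized
    TDP value and the actual dissipation range of a component."""
    abs_dev_watts = max(published_watts - min_watts, max_watts - published_watts)
    rel_dev = abs_dev_watts / published_watts
    return abs_dev_watts, rel_dev


# Example values from the text: actual range 70 W to 100 W, publicized 85 W.
abs_dev_watts, rel_dev = worst_case_deviation(85.0, 70.0, 100.0)

# Per-component deviations add up: with two such components on each of
# 1000 nodes (hypothetical counts), the aggregate may be off by this much.
aggregate_dev_watts = abs_dev_watts * 2 * 1000
```

With these values the per-component deviation is 15 W in either direction, which is 15/85 of the publicized value, and the worst-case error grows linearly with the number of components and nodes.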
Additionally, using current approaches, in order to run a job, a job manager may select a frequency at which to operate one or more nodes running the job. The frequency is typically selected based on an estimate of power consumption by the job. An over-estimate of power consumption leads to the selection of a first frequency. A more accurate estimation of the power consumption would result in the selection of a second frequency, the second frequency being higher than the first frequency, thereby resulting in a shorter run time.
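The frequency selection described above can be sketched as choosing the highest frequency whose estimated power fits under a per-node power cap. The selection rule, the frequency-to-power tables, the 30% over-estimation factor, and the cap are all assumptions for illustration; an actual job manager may select frequencies differently.

```python
def select_frequency(power_by_freq_ghz, power_cap_watts):
    """Return the highest frequency whose estimated power fits under the cap,
    or None if no frequency fits."""
    feasible = [f for f, watts in power_by_freq_ghz.items() if watts <= power_cap_watts]
    return max(feasible) if feasible else None


# Hypothetical per-node power estimates at each operating frequency (GHz -> W).
accurate_estimate = {1.2: 120.0, 1.8: 160.0, 2.4: 210.0}

# The same table inflated by an assumed 30% over-estimation.
over_estimate = {f: watts * 1.3 for f, watts in accurate_estimate.items()}

power_cap_watts = 215.0  # hypothetical per-node power allocation

first_frequency = select_frequency(over_estimate, power_cap_watts)
second_frequency = select_frequency(accurate_estimate, power_cap_watts)
```

Under these assumed numbers the over-estimate yields a first frequency of 1.8 GHz, while the more accurate estimate permits a second frequency of 2.4 GHz under the same cap.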