Managing the power consumption and power dissipation of server systems is a major issue for the server industry, particularly due to increasing energy costs and the environmental hazards associated with energy consumption. To support short-duration periods of higher demand, many data centers deploy over-configured or higher-performance server systems; as a result, they are underutilized much of the time and operate at performance demands far lower than those for which they are configured. Data centers also deploy over-configured or higher-performance server systems to simplify procurement and deployment logistics through a common higher-performance server configuration.
One particular type of system that can benefit from power savings is a large collection of similar devices, such as a server “farm.” Server farms can include tens, hundreds, thousands, or more separate computer systems, or servers. The servers are usually interconnected by a common network and are provided with centralized control or management.
Some specific trends have recently emerged in enterprise data centers. Users are moving to form factors that focus on higher density of functionality, such as dense computing, dense storage, and/or dense input/output (IO). Increasingly, chassis tend to have multiple server, IO, or storage controller nodes. These nodes are either general compute nodes or targeted at specific workloads such as storage, machine learning, artificial intelligence, or big data. The nodes may be equipped with general-purpose CPUs, such as Intel x86 CPUs, with special-purpose field-programmable gate arrays (FPGAs) for machine learning or artificial intelligence, or with non-volatile memory for in-memory big data solutions.
Power management on a chassis with multiple compute nodes is more complicated than on a chassis with a single compute node. This is especially true for special-purpose chassis systems such as storage servers (for example, the Cisco UCS C3260). Such special-purpose chassis can be optimized for specific types of workloads, and power management solutions have to be cognizant of this factor when making algorithmic decisions. The algorithms have to be sensitive to the workload types as well as to the underlying node-level and chassis-level architecture over which the load flows. For example, a chassis designed to hold multiple plug-in FPGA cards for machine learning workloads may not show a high CPU workload. Yet such a chassis will draw a significant amount of power and will require power management. In some cases, the entities drawing the bulk of the power may not even have intrinsic power management capabilities. In such cases, the chassis-level algorithm may be configured to incorporate these elements into an overall feedback loop.
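The feedback loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a real chassis management API: the `Node` and `rebalance` names are hypothetical, and in practice the measured chassis draw would come from BMC or power-supply telemetry. Components without their own power controls (such as the FPGA cards mentioned above) contribute to the measured draw, and the loop compensates by throttling the nodes that are controllable.

```python
# Hypothetical sketch of a chassis-level power-capping feedback loop.
# Node and rebalance are illustrative names, not a real product API.

class Node:
    """A controllable node with an adjustable power cap (watts)."""
    def __init__(self, name, cap_watts, min_cap_watts):
        self.name = name
        self.cap_watts = cap_watts          # current enforced cap
        self.min_cap_watts = min_cap_watts  # floor below which we never throttle

def rebalance(nodes, measured_watts, budget_watts):
    """One feedback iteration: if the measured chassis draw exceeds the
    budget, shed the overage proportionally across the controllable node
    caps. Uncontrollable consumers (e.g. plug-in FPGA cards) appear only
    in measured_watts; the loop compensates via the controllable nodes."""
    overage = measured_watts - budget_watts
    if overage <= 0:
        return  # under budget: leave caps alone
    total_cap = sum(n.cap_watts for n in nodes)
    for n in nodes:
        cut = overage * n.cap_watts / total_cap
        n.cap_watts = max(n.min_cap_watts, n.cap_watts - cut)
```

Run periodically against fresh telemetry, this converges toward the chassis budget while respecting each node's minimum cap.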
In these systems, the nodes may or may not behave independently. This is different from a standard blade server system, where multiple compute nodes inhabit the same chassis in order to share the same infrastructure, such as fans and power, without any intent of forming a logical functional block from the underlying hardware. In a special-purpose chassis system such as a storage server, this is not true. Apart from sharing common infrastructure such as power and cooling, the nodes also share access to storage and input/output (IO) blocks. This means they can operate either as independent nodes or as combined nodes that share workload and IO using some mutually agreed algorithm, in either shared or failover mode.
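The shared and failover modes above imply different power budget distributions, which can be sketched as follows. This is a hypothetical illustration: the function name, the mode strings, and the 50 W standby floor are assumptions for the example, not values from any real product.

```python
# Hypothetical sketch of mode-aware chassis power budget distribution.
STANDBY_FLOOR_WATTS = 50.0  # assumed idle floor for a standby node

def distribute_budget(budget_watts, nodes, mode):
    """Split a chassis-level power budget across nodes.

    "shared":   nodes share the workload, so split the budget evenly.
    "failover": the first node is primary and gets the bulk; standby
                nodes are held at a minimal floor.
    """
    if mode == "shared":
        per_node = budget_watts / len(nodes)
        return {n: per_node for n in nodes}
    if mode == "failover":
        primary, *standbys = nodes
        alloc = {s: STANDBY_FLOOR_WATTS for s in standbys}
        alloc[primary] = budget_watts - STANDBY_FLOOR_WATTS * len(standbys)
        return alloc
    raise ValueError(f"unknown mode: {mode}")
```

A power management solution that ignored the operating mode would either starve the primary in failover mode or over-allocate to an idle standby.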
The nature of such a special-purpose chassis means that any power management solution needs to incorporate this inherent complexity in its design. The power management solution needs to understand how workloads can be impacted by the power management implementation.