Information Technology (IT) managers and Chief Information Officers (CIOs) are under tremendous pressure to reduce capital and operating expenses without decreasing capacity. The pressure is driving IT management to provide computing resources that more efficiently utilize all infrastructure resources. To meet this objective, aspects of the following questions are often addressed: how to better manage server utilization; how to cope with reduced IT staff levels; how to better utilize floor space; and how to handle power issues.
Typically, a company's IT infrastructure is centered around computer servers that are linked together via various types of networks, such as private local area networks (LANs) and private and public wide area networks (WANs). The servers are used to deploy various applications and to manage data storage and transactional processes. Generally, these servers will include stand-alone servers and/or higher density rack-mounted servers, such as 2U and 1U servers.
Recently, a new server configuration has been introduced that provides unprecedented server density and economic scalability. This server configuration is known as a “blade server.” A blade server employs a plurality of closely-spaced “server blades” (blades) disposed in a common chassis to deliver high-density computing functionality. Each blade provides a complete computing platform, including one or more processors, memory, network connection, and disk storage integrated on a single system board. Meanwhile, other components, such as power supplies and fans, are shared among the blades in a given chassis and/or rack. This provides a significant reduction in capital equipment costs when compared to conventional rack-mounted servers.
Generally, blade servers are targeted towards two markets: high-density server environments under which individual blades handle independent tasks, such as web hosting; and scalable compute cluster environments. A scalable compute cluster (SCC) is a group of two or more computer systems, also known as compute nodes, configured to work together to perform computationally intensive tasks. By configuring multiple nodes to work together on a computational task, the task can be completed much more quickly than if a single system performed it. In theory, the more nodes that are applied to a task, the quicker the task can be completed. In reality, the number of nodes that can effectively be used to complete the task depends on the application used.
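The gap between theoretical and practical scaling noted above can be illustrated with Amdahl's law. The text does not name this model; it is assumed here as one common way to express how an application's serial portion limits the benefit of adding nodes. The function name and the parallel fractions below are hypothetical values chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, nodes: int) -> float:
    """Estimate speedup on `nodes` compute nodes under Amdahl's law.

    parallel_fraction is the share of the work (0.0 to 1.0) that can be
    spread across nodes; the remainder is assumed to run serially.
    This is an illustrative model, not something the text prescribes.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0.0 and 1.0")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / nodes)


if __name__ == "__main__":
    # Hypothetical applications: one almost fully parallel, one less so.
    for fraction in (0.99, 0.90):
        for nodes in (10, 100, 1000):
            speedup = amdahl_speedup(fraction, nodes)
            print(f"parallel={fraction:.2f} nodes={nodes:5d} speedup={speedup:.1f}x")
```

Under this model, an application whose work is 90% parallelizable can never run more than ten times faster, no matter how many blades are added, which is one way to read the statement that the useful node count depends on the application.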
A typical SCC is built using Intel®-based servers running the Linux operating system and cluster infrastructure software. These servers are often referred to as commodity off-the-shelf (COTS) servers. They are connected through a network to form the cluster. An SCC normally needs anywhere from tens to hundreds of servers to be effective at performing computationally intensive tasks. This need to group a large number of servers in one location to form a cluster makes the SCC a natural fit for a blade server. The blade server chassis design and architecture provide the ability to place a massive amount of computing horsepower in a single location. Furthermore, the built-in networking and switching capabilities of the blade server architecture enable individual blades to be added or removed, allowing optimal scaling for a given task. With such flexibility, blade server-based SCCs provide a cost-effective alternative to other infrastructure for performing computational tasks, such as supercomputers.
As discussed above, each blade in a blade server is enabled to provide full platform functionality, and is thus able to operate independently of other blades in the server. Within this context, many blades employ modern power management schemes that are effectuated through built-in firmware and/or an operating system running on the blade platform. While this allows for generally effective power management on an individual blade basis, it does not account for the power management considerations applicable to the blade server as a whole. As a result, a blade server may need to be configured to handle a worst-case power consumption condition, whereby the input power would need to meet or exceed the maximum continuous power rating for each blade times the maximum number of blades that could reside within the server chassis, the rack tower, or even a room full of towers. Other power-management considerations concern power system component failures, such as a failed power supply or a failed cooling fan. Under current architectures, there is no scheme that enables efficient server-wide management of power consumption.
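The worst-case sizing rule described above is simple arithmetic, and a minimal sketch may make its cost concrete. The function name, the 250 W blade rating, the 14-slot chassis, and the 6-chassis rack are all hypothetical figures assumed for illustration; none of them comes from the text.

```python
def worst_case_input_power_watts(
    max_blade_watts: float,
    max_blades_per_chassis: int,
    chassis_count: int = 1,
) -> float:
    """Return the input power needed if every slot is full and every blade
    draws its maximum continuous rating at the same time.

    Implements only the rule stated in the text: per-blade maximum rating
    times the maximum number of blades. Shared components such as fans and
    power supply losses are deliberately left out of this sketch.
    """
    if max_blade_watts < 0 or max_blades_per_chassis < 0 or chassis_count < 0:
        raise ValueError("ratings and counts must be non-negative")
    return max_blade_watts * max_blades_per_chassis * chassis_count


if __name__ == "__main__":
    # Hypothetical figures, assumed for illustration only.
    blade_watts = 250.0
    slots = 14
    chassis_per_rack = 6

    per_chassis = worst_case_input_power_watts(blade_watts, slots)
    per_rack = worst_case_input_power_watts(blade_watts, slots, chassis_per_rack)
    print(f"Worst case per chassis: {per_chassis:.0f} W")
    print(f"Worst case per rack:    {per_rack:.0f} W")
```

Because each blade manages its own power in isolation, the supply must be provisioned for this product even if the blades rarely, if ever, peak simultaneously.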