1. Field of the Invention
The present invention relates to the field of management modules for a server system and, more particularly, to management module redundancy for a server system.
2. Description of the Related Art
The data center has changed over time from a mainframe-centric environment, which required dozens of skilled technologists to ensure the ongoing operation of the mainframe, to a complex environment of many different server computing platforms coupled to one another over sophisticated data communications networks. Data center technology was initially a resource available only to the wealthiest organizations, but recent advances in the mass production of personal computers have made it accessible at a reasonable cost. The modern data center typically arranges a multiplicity of servers in one or more racks, coupled to one another according to conventional network protocols.
To address the unwieldy and unreliable nature of ordinary rack-mounted computers, blade server solutions have become pervasive in more sophisticated data centers. In the blade center environment, different computing platforms can be arranged into blades and coupled to one another across a mid-plane in a single chassis. The mid-plane can provide access to a unified power source, input/output (I/O) devices and even removable media drives. In this way, the blades themselves need not include or manage a power supply or commonly used drives, resulting in substantial power savings, a reduced footprint and an overall lower total cost of ownership. Additionally, failover concerns can be addressed through the hot-swappable nature of the blades in the chassis.
Unlike the basic standalone server computing platform, an arrangement of servers in a data center environment, including blade server arrangements, presents a management challenge for information technologists. Each server in the data center environment can have its own configuration to support a unique blend of application components, and thus each configuration for each server must be managed carefully and, in many cases, remotely. Further, failover contingencies must be addressed within each server, including notification of an impending or already-occurred system fault in a server.
The complexity of managing the configuration of an arrangement of servers in the data center has been addressed by way of the management module. A management module generally provides system monitoring, diagnostics, telemetry and other services for a given computing domain. That computing domain can be at the device, system, chassis or data center level, by way of example. A baseboard management controller (BMC) can cooperate with a management module as a built-in system component that provides basic monitoring and troubleshooting facilities for a host server, such as sending alerts and remote power control. The BMC is commonly associated with high-performance servers and refers to a microcontroller configured for out-of-band management, including system fault handling. Modern BMC implementations are configured to scan out all error registers during a system failure before resetting the system. Some BMC implementations are able to scan out only chipset registers, because the processor registers of some central processing unit (CPU) models are not accessible. Other BMC implementations are able to scan out both chipset registers and processor registers.
Ironically, while management modules are charged with monitoring the health of a monitored server, management modules are in and of themselves susceptible to failure. Configuring a management module for proper interoperation with a monitored server can be complicated, and the loss of a management module can require tedious manual reconfiguration of the management module. To address the possibility of a failure in a management module, redundant management modules are at present provided within the chassis of a server. However, in space-constrained environments like the blade server environment, hardware space comes at a premium, so it is preferable to reduce the number of components in a blade server environment in order to optimize the total cost of ownership of the environment.