The emergence of the cloud for computing applications has increased the demand for off-site installations, known as data centers, that store data and run applications accessed by remotely connected computer device users. Such data centers typically have massive numbers of servers, switches, and storage devices to store and manage data, so that the data may be accessed in a convenient manner by remote computer users. A typical data center has physical rack structures with attendant power and communication connections. The racks are arranged in rows throughout the room or rooms of the data center. Each rack includes a frame that has horizontally oriented slots or a chassis that may hold multiple devices such as servers, switches, and storage devices. There are many such devices stacked in such rack structures found in a modern data center. For example, some data centers have tens of thousands of servers, along with attendant storage devices and network switches. Thus, a typical data center may include tens of thousands, or even hundreds of thousands, of devices in hundreds or thousands of individual racks. Data centers typically have an administrative system in a control center to monitor and ensure proper operation of the equipment. For efficient management, an administrator relies on instantaneous knowledge of the status of the equipment in each of the racks in the data center.
A typical rack system 10 is shown in FIG. 1. The rack system 10 includes a chassis management controller (CMC) 12 that is coupled to a power shelf 14. The power shelf 14 includes a number of power supply units and corresponding fans for cooling. The power shelf 14 is connected to network device banks 16 and 18, and provides power to each of the network devices in the banks 16 and 18. In this example, the network devices in the network device banks 16 and 18 may be blade servers. The rack system 10 also includes a management switch 20 and two data switches 22 and 24. The management switch 20 monitors the operation of the network devices stored in the banks 16 and 18, and of the power supplies on the power shelf 14. The data switches 22 and 24 provide data communication between the network devices stored in the banks 16 and 18.
The CMC 12 manages the network devices in the banks of network devices 16 and 18. The CMC 12 performs configuration and monitoring tasks, controls power to the network devices in the banks 16 and 18, and provides alerts for the network devices in the banks 16 and 18. The CMC 12 has a microprocessor and memory. The CMC 12 is powered by the power supplies on the power shelf 14.
The CMC 12 plays an important role in management of the rack 10 in a data center. The CMC 12 collects rack-based operation information from the rack 10 for data center management software. The operation information may include: (1) rack total power consumption; (2) an IP address list from each of the network devices; (3) overall and individual rack component health status; and (4) rack-based fan and thermal status. Such information may be used by data center management software to perform functions, such as capping rack power or updating rack IDs, on all management units of the rack, such as the CMC 12 or a rack management controller on the management switch 20. Thus, by collecting such information, a data center administrator may remotely monitor the performance of the devices in each rack system in the data center.
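By way of illustration only, the four categories of operation information listed above could be held in a single record per rack. The following is a minimal sketch under stated assumptions: the class name, field names, units, and the "ok"/"fault" health labels are hypothetical and are not taken from any particular CMC implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RackOperationInfo:
    """Hypothetical record of the operation information a CMC collects for one rack."""
    rack_id: str
    total_power_watts: float          # (1) rack total power consumption
    device_ip_addresses: List[str]    # (2) IP address list from the network devices
    component_health: Dict[str, str]  # (3) individual component health ("ok" or "fault")
    fan_rpm: Dict[str, int]           # (4) rack-based fan status
    temperatures_c: Dict[str, float]  # (4) rack-based thermal status

    def overall_health(self) -> str:
        """Overall health is "ok" only when every component reports "ok"."""
        if all(state == "ok" for state in self.component_health.values()):
            return "ok"
        return "fault"

    def exceeds_power_cap(self, cap_watts: float) -> bool:
        """True when management software would need to cap rack power."""
        return self.total_power_watts > cap_watts


# Example values are invented for illustration.
info = RackOperationInfo(
    rack_id="rack-10",
    total_power_watts=4200.0,
    device_ip_addresses=["10.0.0.11", "10.0.0.12"],
    component_health={"psu-1": "ok", "psu-2": "fault", "cmc": "ok"},
    fan_rpm={"fan-1": 9000},
    temperatures_c={"inlet": 24.5},
)
```

In this sketch the overall health status is derived from the individual component statuses rather than stored separately, which is one possible design choice among several.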
There are several well-known ways for the CMC 12 to communicate with the components on the power shelf 14. For example, the CMC 12 may use an inter-integrated circuit (I2C) protocol, a universal asynchronous receiver/transmitter (UART), or a network to communicate operational data from the power shelf 14 to the management switch 20.
In operation, the CMC 12 periodically polls the status of the power shelf 14. A remote administrator may send a command to get the power shelf status through a management switch such as the management switch 20. The CMC 12 then reports the status of the power shelf 14 according to the request from the remote administrator. When an alert is activated, the CMC 12 will actively report the status of the power shelf 14 to a remote management node to speed the response to a situation such as a potential failure.
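The polling, on-request reporting, and active alert reporting described above may be sketched as follows. This is an illustrative sketch only: the class name, the callback parameters, the temperature threshold used as the alert condition, and the choice to answer requests from the most recently polled status are all assumptions, not details of any actual CMC firmware.

```python
from typing import Callable, Dict, List, Optional

Status = Dict[str, float]


class PowerShelfMonitor:
    """Hypothetical CMC-side monitor for a power shelf."""

    def __init__(
        self,
        read_shelf: Callable[[], Status],      # assumed transport (I2C, UART, or network)
        send_alert: Callable[[Status], None],  # assumed path to the remote management node
        max_temp_c: float = 70.0,              # assumed alert threshold
    ) -> None:
        self._read_shelf = read_shelf
        self._send_alert = send_alert
        self.max_temp_c = max_temp_c
        self.last_status: Optional[Status] = None

    def poll_once(self) -> Status:
        """One polling cycle; a timer would call this at a fixed interval."""
        status = self._read_shelf()
        self.last_status = status
        # Active reporting: push the status without waiting for a request.
        if status["temp_c"] > self.max_temp_c:
            self._send_alert(status)
        return status

    def handle_status_request(self) -> Optional[Status]:
        """Answer a remote administrator's request with the latest polled status."""
        return self.last_status


# Simulated shelf readings and alert sink, invented for illustration.
readings = iter([
    {"power_w": 1800.0, "temp_c": 45.0},
    {"power_w": 2100.0, "temp_c": 82.0},
])
alerts: List[Status] = []
monitor = PowerShelfMonitor(read_shelf=lambda: next(readings), send_alert=alerts.append)
monitor.poll_once()  # normal reading, no alert
monitor.poll_once()  # over-temperature reading, alert pushed
```

Because every path in this sketch runs through the single monitor object, no status reaches the remote administrator once that object stops running, which mirrors the single point of failure discussed for the CMC 12.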
Generally, there is only one CMC for each canister or rack, such as the rack 10 in FIG. 1. However, a remote administrator of a data center loses the ability to monitor the status of the power shelf 14 of the canister or rack if the CMC 12 fails or goes offline. This impedes the operation of the rack 10, since problems with the overall power and support provided by the rack 10 will not be reported.
Thus, the failure of the CMC 12 will impede the operation of all of the network devices in the rack 10. Under normal conditions, the CMC 12 inside the canister or rack 10 will monitor the status of the power shelf 14 and report the status to a management node. The failure or malfunction of the CMC 12 will prevent monitoring of the network devices, and therefore operation must be halted until the unit can be inspected. Some current rack designs include a backup CMC to report the canister or rack status to a remote administrator should the first CMC fail. However, this is not a perfect solution because both CMCs may be cut off from communication with the remote administrator. Further, if both CMCs on the canister fail, operation must be halted.
There is therefore a need for a rack system that allows remote monitoring of the support shelf status, even if a CMC goes offline. There is a further need for a mechanism to allow the operation of a rack, even if a CMC fails to report monitoring data of the devices on the rack. There is a further need for a rack system with multiple CMCs that collect status monitoring data for multiple devices in a rack.