Very large data centers have been, and will continue to be, built to support a variety of applications such as Internet searching, social networking, and cloud computing. These very large data centers may include tens of thousands of devices such as computer devices, storage devices, switches, routers, management devices, and so on. Because the devices can be expensive, a data center operator typically sizes the data center to have only enough devices to meet the anticipated demand. If a data center operator overestimates the demand, then these expensive devices will remain idle. Conversely, if a data center operator underestimates the demand, then business and consequently revenue may be lost because devices will not be available to meet the demand.
To maximize revenue and minimize expenses, a data center operator, in addition to trying to size the data center accurately, would like as many devices as possible to be in service at any given time, that is, available to service the applications of the data center. Unfortunately, with a very large data center, a large number of devices may be out of service at any given time for a variety of reasons. For example, some devices may be out of service because of software upgrades (e.g., a new operating system) for those devices. Other devices may be out of service because of hardware problems (e.g., a defective graphics processing unit or defective memory).
A data center operator may size the data center anticipating that a certain percentage of the devices will be out of service at any given time. For example, if a data center has 100,000 devices with a failure rate of 10% per year, then 10,000 devices on average would need to be repaired (including repair by replacement) each year. The data center operator would need to factor in the average time to repair a device when sizing the data center. Unfortunately, the time from when a failure is identified and the device goes out of service until the device is back in service can be many days. The process of repairing such a device may involve the requesting and receiving of a return merchandise authorization, the removing of the failed device from the configuration data of the data center, the preparation of a repair order, the dispatching of a technician, the uninstalling of the device, the diagnosis of the problem, the repair work, the installing of the repaired device, and the adding of the repaired device to the configuration data of the data center. The adding of the repaired device to the configuration data of the data center can be especially time-consuming and error-prone. It can be time-consuming because the repair technician needs to manually convey information to a manager of the data center who is responsible for manually updating the configuration data. Because of work backlogs of the technicians and the managers, it can take several days from the completion of the repair until the repaired device is back in service. It can be error-prone because the device identifiers (e.g., 16 hexadecimal digits in length) need to be manually transcribed and entered.
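The sizing arithmetic described above can be illustrated with a minimal sketch. The function names, the steady-state assumption (failures spread evenly over a 365-day year), and the 7-day mean repair time used in the usage example are hypothetical and chosen only for illustration; only the 100,000 devices and the 10% annual failure rate come from the example above.

```python
import math

DAYS_PER_YEAR = 365.0  # assumption: failures are spread evenly over the year


def expected_failures_per_year(num_devices, annual_failure_rate):
    """Average number of devices needing repair each year."""
    return num_devices * annual_failure_rate


def expected_out_of_service(num_devices, annual_failure_rate, mean_repair_days):
    """Average number of devices out of service at any given time.

    Steady-state estimate: (failures per day) x (days each repair takes).
    """
    failures_per_day = (
        expected_failures_per_year(num_devices, annual_failure_rate) / DAYS_PER_YEAR
    )
    return failures_per_day * mean_repair_days


def devices_to_provision(required_in_service, annual_failure_rate, mean_repair_days):
    """Approximate device count needed so that, on average,
    `required_in_service` devices are available.

    Approximation: each device is unavailable for a fraction
    annual_failure_rate * mean_repair_days / 365 of the year.
    """
    unavailable_fraction = annual_failure_rate * mean_repair_days / DAYS_PER_YEAR
    return math.ceil(required_in_service / (1.0 - unavailable_fraction))


if __name__ == "__main__":
    # 100,000 devices and a 10% failure rate are from the text;
    # the 7-day mean repair time is a hypothetical value.
    print(expected_failures_per_year(100_000, 0.10))
    print(expected_out_of_service(100_000, 0.10, 7))
    print(devices_to_provision(100_000, 0.10, 7))
```

Under these assumptions, the number of devices out of service at any given time scales linearly with the mean repair time, so each day removed from the repair process reduces the number of spare devices the operator needs to provision.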