The present invention relates to computer networks and more particularly to network devices that are accessible over a network connection during reboot or power loss.
As computer network technology and security threats evolve, it becomes very important to optimize both network operation and security. This challenge often calls for remote network management strategies that include both in-band and out-of-band network management tools. In-band management tools are widely used and typically employ either a telnet connection to a network device, such as a server or a router, or management tools based on the Simple Network Management Protocol (SNMP). SNMP-based tools include real-time fault detection tools that detect, log, notify users of, and automatically correct network faults. Because faults can cause downtime or unacceptable network degradation, fault management is perhaps the most widely implemented tool.
In-band network management is the most common way to manage a network. However, when a network device such as a router malfunctions, traffic may be unable to flow through the network. This creates a problem because the in-band management tools rely on that same network and therefore cannot be used to determine the source of the fault or to correct it. To address this problem, most mission-critical networks also include an out-of-band network, which provides an alternate path to reach each network device for diagnostic purposes even when the in-band network is down.
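By way of illustration only, the fallback from the in-band path to the out-of-band path can be sketched as follows. The function name `choose_management_path` and the two probe callables are hypothetical and are not part of any particular management tool; how reachability is actually tested (for example, by an SNMP query or a console login) is left to the caller.

```python
from typing import Callable, Optional


def choose_management_path(
    in_band_ok: Callable[[], bool],
    out_of_band_ok: Callable[[], bool],
) -> Optional[str]:
    """Pick a path for reaching a device (hypothetical helper).

    Each argument is a caller-supplied probe that returns True when the
    device answers on that path. The in-band path is preferred; the
    out-of-band path is used only when the in-band probe fails.
    """
    if in_band_ok():
        return "in-band"
    if out_of_band_ok():
        return "out-of-band"
    # Neither path answers, e.g. the device is rebooting or has lost power.
    return None
```

In this sketch, a return value of `None` corresponds to the situation in which neither path can reach the device.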
A data communication network (DCN) is commonly used to implement the out-of-band management of networked devices. The DCN is often referred to as the out-of-band network because it is not used for ‘transmit data’ services. Rather, out-of-band management gives the network administrator the ability to manage the network in parallel with the in-band management tools and data traffic. As is well understood in the art, network administrators can utilize out-of-band network tools to facilitate remote installation, updates, and upgrades to the operating system, BIOS and any software operating on each networked device. Further, the out-of-band network tools enable the network administrator to isolate the cause of a network failure and either recover from the failure or reduce its impact. As a further advantage, management-related network traffic is moved to the out-of-band network so that data transmit services can fully utilize the available bandwidth of the in-band network.
Unfortunately, even out-of-band management tools are useless when the network administrator is trying to bring a networked device back on-line after it has been shut down. This problem arises because when a network device is being booted, the device is non-functional during the boot process even if the out-of-band network is otherwise available. Thus, the system administrator cannot use the fault manager tool to monitor or collect data from the device during boot-up to help pinpoint the source of the error. This lack of visibility during the boot process is particularly troublesome when the error affects a device located at a remote site because a technician must be dispatched to diagnose the error on-site. Clearly, what is needed is the ability to monitor network devices during the boot process so that the network administrator can remotely diagnose any boot errors and get the device back on-line.
Monitoring the boot process is especially critical whenever the operating system or software is updated because the network device must often be rebooted to start executing the new code. If the installed update is defective, the device may be unable to boot properly or may not be operable, thereby rendering both in-band and prior art out-of-band management tools ineffective. Clearly, it is desirable to provide network administrators the ability to monitor the reboot process in order to verify that the new update was correctly installed and that the network device functions correctly. Alternatively, if the software is defective, it would be desirable to enable the network administrator to gain control of the device and uninstall the software even if the network device cannot be fully booted.
In other situations, the network device may lose power and go off-line. When power to a network device is lost, it is desirable that the network administrator be able to quickly ascertain why the device went off-line. With existing management systems, however, the loss of power takes the device off both the in-band network and the out-of-band network, and the administrator has no means of determining the source of the problem other than dispatching a technician to the site. It is obviously desirable to provide the network administrator the ability to remotely correct a malfunction rather than to incur the time and expense associated with dispatching a technician.
While network management is critical to maintaining the operation of the DCN, there are times when critical devices will fail. In such cases, it is necessary to swap out the defective device and replace it with a functional device. Accordingly, it is necessary to store spare devices in widely dispersed geographic areas so that spares are readily available. Unfortunately, many network devices are very expensive so there is a great need to carefully manage the inventory of spare devices to ensure their availability in the event of a network failure. For this reason, many enterprises will store several spare servers, routers and switches in a locked room or cage for use as replacement parts. Although the system administrator may count the spare devices on a periodic basis, the count may only be accurate at the time it was taken because expensive network devices are often theft targets. Thus, the most recent inventory count may be grossly inaccurate if a theft of the spare devices has not yet been detected. To combat the constant theft problem, it is desirable to maintain a real-time inventory of the spare devices and to constantly monitor the availability of the stored spare devices. In this manner, if a network error is traced to a router or a switch, the system administrator is assured that the necessary spare equipment will be available to fix the problem.
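By way of illustration only, a real-time inventory of the kind described above can be sketched as a table of last-seen times. The sketch assumes, purely for illustration, that each stored spare can report a periodic heartbeat over a management connection; the class name `SpareInventory`, its methods, the device identifiers, and the 60-second timeout are hypothetical and are not taken from any existing system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SpareInventory:
    """Tracks when each spare device was last heard from (hypothetical)."""

    timeout_s: float = 60.0  # assumed heartbeat timeout, in seconds
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, device_id: str, now: float) -> None:
        # Store the time of the most recent report from this spare.
        self.last_seen[device_id] = now

    def available(self, now: float) -> List[str]:
        # Spares whose last report is within the timeout window.
        return sorted(
            d for d, t in self.last_seen.items() if now - t <= self.timeout_s
        )

    def missing(self, now: float) -> List[str]:
        # Spares that have gone silent for longer than the timeout.
        # These may have been removed, stolen, or disconnected.
        return sorted(
            d for d, t in self.last_seen.items() if now - t > self.timeout_s
        )


inventory = SpareInventory(timeout_s=60.0)
inventory.record_heartbeat("spare-router-1", now=0.0)
inventory.record_heartbeat("spare-switch-1", now=50.0)
silent = inventory.missing(now=100.0)    # spare-router-1 last reported 100 s ago
present = inventory.available(now=100.0) # spare-switch-1 last reported 50 s ago
```

Under these assumptions, a spare that stops reporting appears in the missing list as soon as the timeout elapses, rather than at the next periodic count.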