1. Field of the Invention
The present invention generally relates to a system for remotely managing a computer system or network. More particularly, the present invention relates to a remote management system that includes the capability of transferring a bootable image from a remote computer system to a system management board in a host computer system, and for causing the host computer system to boot the image during its boot cycle.
2. Background of the Invention
The concept of controlling a computer from a remote terminal is of great interest in many applications, especially in computer networks and in other situations where a user may not be able to physically access the computer. Computer networks such as LANs (local area networks) have become one of the most important devices for storing and sharing data in a business, and thus, computer networks are one of the most critical pieces of equipment in a business office. A failure in the computer network can cause business operations to grind to a halt. Computer networks typically comprise a plurality of personal computers and other data processing devices connected together for information exchange. At the heart of the computer network are one or more file servers. In most computer networks, file servers are responsible for administering and storing the documents generated by each of the personal computers (PCs) in the system. In addition to managing the network, file servers also preferably include the capability to monitor faults in the computer network. If a fault or security breach is detected, the file server provides a warning of the fault and in certain instances may also provide diagnostic operations and may even implement corrective measures.
Files and data are maintained by a host processing system within the server. Servers are designed to provide workstations with fast access to files stored by the server. Accordingly, file servers embody a computer which responds to an operating system program (a popular operating system being, for example, WINDOWS NT® or NETWARE®) not only to orchestrate the files but also to maintain file security, file backup, or other file management features. One important aspect which flows from maintaining these functions within a server is the capability to manage the server from a remote site, and even to permit management of the server from sites remote from the network. There has been a steady increase in the number of servers that are used in businesses. The trend is to place one or more servers at each location of a business, rather than using a single mainframe computer at a centralized location. Typically, a company has an individual or department responsible for administering all of the file servers. In many instances, the administrator or administration department is headquartered at one site. Thus, each of the servers must either be maintained and administered remotely or else personnel must be transported to remote offices to permit on-site management.
Numerous monitoring systems are available to automatically alert designated persons when a PC, server or software application has failed. When such a failure occurs, the persons being notified may be in a remote location and not able to directly access the failed PC. In such an instance, the person may have access to a computer that is remotely connected to the failed computer or server, but may be unable to access the failed device. This situation may arise because the processor in that device has failed and will no longer respond, because the application running on the failed computer does not support remote PC access, or because the failed computer does not have the necessary software installed to permit a remote PC to access it. Various systems exist to permit a PC to be remotely or automatically rebooted, which in many cases restores the PC to normal operation. However, network administrators are reluctant to use such systems without first determining what may have caused the failure, so that similar failures can be prevented in the future.
Operating systems may permit access to the computer or server being managed from a remote site, often called a “remote terminal.” A remote terminal, while not physically connected to the computer or server, nonetheless allows remote control of certain operations. Products such as Compaq Server Manager® and Compaq Remote Insight Manager®, obtainable from Compaq Computer Corp., have attempted to address some of the issues involved in managing a network of distributed servers from a single, remote site. These products permit an administrator to be notified of a remote server failure, to reset the server from the remote site, and to access certain information provided on the server. Compaq's Insight Manager® permits remote maintenance of the file server as well as local and remote notification of errors. In addition, Insight Manager® permits the file server to be re-booted from a remote location or from any system on the network. Insight Manager® also provides control facilities including diagnostic capabilities to analyze the condition of the server system configuration and to update system firmware. Insight Manager® collects and monitors server data as well as data from each client in the network and allows the network manager to act on the data from a remote location or any workstation on the network. In addition, Insight Manager® includes the capability to set user-defined thresholds which permit the server to monitor system parameters and to alert the network manager when an error occurs. Notification in the event of an alert or a failure can be delivered in many ways, including on-screen messages, a pager, e-mail, fax and SNMP.
It is certainly beneficial to allow remote control of certain server functions, especially those needed to reset one or more servers within a network of servers. Downtime caused by server failure may be the most costly expense incurred in running a distributed computer system. The causes of server failure or “crash” are numerous. Any number of malfunctions or design flaws associated with the server hardware, server operating system or application programs running on a server may cause a server to crash. If a server crashes, then file access is often lost and business records are temporarily inaccessible until the cause of failure is fixed.
In certain instances, it may be necessary to install new software or boot code in a remotely-managed computer system. For example, if a system crashes and cannot be re-started, it may be necessary to boot the computer with boot code from a floppy disk. It may also be necessary to re-program the BIOS ROM or the flash ROM of the computer system if the ROM code becomes corrupted or needs to be modified. It may also be necessary to install a new operating system from time to time, as upgrades and modifications are made to the operating system currently in use. These types of changes to the core programs in a computer system typically cannot be handled from a remote location.
Various software systems have been developed that link one PC with other PCs using modems connected over standard telephone lines either via direct connections or through the Internet. These systems permit a host PC to be controlled by a remote PC. Function keys and menus are used in these systems to permit the remote PC to operate the host PC, as if the remote user were physically sitting at the host PC. Some examples of computer packages that function in this manner are Carbon Copy, PC Anywhere, Remotely Possible, Timbuktu, and Citrix MetaFrame. In commonly assigned U.S. application Ser. No. 08/775,819, filed Dec. 31, 1996, entitled, “Diagnostic Board With System Video And Keyboard For Host Server System To Permit Remote Operations In A Terminal Mode,” a system management module for a host server system is disclosed that includes a system management processor connected to a system management local bus. The system management local bus connects to the system PCI bus through a system management central control unit. The system management module includes a video controller and/or keyboard and mouse controller connected to the system management local bus to support remote consoling of the system management module, even in the event that the system bus fails. Also, in commonly assigned U.S. application Ser. No. 09/544,573, filed Apr. 6, 2000, entitled “USB Virtual Devices,” a management sub-system connects to a managed server and emulates a USB device, so that a remote management console can operate as a USB virtual terminal. The teachings of U.S. application Ser. Nos. 08/775,819 and 09/544,573 are incorporated herein by reference.
While these systems permit the remote user to take control of the host computer, in the sense that the remote computer can transmit certain input signals such as keyboard and mouse commands, most do not permit the remote computer to load basic core software from the peripheral drives of the remote computer to the host computer, or permit the other peripheral devices that might be available on the remote computer to be used in operating the host computer. Thus, if a network administrator has operating system software or boot code to be loaded on the host computer, the administrator typically must gain physical access to the host computer. Because most of the various peripheral devices of the remote computer are essentially unavailable in the remote administration of a host computer, the network administrator must physically visit the host computer to perform many basic maintenance tasks that currently cannot be performed remotely. Thus, for example, if new operating system software must be loaded on a managed server, a network administrator must physically travel to the site of that server. As a result, many companies that have network facilities at different locations must employ separate network administrators at each facility or else pay for transportation between the various facilities, thus greatly increasing the cost of administering the company's network.
One software tool that does exist for permitting software to be installed remotely is the Intel PXE (Preboot Execution Environment) Toolkit. The PXE Toolkit enables certain software (including some operating systems) to be installed remotely under certain conditions. For the installation to occur, however, the computer system receiving the installation must be operational because the software to be installed is temporarily stored in the random access memory of that computer. Thus, if the system has crashed, or is otherwise inoperable, this product has limited use. Consequently, if a system has crashed and cannot reboot, it is still necessary to have personnel travel to the computer site to perform repairs on site.
It would be desirable if a system were available that permitted software to be remotely installed on a computer system, even if that computer system had failed and could not be booted. In particular, it would be advantageous if a system were available that could be used to boot the system remotely using new or modified boot code. It would also be beneficial if the system permitted other software to be installed, or system devices to be re-programmed, all from the remote location. Despite the apparent advantages such a system would offer, to date no such system is available.