As the number of computers in corporate networks has grown, along with the number of operating systems and applications running on those computers, the time and effort required to monitor the status of these complex hardware/software systems has increased commensurately. Hardware devices such as disk drives and power supplies are prone to failure, environmental conditions such as overheating can result in erratic operation, and software at many layers has a well-deserved reputation for becoming non-responsive or “hanging.” Monitoring the status of a computer and taking action to recover from an abnormal condition is part of what is known as computer system management, and generally requires either human intervention by an administrator or, more recently, an intelligent software agent making decisions based on the same sorts of inputs used by human administrators. If every computer required its own monitor, keyboard, and mouse so that system managers (human or automated) could determine its health and take corrective action when needed, the cost, space, and power requirements for a large number of computers could be enormous. In addition, the manpower required to operate such an inefficient architecture would add significant ongoing cost to the organization.
Hence, computer manufacturers early on began to incorporate devices and methods to make management of their equipment easier and more scalable. Software suppliers such as Microsoft have also developed and refined tools for this purpose. Because of the many potential failure modes, and individual system administrators' preferences for how to deal with their configurations, a plethora of these system management mechanisms exists; generally, however, they can be classified into two categories: in-band and out-of-band management. In-band management relies upon a fully functional hardware platform, operating system, and network connection; it allows administrators in an enterprise environment to perform day-to-day operation of dozens to thousands of computers from one or more management consoles. Typical tasks performed with in-band management tools include user/account administration and reporting, backup and restore operations, usage measurement and analysis, application software loading and execution, operating system patch installation, etc. While routine health monitoring can be performed with in-band management tools, if one of the tools' underlying software or hardware components fails, that communication path may be rendered inoperative. A separate, “out-of-band,” connection is then required in order to diagnose the problem and attempt to return the system to a fully operational condition. The general area of the present invention is this out-of-band (“OOB”) management, and specifically relates to integrating a number of different out-of-band management techniques into one device.
Out-of-band management interfaces to computers are implemented in a number of different ways. If the computer's CPU and operating system are functional to some degree, one of the serial ports (COM ports) may be used to communicate with management services supported by a subset of the operating system. Microsoft's Emergency Management Services (“EMS”) is an example of this. Rather than using a full graphical user interface (GUI) operating over the network connection, which requires many different system resources to be functioning correctly, EMS uses a simple text command line via serial port to perform low-level diagnostic and corrective operations.
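The kind of low-level text command interface described above can be illustrated with a minimal sketch of a serial console command loop. The commands and responses below are purely hypothetical, chosen only to show the style of interaction; they are not Microsoft's actual EMS command set.

```python
def handle_command(line: str) -> str:
    """Dispatch one text command line of the sort a serial console
    might receive; returns the response text to send back."""
    parts = line.strip().split()
    if not parts:
        return ""
    cmd = parts[0]
    if cmd == "id":
        return "machine: example-host"   # report basic identity
    if cmd == "restart":
        return "restarting..."           # would trigger a reboot here
    return f"unknown command: {cmd}"
```

Because each exchange is plain text over a serial line, such an interface needs only a working UART and a minimal software stack, which is the point of using it when richer network services are unavailable.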
Another management technique which is becoming common is to embed a small service processor on the computer's main board. This service processor is independent of the main CPU and hence can continue to operate even when there is a hardware or software failure that disables the main CPU. The service processor (also known as a Baseboard Management Controller) can have a number of inputs for sensors monitoring conditions such as temperature and fan speed, and can have the ability to restart the main CPU or even cycle power to the computer. In order to ensure full operation of the service processor even when the main CPU and/or its primary network connection are down, the service processor typically has its own physical interface, either a serial port or a network (e.g., Ethernet) connection independent of the main network. The service processor may also share the main network connection in what is known as a “sideband” arrangement, if the hardware is designed for it. Communication with an embedded service processor has traditionally involved proprietary, manufacturer-specific protocols such as HP® Integrated Lights Out (iLO) or Sun® Advanced Lights Out Manager (ALOM), but standardization on a protocol known as the Intelligent Platform Management Interface (IPMI) is underway. A third fundamental method of OOB management is known as digital KVM (Keyboard, Video, Mouse). With this method, the computer's keyboard, mouse, and video monitor ports are used for communication independent of the main network connection, just as they would be if a human being were sitting in front of those peripherals. A (hardware) device can capture the video output intended for a monitor, digitize it, and make it available over a network connection to a remotely located system administrator.
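As a concrete taste of the IPMI service-processor messaging mentioned above, every IPMI message protects its bytes with a simple two's-complement checksum: the covered bytes plus the checksum must sum to zero modulo 256. The sketch below shows the checksum and a hypothetical connection header (the address and function-code values are illustrative).

```python
def ipmi_checksum(data: bytes) -> int:
    """IPMI two's-complement checksum: the sum of the covered bytes
    plus the checksum must equal zero modulo 256."""
    return (-sum(data)) & 0xFF

# Hypothetical connection header: responder address 0x20,
# NetFn 0x06 (Application) in the upper six bits, LUN 0.
header = bytes([0x20, 0x06 << 2])
chk = ipmi_checksum(header)
```

The checksum lets a Baseboard Management Controller detect corruption on the low-speed links it shares with sensors and sideband interfaces, without requiring any heavier protocol machinery.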
The same device can emulate the keyboard and mouse signals (either in native PS/2 format or via Universal Serial Bus (USB)) that the computer expects to see on those inputs, allowing the remote administrator to interact with the computer just as if he had a physically connected keyboard, mouse, and monitor. Such KVM devices are commercially available, either with individual computer connections or combined with an analog KVM switch so that multiple computer KVM interfaces may share one network connection.
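The keyboard emulation described above can be made concrete with the USB HID “boot protocol” keyboard report: an 8-byte structure (one modifier byte, one reserved byte, then up to six concurrent key usage IDs) that a KVM device sends in place of a physical keyboard. A minimal sketch, with constants taken from the published HID usage tables:

```python
LEFT_SHIFT = 0x02      # modifier bit for left Shift
KEY_A = 0x04           # HID usage ID for the letter A

def key_report(modifiers: int = 0, *keys: int) -> bytes:
    """Build an 8-byte HID boot-protocol keyboard report:
    [modifiers, reserved, key1 .. key6]."""
    padded = list(keys)[:6] + [0] * max(0, 6 - len(keys))
    return bytes([modifiers, 0x00] + padded)

# Press Shift+A, then release all keys (an all-zero report).
press = key_report(LEFT_SHIFT, KEY_A)
release = key_report()
```

Sending the “press” report followed by the all-zero “release” report is how a remote administrator's keystroke is replayed to the managed computer as if typed locally.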
To summarize, out-of-band system management may take place via three types of hardware connection: network, serial, or KVM. Differing levels of management software communicate over one of these hardware interfaces with either the main CPU or an independent service processor. FIG. 1 shows the various connections and their relationship to system operational state. In locations with a large number of computers, integrating these disparate interfaces can require a significant management infrastructure of cabling and equipment. Ethernet switches are used to aggregate network connections; each computer using a management LAN connection (almost always Ethernet) will have cables running from centralized data center switches to both the main LAN port (for application communication with the outside world) and the dedicated management LAN port. If serial OOB management is to be used, terminal concentrators (also known as terminal servers or console servers) combine a number of serial connections and make them available remotely via a network connection. This requires another cable from the terminal concentrator to the computer's COM port or dedicated serial management port. For KVM management, the situation is even more complicated due to the limited distance that the keyboard, mouse, and video signals can travel. A transceiver unit (about the size of a small cell phone) connects via short cables to the computer's keyboard, mouse, and video ports. The transceiver contains circuitry that extends these connections over cable runs of 10 m to 100 m, depending on the manufacturer, to a KVM switch chassis. The KVM switch aggregates a number of these connections and makes access to them available over a network connection.
Hence, to cover all OOB management interface possibilities, each managed computer requires three separate cables, connecting to four types of equipment: Ethernet switch, terminal concentrator, KVM transceiver dongle, and KVM switch. FIG. 2 illustrates this prior art configuration. Except for the transceiver, all of these units are chassis that take up rack space and require AC power connections in data centers, where both are at a premium. And in large data centers, keeping track of all of the cables can be a significant task in and of itself. On a hardware level, a device that reduces the number of equipment types by aggregating all of the various computer system management connections onto one administrative user connection would have considerable value over the prior art in solving these problems. Ideally, the device would be compact in size, not requiring rack space, and low in power consumption.
Once the hardware connections are taken care of, the actual system management operations needing to be performed by administrators can be time-consuming and hence expensive. Typical management functions are described using the acronym MILARRS in U.S. patent application Ser. No. 11/031,643 entitled “MILARRS Systems and Methods,” filed Jan. 7, 2005, the entire disclosure of which is incorporated herein by reference. MILARRS stands for: Monitoring the state of the system for an administrator; Inventorying the system's sub-systems, components, or assets; Logging data or events generated by the system; Alerting an administrator of system state, or taking action based on defined rules; Recovering the system if it fails or shuts down; Reporting device information or diagnostics to an administrator; and Securing the system and its assets from threats and risks. This and all other referenced extrinsic materials are incorporated herein by reference in their entirety. Where a definition or use of a term in an incorporated reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein applies and the definition of that term in the reference does not apply.
The mix of prior art management equipment described above—network switches, terminal concentrators, KVM switches, etc.—can perform some of these functions, but each does so in a different way. Consider as a typical example a server where day-to-day management activity takes place via a KVM switch, with an administrator running a KVM client application on a management console. If the operating system should appear nonresponsive, the administrator would need to run a “telnet” application to try to communicate with the server via a serial port connected to a terminal concentrator chassis. And as a last resort, an IPMI client application might be needed to remotely reset power to the computer. Keeping track of which computer is being managed by which administrative application, using which interface chassis, at what network address and port number can be enormously time-consuming and expensive, particularly when configurations are changing rapidly. And this heterogeneous and ever-changing environment makes it difficult to automate the computer system management process by using software agents to reduce personnel costs.
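The bookkeeping burden just described can be sketched as a table of per-server management paths plus an escalation rule. The hostnames, port numbers, and escalation order below are purely illustrative assumptions, standing in for the separate chassis each interface lives on.

```python
# Each server's three management paths sit on different chassis,
# addresses, and ports (all values hypothetical).
MGMT_PATHS = {
    "server-42": {
        "kvm":    ("kvm-switch-3.example.net", 443),
        "serial": ("term-conc-1.example.net", 7042),
        "ipmi":   ("bmc-42.example.net", 623),
    },
}

ESCALATION_ORDER = ["kvm", "serial", "ipmi"]

def next_path(server: str, failed: str):
    """Return the (host, port) to try after `failed` has not responded,
    or None when all out-of-band options are exhausted."""
    i = ESCALATION_ORDER.index(failed)
    if i + 1 < len(ESCALATION_ORDER):
        return MGMT_PATHS[server][ESCALATION_ORDER[i + 1]]
    return None
```

Even this toy table must be kept accurate for every server and updated on every recabling, which is exactly the cost the paragraph above describes.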
A device which integrates all of the various connections required for computer system management could have a single network address through which any management operation could be performed—with no need to sort out an intermediate layer of multiple single-interface chassis, each of which carries its own address. Through that single address, a common application interface could be presented to the remotely connected human or mechanized system administrators. With an embedded processing element, the device could also locally automate various management functions, reducing the workload on those administration resources.
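One way to picture the single-address model is a dispatcher that routes each management operation to the appropriate internal interface behind one network address. The operation names and routing below are illustrative assumptions, not a defined product API.

```python
class IntegratedManager:
    """Toy model of one device at one address routing all
    management operations to its internal interfaces."""

    ROUTES = {
        "console":     "serial",   # text console -> serial interface
        "screenshot":  "kvm",      # framebuffer capture -> KVM interface
        "power_cycle": "ipmi",     # hard reset -> service processor
    }

    def dispatch(self, operation: str) -> str:
        iface = self.ROUTES.get(operation)
        if iface is None:
            raise ValueError(f"unsupported operation: {operation}")
        return f"{operation} via {iface}"

mgr = IntegratedManager()
```

The caller never needs to know which internal interface handles a request, which is what allows both human consoles and automated agents to use one common application interface.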