1. Field of the Invention
The invention relates to I/O adapters in computer systems. More particularly, the invention relates to the hot add and swap of adapters and input/output platforms on computer systems.
2. Description of the Related Technology
As enterprise-class servers, which are central computers in a network that manage common data, become more powerful and more capable, they are also becoming ever more sophisticated and complex. For many companies, these changes lead to concerns over server reliability and manageability, particularly in light of the increasingly critical role of server-based applications. While in the past many systems administrators were comfortable with all of the various components that made up a standards-based network server, today's generation of servers can appear as an incomprehensible, unmanageable black box. Without visibility into the underlying behavior of the system, the administrator must "fly blind." Too often, the only indicator the network manager has of the relative health of a particular server is whether or not it is running.
It is well-acknowledged that there is a lack of reliability and availability of most standards-based servers. Server downtime, resulting either from hardware or software faults or from regular maintenance, continues to be a significant problem. By one estimate, the cost of downtime in mission critical environments has risen to an annual total of $4.0 billion for U.S. businesses, with the average downtime event resulting in a $140 thousand loss in the retail industry and a $450 thousand loss in the securities industry. It has been reported that companies lose as much as $250 thousand in employee productivity for every 1% of computer downtime. With emerging Internet, intranet and collaborative applications taking on more essential business roles every day, the cost of network server downtime will continue to spiral upward.
A significant component of cost is hiring administration personnel. These costs decline dramatically when computer systems can be managed using a common set of tools, and where they don't require immediate attention when a failure occurs. Where a computer system can continue to operate even when components fail, and defer repair until a later time, administration costs become more manageable and predictable.
While hardware fault tolerance is an important element of an overall high availability architecture, it is only one piece of the puzzle. Studies show that a significant percentage of network server downtime is caused by transient faults in the I/O subsystem. These faults may be due, for example, to the device driver, the device firmware, or hardware which does not properly handle concurrent errors, and they often cause servers to crash or hang. The result is hours of downtime per failure, while a system administrator discovers the failure, takes some action, and manually reboots the server. In many cases, data volumes on hard disk drives become corrupt and must be repaired when the volume is mounted. A dismount-and-mount cycle may result from the lack of "hot pluggability" or "hot plug" in current standards-based servers. Hot plug refers to the addition and swapping of peripheral adapters to an operational computer system. Diagnosing intermittent errors can be a frustrating and time-consuming process. For a system to deliver consistently high availability, it must be resilient to these types of faults.
In a typical PC-based server, upon the failure of an adapter, which is a printed circuit board containing microchips, the server must be powered down, the new adapter and adapter driver installed, the server powered back up, and the operating system reconfigured.
Various entities have tried to implement the hot plug of these adapters to a fault tolerant computer system. One significant difficulty in designing a hot plug system is protecting the circuitry contained on the adapter from being short-circuited when an adapter is added to a powered system. Typically, an adapter contains edge connectors which are located on one side of the printed circuit board. These edge connectors allow power to transfer from the system bus to the adapter, as well as supplying data paths between the bus and the adapter. These edge connectors fit into a slot on the bus on the computer system. A traditional hardware solution for "hot plug" systems includes increasing the length of at least one ground contact of the adapter, so that the ground contact on the edge connector is the first connector to make contact with the bus on insertion of the I/O adapter and the last connector to break contact with the bus on removal of the adapter. An example of such a solution is described in U.S. Pat. No. 5,210,855 to Thomas M. Bartol.
U.S. Pat. No. 5,579,491 to Jeffries discloses an alternative solution to the hot installation of I/O adapters. Here, each hotly installable adapter is configured with a user-actuable initiator to request the hot removal of an adapter. The I/O adapter is first physically connected to a bus on the computer system. Subsequent to such connection, a user toggles a switch on the I/O adapter which sends a signal to the bus controller. The signal indicates to the bus controller that the user has added an I/O adapter. The bus controller then alerts the user through a light-emitting diode (LED) whether the adapter can be installed on the bus.
However, the invention disclosed in the Jeffries patent also contains several limitations. It requires the physical modification of the adapter to be hotly installed. Another limitation is that the Jeffries patent does not teach the hot addition of new adapter controllers or bus systems. Moreover, the Jeffries patent requires that before an I/O adapter is removed, another I/O adapter must either be free and spare or free and redundant. Therefore, if there is no free adapter, hot removal of an adapter is impossible until the user adds another adapter to the computer system.
Hardware developers have recently created the Intelligent I/O (I2O) architecture to facilitate the development of new adapters for servers. Traditionally, a computer has one processor for handling machine instructions. This processor is called the central processing unit (CPU). The I2O architecture defines a hardware topology in which a secondary processor, in addition to the usual CPU, is provided for handling I/O transactions. The secondary processor is located on an input/output platform (IOP) which controls a plurality of I/O devices. The architecture also defines a split device driver model wherein an operating system module (OSM) with operating system code is located on the host CPU and a device driver module (DDM) is located on the IOP to control I/O devices. The OSM and the DDM communicate with each other through a messaging layer which is defined by the I2O architecture. The split device driver architecture allows operating system vendors and hardware vendors to each provide only one module for each type of adapter. Thus, an OSM for a LAN adapter is compatible with all classes of LAN adapters, and similarly the DDM provided by the hardware vendor for the LAN device is compatible with all I2O compliant operating systems.
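The split driver model described above can be illustrated with a minimal sketch. It is not the I2O message format or API: the class names (`MessageLayer`, `LanOSM`, `LanDDM`), the function codes (`"LAN_SEND"`, `"REPLY_OK"`), and the helper `send_frame` are all hypothetical and chosen only to show that the host-side module and the IOP-side module share nothing except the messages they exchange.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    """A request or reply passed between host and IOP (hypothetical format)."""
    function: str
    payload: bytes


class MessageLayer:
    """Stands in for the messaging layer: one queue in each direction."""

    def __init__(self):
        self.inbound = deque()   # host -> IOP
        self.outbound = deque()  # IOP -> host


class LanDDM:
    """Device driver module: runs on the IOP and holds the vendor-specific logic."""

    def __init__(self, layer, vendor):
        self.layer = layer
        self.vendor = vendor
        self.sent_frames = []

    def service(self):
        """Drain pending requests and post exactly one reply per request."""
        while self.layer.inbound:
            msg = self.layer.inbound.popleft()
            if msg.function == "LAN_SEND":
                self.sent_frames.append(msg.payload)  # hardware access would go here
                self.layer.outbound.append(Message("REPLY_OK", b""))
            else:
                self.layer.outbound.append(Message("REPLY_UNSUPPORTED", b""))


class LanOSM:
    """Operating system module: runs on the host and knows only the message format."""

    def __init__(self, layer):
        self.layer = layer

    def send(self, frame):
        self.layer.inbound.append(Message("LAN_SEND", frame))

    def poll_reply(self):
        return self.layer.outbound.popleft() if self.layer.outbound else None


def send_frame(vendor, frame):
    """Wire one OSM to one vendor's DDM and push a single frame through."""
    layer = MessageLayer()
    ddm = LanDDM(layer, vendor)
    osm = LanOSM(layer)
    osm.send(frame)
    ddm.service()
    return osm.poll_reply().function, ddm.sent_frames
```

In this sketch the same `LanOSM` works unchanged with a DDM from any vendor, which mirrors the claim that one OSM per device class and one DDM per device are sufficient.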
Recently, the I2O architecture has been amended to define some hot plug primitives by which an operating system can request the hot add and removal of an IOP or an adapter which is located on the IOP. However, the I2O architecture has failed to define or describe the mechanism by which the configuration information of the added or swapped device is maintained. Moreover, the I2O architecture fails to solve the problem of how devices are to be added under the Peripheral Component Interconnect (PCI) architecture. In the PCI architecture, a bus number, a set of memory addresses, and a set of I/O memory addresses for each device have to be configured for a particular range depending on the physical location of each of the devices. The bus number, memory addresses, and I/O memory addresses are traditionally defined upon the start-up of the computer. If a PCI device is added at a subsequent time, the system must be able to allocate resources for the newly added device appropriate for the slot in which the device is located. However, traditional system initialization routines fail to reserve a bus number, memory addresses, and I/O memory for the new device.
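The resource problem described above can be made concrete with a small sketch. It assumes a simplified model in which each slot requests one power-of-two-sized, naturally aligned memory window (as PCI base address registers do); the function name `assign_windows`, the `reserve_size` parameter, and the example addresses are hypothetical and do not come from any particular BIOS or operating system.

```python
def assign_windows(slots, base, reserve_size=0):
    """Assign a memory window to each slot at start-up.

    slots        -- requested window size in bytes per slot, or None if the slot is empty
    base         -- first address available for assignment
    reserve_size -- window size to set aside for each empty slot (0 = reserve nothing,
                    which models a traditional initialization routine)

    Returns a dict mapping slot index to (start_address, size).
    Sizes are assumed to be powers of two.
    """
    windows = {}
    next_free = base
    for index, size in enumerate(slots):
        if size is None:
            if reserve_size == 0:
                continue  # traditional behavior: an empty slot gets nothing
            size = reserve_size
        # Round the start address up to a multiple of the window size.
        start = (next_free + size - 1) & ~(size - 1)
        windows[index] = (start, size)
        next_free = start + size
    return windows


# Slot 1 is empty at start-up; slots 0 and 2 hold devices.
slots = [0x1000, None, 0x4000]
traditional = assign_windows(slots, 0xE0000000)
hot_add_ready = assign_windows(slots, 0xE0000000, reserve_size=0x2000)
```

With no reservation, slot 1 has no window, so a device added there later has nowhere to be mapped without reassigning its neighbors. With a reservation, the window for slot 1 already exists and can be handed to the new device.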
Further, current operating systems do not by themselves provide the support users need to hot add and swap an adapter under the I2O architecture. System users need software that will freeze and resume the communications of their adapters in a controlled way. The software needs to support the hot add of various peripheral adapters such as mass storage and network adapters. Additionally, the software should support adapters that are designed for various bus systems such as Peripheral Component Interconnect, CardBus, Micro Channel, Industry Standard Architecture (ISA), and Extended ISA (EISA). System users also need software to support the hot add and swap of input/output platforms with embedded adapters.
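One way to picture "freeze and resume in a controlled way" is a channel that holds requests while an adapter is being swapped and replays them afterward. The following is a minimal sketch under that assumption; the class `AdapterChannel` and its methods are hypothetical, and the background text does not prescribe this or any other mechanism.

```python
from collections import deque


class AdapterChannel:
    """Queues requests while frozen and replays them, in order, on resume."""

    def __init__(self, deliver):
        self._deliver = deliver   # callable that hands a request to the adapter
        self._frozen = False
        self._pending = deque()

    def submit(self, request):
        """Deliver immediately, or hold the request if the channel is frozen."""
        if self._frozen:
            self._pending.append(request)
        else:
            self._deliver(request)

    def freeze(self):
        """Stop sending to the adapter so it can be removed safely."""
        self._frozen = True

    def resume(self, deliver=None):
        """Resume traffic, optionally redirecting it to a replacement adapter."""
        if deliver is not None:
            self._deliver = deliver
        self._frozen = False
        while self._pending:
            self._deliver(self._pending.popleft())


old_adapter, new_adapter = [], []
channel = AdapterChannel(old_adapter.append)
channel.submit("a")                 # reaches the old adapter
channel.freeze()
channel.submit("b")                 # held while the adapter is swapped
channel.submit("c")
channel.resume(new_adapter.append)  # held requests go to the replacement
channel.submit("d")
```

Callers of `submit` never see the swap; they only observe that requests issued during the freeze complete later.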
A related technology, not to be confused with hot plug systems using the I2O architecture, is Plug and Play defined by Microsoft and PC product vendors. Plug and Play is an architecture that facilitates the integration of PC hardware adapters to systems. Plug and Play adapters are able to identify themselves to the computer system after the user installs the adapter on the bus. Plug and Play adapters are also able to identify the hardware resources that they need for operation. Once this information is supplied to the operating system, the operating system can load the adapter drivers for the adapter that the user had added while the system was in a non-powered state. Plug and Play is used by both Windows 95 and Windows NT to configure adapter cards at boot-time. Plug and Play is also used by Windows 95 to configure devices in a docking station when a powered-on notebook computer is inserted into or removed from the docking station.
Therefore, a need exists for improvements in server management of I2O devices which will result in continuous operation despite adapter failures. System users must be able to replace failed components, upgrade outdated components, and add new functionality, such as new network interfaces, disk interface adapters and storage, without impacting existing users. Additionally, system users need a process to hot add their legacy adapters on I2O platforms, without purchasing new adapters that are specifically designed for hot plug. As system demands grow, organizations must frequently expand, or scale, their computing infrastructure, adding new processing power, memory, mass storage and network adapters. With demand for 24-hour access to critical, server-based information resources, planned system downtime for system service or expansion has become unacceptable.