1. Field of the Invention
This invention relates to fault-tolerant computer systems and more particularly to a dedicated maintenance bus for use with such computer systems.
2. Background Information
Fault-tolerant computer systems are employed in situations and environments that demand high reliability and minimal downtime. Such computer systems may be employed in the tracking of financial markets, the control and routing of telecommunications and in other mission-critical functions such as air traffic control.
A common technique for incorporating fault-tolerance into a computer system is to provide a degree of redundancy to various components. In other words, important components are often paired with one or more backup components of the same type. As such, two or more components may operate in a so-called lockstep mode in which each component performs the same task at the same time, while only one is typically called upon for delivery of information. Where data collisions, race conditions and other complications may limit the use of lockstep architecture, redundant components may be employed in failover mode. In failover mode, one component is selected as a primary component that operates under normal circumstances. If a failure in the primary component is detected, then the primary component is bypassed and the secondary (or tertiary) redundant component is brought on line. A variety of initialization and switchover techniques are employed to make a transition from one component to another during runtime of the computer system. A primary goal of these techniques is to minimize downtime and corresponding loss of function and/or data.
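The failover mode described above can be illustrated with a minimal sketch. The class names `Component` and `FailoverGroup`, and the use of an exception to signal a detected failure, are assumptions made for illustration only and do not come from any particular system.

```python
class Component:
    """Hypothetical redundant component that can be marked as failed."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def handle(self, request):
        # A failed component signals the failure instead of answering.
        if not self.healthy:
            raise RuntimeError(f"{self.name} failed")
        return f"{self.name} handled {request}"


class FailoverGroup:
    """Ordered set of redundant components: primary first, then backups."""

    def __init__(self, components):
        self.components = list(components)
        self.active = 0  # index of the component currently on line

    def handle(self, request):
        # Try the active component; on a detected failure, bypass it and
        # bring the next redundant component on line.
        while self.active < len(self.components):
            try:
                return self.components[self.active].handle(request)
            except RuntimeError:
                self.active += 1
        raise RuntimeError("all redundant components have failed")
```

In this sketch a bypassed component is never retried. A real system would typically add the initialization and switchover steps mentioned above before bringing a repaired component back into service.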
Fault-tolerant computer systems are often costly to implement since many commercially available components are not specifically designed for use in redundant systems. It is desirable to adapt conventional components and their built-in architecture whenever possible. All modern computer systems have particular capabilities directed to control and monitoring of functions. For example, large microprocessor chips such as the Pentium III™, available from Intel Corporation of Santa Clara, Calif., are designed to operate within a specific temperature range that is monitored by a commercially available environmental/temperature-sensing chip. One technique for interconnecting such an environmental monitor or other monitoring and control devices is to utilize a dedicated maintenance bus. The maintenance bus is typically separate from the system's main data and control bus structure. The maintenance bus generally connects to a single, centralized point of control, often implemented as a peripheral component interconnect (PCI) device.
However, as discussed above, conventional maintenance bus architecture is not specifically designed for redundant operation. Accordingly, prior fault-tolerant systems have utilized a customized architecture for transmitting monitor and control signals over the system's main buses (or dedicated proprietary buses) using, for example, a series of application-specific integrated circuits (ASICs) mounted on each circuit board being monitored. To take advantage of current, commercially available maintenance bus architecture in a fault-tolerant computing environment, a more comprehensive and cost-effective approach is needed.
Accordingly, it is an object of this invention to provide maintenance bus architecture having a high degree of fault-tolerance. This maintenance bus architecture should be interoperable with commercially available components and should allow a fairly high degree of versatility in terms of monitoring and control of important computer system components.
This invention overcomes the disadvantages of the prior art by providing a fault-tolerant maintenance bus architecture that includes two maintenance buses interconnecting each of a plurality of printed circuit boards, termed "parent" circuit boards. The two maintenance buses are each connected to a pair of system management modules (SMMs) that are configured to perform a variety of maintenance bus activities. The SMM can comprise any acceptable device for driving commands on the maintenance bus arrangement. Within each parent board is a pair of redundant bridges, each having a unique address. One bridge is connected to the first maintenance bus while a second bridge is connected to the second maintenance bus of the pair. A child maintenance bus interconnects the two bridges through a "child" printed circuit board. The introduction of a separate board to implement the child maintenance bus can be useful, but is not essential according to this invention. The child maintenance bus is itself interconnected with a variety of monitor and control functions on maintenance bus-compatible subsystem components. The SMMs can address components on each child printed circuit board individually and receive appropriate responses therefrom. In the event of a bus or bridge failure, the SMM can still communicate with the child subsystem components via the redundant bus and bridge.
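The redundant addressing path described above can be modeled in software as a rough sketch. All names here (`Bridge`, `MaintenanceBus`, `SMM`, `BusError`), the example bridge addresses, and the representation of child subsystem components as a shared dictionary are assumptions made for illustration. The sketch models a single SMM and does not represent the signaling of any actual maintenance bus.

```python
from dataclasses import dataclass, field


class BusError(Exception):
    """Raised when a bus or bridge on the selected path has failed."""


@dataclass
class Bridge:
    address: int          # unique address of this bridge
    child_devices: dict   # components on the shared child maintenance bus
    failed: bool = False

    def read(self, device):
        if self.failed:
            raise BusError(f"bridge {self.address:#x} failed")
        return self.child_devices[device]


@dataclass
class MaintenanceBus:
    name: str
    bridges: dict = field(default_factory=dict)  # address -> Bridge
    failed: bool = False

    def attach(self, bridge):
        self.bridges[bridge.address] = bridge

    def read(self, bridge_address, device):
        if self.failed:
            raise BusError(f"maintenance bus {self.name} failed")
        return self.bridges[bridge_address].read(device)


class SMM:
    """System management module connected to both maintenance buses."""

    def __init__(self, first_bus, second_bus):
        self.buses = (first_bus, second_bus)

    def read(self, board, device):
        # `board` holds the parent board's two bridge addresses, one per bus.
        last_error = None
        for bus, bridge_address in zip(self.buses, board):
            try:
                return bus.read(bridge_address, device)
            except BusError as error:
                last_error = error  # fall through to the redundant path
        raise BusError("both maintenance bus paths failed") from last_error
```

A brief usage example under the same assumptions: both bridges of one parent board expose the same child components, so a read still succeeds after one bridge fails.

```python
child = {"temp_sensor": 41}
bus_a, bus_b = MaintenanceBus("A"), MaintenanceBus("B")
bridge_a, bridge_b = Bridge(0x10, child), Bridge(0x11, child)
bus_a.attach(bridge_a)
bus_b.attach(bridge_b)
smm = SMM(bus_a, bus_b)

bridge_a.failed = True
reading = smm.read((0x10, 0x11), "temp_sensor")  # served via bus B
```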
The bridge can include an interconnection to a further bridge. This remote bridge can, itself, be interconnected to additional microprocessors and associated memory. The remote bridge is addressed through one of the parent board's bridges so that communication to and from the SMM can occur. The SMM can be interconnected with a variety of other computer system peripherals and components, and can be accessed over a local network or through an Internet-based communication network.