1. Field of the Present Invention
The present invention generally relates to the field of data processing systems and more particularly to a server system employing redundant service processors and to a method for managing the service processors when one fails.
2. History of Related Art
The concept of a service processor is well known in the field of data processing systems and particularly server class systems. Service processors are provided to manage certain aspects of a server. A service processor can be defined loosely as an auxiliary processor that monitors the environment and health of one or more main processors and their associated subsystems. In the event of an actual or predicted problem with a main processor, subsystem, or the environment, the service processor is capable of taking action to alert a system administrator and/or to correct the problem on its own. Service processors have been used on large systems such as IBM mainframes for decades and are now well established as a management tool for industry standard class servers including servers based on x86 processors such as the Pentium® family of processors from Intel.
Redundant service processors may be provided in high-availability systems so that a failure of one service processor does not result in the loss of the monitoring, alerting, and initialization capabilities that are imperative in such systems. Implementing redundant service processors is complicated for a number of reasons. A method by which the service processors agree which one is in control must be defined and subscribed to by both service processors. All subsystems, including the service processors themselves, must understand which service processor is in control at any given point in time. In some instances, hardware must be provided to switch busses from one service processor to the other, and this hardware must switch the busses synchronously with fail-over of control from one service processor to the other. For purposes of this disclosure, “fail-over” refers to a transfer of control from one service processor to another.
There are today a number of methods used in the industry to provide fail-over of redundant subsystems and to coordinate the actions of redundant controllers. Network adapter cards, for example, are sometimes installed in server systems as a redundant pair and a device driver is given the responsibility of transferring traffic from one to the other in the event of a failure of the active card. Those skilled in the art may also be familiar with “voting” systems in which an odd number of redundant controllers make a decision independently but simultaneously and the action taken depends upon what the majority of the systems agree is the proper course of action. Such systems, unfortunately, are typically complex, as evidenced by the delay in the first space shuttle launch caused by a failure of multiple on-board computers to synchronize their communications with each other in such a redundant-controller environment.
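The “voting” approach described above can be illustrated with a minimal sketch. It assumes each redundant controller reports its proposed action as a string; the function name majority_decision and the action labels are hypothetical and chosen only for illustration, and the sketch omits the synchronization among controllers that makes real voting systems complex.

```python
from collections import Counter


def majority_decision(votes):
    """Return the action proposed by a strict majority of controllers.

    `votes` is a list with one proposed action per controller (hypothetical
    representation). Returns None when no action has a strict majority.
    """
    if not votes:
        return None
    # Find the most frequently proposed action and how many controllers chose it.
    action, count = Counter(votes).most_common(1)[0]
    # Require more than half of all controllers to agree.
    return action if count > len(votes) // 2 else None


# Three controllers decide independently; two of them agree.
decision = majority_decision(["continue", "abort", "continue"])
```

Using an odd number of controllers means that a choice between two candidate actions can never end in a tie, which is why such systems are typically built with an odd count.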
It would be desirable to implement a method and system that enables the use of redundant service processors and addresses all related control issues. It would be further desirable if the implemented solution did not substantially increase the cost or complexity of the system.