Large-scale computer systems typically have one or more high-performance main processors coupled to a plurality of service processors (SVPs). The primary role of the service processors is to control and monitor the operation of the main processors. However, some service processors are also used to provide additional services, such as machine logging, statistics gathering for performance evaluation, and system recovery in the event of a failure.
The growing responsibilities assigned to the service processors have raised concern over both their availability and their reliability. In U.S. Pat. No. 4,455,601, Griscom et al. describe a multiprocessor system wherein the service processors are cross-coupled to two main processors. Therein, only one service processor is assigned to actively perform service functions at any one time. The other service processor is dormant and acts as a backup to the active processor.
In the Griscom et al. system, various aspects of the active service processor are continually monitored by its associated maintenance service and support adapter (MSSA). Upon detection of any of a predetermined set of error conditions, such as stall initial microprogram load (stall IML), the active MSSA signals the other service processor, through the other MSSA, to start up and take over.
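The takeover sequence described above can be illustrated with a minimal sketch. It is not taken from the Griscom et al. patent; the class names, the `report` and `start_up` methods, the `FAILED` role, and the `"stall_iml"` condition string are all hypothetical and serve only to show the flow of control from the active adapter, through the peer adapter, to the dormant service processor.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Role(Enum):
    ACTIVE = auto()
    DORMANT = auto()
    FAILED = auto()  # assumption: the source does not name the post-failure state


@dataclass
class ServiceProcessor:
    name: str
    role: Role


# Hypothetical error set; the source names only stall IML as an example.
TAKEOVER_ERRORS = frozenset({"stall_iml"})


@dataclass
class Adapter:
    """Stands in for the MSSA associated with one service processor."""
    svp: ServiceProcessor
    peer: "Adapter | None" = None

    def report(self, condition: str) -> bool:
        """Handle a condition observed on this adapter's own service processor.

        Returns True if a takeover was triggered.
        """
        if self.svp.role is not Role.ACTIVE:
            return False  # only the active side is monitored for takeover
        if condition not in TAKEOVER_ERRORS or self.peer is None:
            return False
        self.svp.role = Role.FAILED
        self.peer.start_up()  # signal goes through the other adapter
        return True

    def start_up(self) -> None:
        """Bring this adapter's dormant service processor into service."""
        self.svp.role = Role.ACTIVE


# Usage: wire two adapters together and report an error on the active side.
a = Adapter(ServiceProcessor("SVP-A", Role.ACTIVE))
b = Adapter(ServiceProcessor("SVP-B", Role.DORMANT))
a.peer, b.peer = b, a
took_over = a.report("stall_iml")
```

Note that in this sketch the decision to fail over rests entirely on what the adapter observes about its own service processor, not on whether the main processors are actually receiving service.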
Because the MSSA is not itself a receiver of the service functions provided by a service processor, detection of failures by an MSSA according to Griscom et al. is not a true reflection of the availability of the services received by the main processors. With the increasing role assigned to the service processors, a more reliable detection scheme is needed.