Today's computer systems typically employ multiple electronic modules which cooperate to perform system functions and which pass information to one another through means such as a backplane system bus. A typical system may contain one or more CPU modules including the system processors, an I/O module for communicating with external devices, multiple memory modules, and a power subsystem controller module for monitoring and controlling system power. Power-up diagnostics are provided in such systems for testing each module and identifying any failures.
Power-up diagnostics typically begin by testing the processor on the CPU module, and then testing its ability to access the system bus. However, the diagnostic test instructions to be executed by the processor for testing the processor and CPU module typically reside in a non-volatile memory such as a ROM which is located somewhere on the system bus. Thus, the ability to test the processor and bus access capability depends upon the operability of the system bus, which is subject to an increased risk of failure due to its multitude of bus lines and its interconnection throughout the system. There is a need for providing processor module test instructions to the processor by way of some means other than the system bus, so that testing of the processor and CPU module may proceed independently of the condition of the system bus.
There must then be a way for the processor to determine which modules are installed, and to store any fault information for each module. Serial control busses separate from the system bus have been implemented in multi-module computer systems for determining the presence of modules and storing fault information. A typical such serial control bus consists of an RS-232 or Inter-Integrated Circuit (I²C) two-line serial bus connected to each module. Serial non-volatile memories are connected to the serial control bus on each module, so that in the event of a module failure, the fault tags indicating the cause of the failure may be stored for retrieval at the service depot by means of the serial bus.
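The following is a minimal sketch of the arrangement just described, in which a serial control bus is used to detect which modules are present and to store fault tags in each module's serial non-volatile memory. It is an illustrative model only: the class names, the slot names, and the fault tag strings are hypothetical and do not come from any particular system, and the bus is simulated in memory rather than driven over real RS-232 or I²C hardware.

```python
class SerialEeprom:
    """Hypothetical stand-in for a module's serial non-volatile memory."""

    def __init__(self):
        self.fault_tags = []


class SerialControlBus:
    """Simulated serial control bus; an empty slot is represented by None."""

    def __init__(self, slots):
        # slots maps a slot name to a SerialEeprom, or to None if no module is installed.
        self._slots = dict(slots)

    def probe(self, slot):
        # A module counts as present if its serial memory answers on the bus.
        return self._slots.get(slot) is not None

    def installed(self):
        return [slot for slot in self._slots if self.probe(slot)]

    def store_fault(self, slot, tag):
        if not self.probe(slot):
            raise LookupError(f"no module responds in slot {slot!r}")
        self._slots[slot].fault_tags.append(tag)

    def read_faults(self, slot):
        if not self.probe(slot):
            raise LookupError(f"no module responds in slot {slot!r}")
        return list(self._slots[slot].fault_tags)
```

In this model, a diagnostic routine would call `installed()` to learn the configuration, `store_fault()` when a module fails a test, and `read_faults()` later at the service depot.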
It is preferred that the serial control bus be highly reliable, as its functionality is required on power-up in order to determine module configuration in case the system bus is inoperable, and because it is the means for storing fault information in case there is a failure. However, the serial control bus is typically connected to the modules through the backplane bus connectors. As computer systems become increasingly complex, the complexity of these backplane bus connectors increases accordingly. Today's backplane busses may be hundreds of bits in width; they therefore tend to require backplane connectors with very small, thin, tightly packed pins. These pins are subject to an increased risk of mechanical connection failures due to breakage and/or shorting to adjacent pins. A more reliable means of connecting the serial control bus lines to the modules is therefore desired.
Next, there must be some way of indicating to the user during power-up diagnostic execution which modules are being tested and which have failed. In the past, each module in a computer system may have been provided with an LED which remains lit when the module is operating correctly. However, for ergonomic and regulatory reasons it may be preferable to light LEDs only when a module is inoperable; i.e., a lit LED indicates a bad module. In this case, if the LEDs reside on the modules themselves, it may not be possible to light the LED if the module is bad or mis-installed. There is therefore a need to separate the LED test indicators from the modules to be tested.
Finally, during normal operation, there are many conditions which may cause the entire computer system to power down. The computer power subsystem monitors these environmental conditions, and when it determines that the operating environment has become unsafe for continuing normal operation, it shuts down power to the system.
The conditions which may cause power shutdown include a deliberate power down by the operator, or an unexpected power failure. Intervening environmental events may also cause power shutdown. For instance, the ambient temperature may be too high, the cooling fans may fail, or the AC line voltage may be out of tolerance.
If the environment returns to its normal condition before field service personnel are able to check the system, the system is likely to power up normally. It may then be impossible to determine the reason for the system shutdown. Where the shutdown occurred due to an external power failure, the system should be presumed fully functional on power up, but where the shutdown was due to an overtemperature condition or a fan failure, an intermittent system problem may exist. If field service personnel mistakenly attribute a system shutdown to a power failure, there is a chance that a real system problem has been left to recur, leading to excessive service calls and customer dissatisfaction.
It is therefore desirable to provide a means for storing information representing environmental conditions present in the power subsystem prior to a power shutdown, which may be recalled after the system is powered back up to aid in determining the cause of the shutdown.
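By way of illustration only, the idea of recording environmental conditions before a shutdown and recalling them after power-up might be sketched as follows. This is a simplified model under stated assumptions, not a description of any actual power subsystem: the threshold values, the field names, and the `NonVolatileStore` class are hypothetical, and the store is a plain in-memory buffer standing in for a real non-volatile device.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical limits chosen for the example; no numeric thresholds are given in the text.
MAX_TEMP_C = 45.0
AC_MIN_V, AC_MAX_V = 100.0, 130.0


@dataclass
class EnvSnapshot:
    temp_c: float
    fans_ok: bool
    ac_volts: float
    operator_request: bool = False


def shutdown_causes(snap):
    """Return the list of conditions in the snapshot that call for a shutdown."""
    causes = []
    if snap.operator_request:
        causes.append("operator_power_down")
    if snap.temp_c > MAX_TEMP_C:
        causes.append("overtemperature")
    if not snap.fans_ok:
        causes.append("fan_failure")
    if not (AC_MIN_V <= snap.ac_volts <= AC_MAX_V):
        causes.append("ac_out_of_tolerance")
    return causes


class NonVolatileStore:
    """In-memory stand-in for a non-volatile memory that survives power loss."""

    def __init__(self):
        self._data = b""

    def write(self, data):
        self._data = data

    def read(self):
        return self._data


def record_before_shutdown(store, snap):
    """Save the snapshot and its causes if a shutdown is warranted."""
    causes = shutdown_causes(snap)
    if causes:
        record = {"snapshot": asdict(snap), "causes": causes}
        store.write(json.dumps(record).encode())
    return causes


def recall_after_power_up(store):
    """Return the last saved record, or None if nothing was recorded."""
    raw = store.read()
    return json.loads(raw.decode()) if raw else None
```

With a record of this kind available after power-up, service personnel could distinguish a shutdown caused by an external power failure from one caused by an overtemperature condition or a fan failure.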