The present invention relates to computer systems, and to the diagnosis of computer system errors.
Computer systems are becoming more complex and, as a direct result, more difficult to diagnose when they go wrong. Soft and transient errors make diagnosis more difficult still, because the “fault” can disappear when an attempt is made to diagnose it.
Mechanisms have been proposed for the diagnosis of errors in system-level interconnects and also in large DRAM and SRAM arrays. However, these mechanisms do not address the protection of other interconnects within a system. Although such interconnects are typically fairly robust, faults can still occur.
For example, boot information for a processor is typically held in a programmable read-only memory (PROM) that is not parity protected. In normal usage, the PROM and the bus system through which it is accessed are very reliable, so the lack of parity protection has not conventionally been seen as a problem. However, very rarely, the information held in the PROM could become corrupted, for example as a result of a cosmic ray event. In this case, it is possible that incorrect information can be provided to the processor via the bus system. As the PROM contains boot information and is used by the processor at initial boot time, this could cause the processor to fail to start correctly, or to hang during a restart.