1. Field of Use
The present invention relates to data processing and more particularly to microprogrammed control elements used to direct the operations of a processing unit.
2. Prior Art
As data processing systems are entrusted with increasingly critical tasks requiring high dependability, the need for such systems to be fault tolerant increases. An important aspect of a fault tolerant strategy is the detection of faults.
Faults are generally classified in terms of their duration, nature, and extent. The duration of a fault can be transient, intermittent, or permanent. A transient fault, often the result of external disturbances, exists for a finite length of time and is nonrecurring. A system with intermittent faults oscillates between faulty and fault-free operation, usually as a result of marginal or unstable (metastable) device operation. The ability to detect faults reliably is essential to recovery from transient faults.
Advances in computer architectures make it very difficult to implement recovery strategies for various types of faults without adding to design complexity. A key element in any processing unit is the control unit. Therefore, it becomes important to be able to detect when the control unit is not operating properly. Many processing units rely on microprogrammed or firmware control units. It is known to include error detection circuits within such microprogrammed control units for detecting any parity errors in each of the microinstructions read out during a cycle of operation. However, such arrangements are unable to detect the occurrence of transient or intermittent faults, particularly in the logic circuits which operate in conjunction with the microprogrammed control unit.
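The limitation of per-microinstruction parity checking described above can be illustrated with a minimal sketch. It assumes even parity over a 16-bit microinstruction word; the word width, the function names, and the sample values are hypothetical and are not taken from any particular control unit.

```python
# Illustrative sketch only: even parity over a hypothetical 16-bit
# microinstruction word. All names and values are assumptions.

def parity_bit(word: int) -> int:
    """Return 1 if the word holds an odd number of 1 bits, else 0."""
    return bin(word).count("1") & 1


def encode(word: int) -> int:
    """Append an even-parity bit as the least significant bit."""
    return (word << 1) | parity_bit(word)


def parity_ok(stored: int) -> bool:
    """Check that the stored word (data plus parity bit) has even parity."""
    return bin(stored).count("1") % 2 == 0


# A small control store whose words all carry correct parity.
control_store = [encode(w) for w in (0x1234, 0x00F0, 0xBEEF)]

# Case 1: a single bit of a word read out is corrupted.
corrupted = control_store[1] ^ 0b100
# Parity checking detects this fault.
detected_storage_fault = not parity_ok(corrupted)

# Case 2: a fault in the surrounding addressing logic selects
# location 2 instead of location 1. The word read out is intact,
# so the parity check passes and the fault goes unnoticed.
wrong_word = control_store[2]
detected_addressing_fault = not parity_ok(wrong_word)
```

In this sketch the first fault is flagged and the second is not, which mirrors the point made above: a parity check validates the contents of the word read out, not the behavior of the logic circuits that selected it.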
In at least one prior art system, a predetermined pattern was stored in unused locations, and accessing such a location caused a branch to a routine that reported the access. However, no means was provided for determining how and where the fault occurred. This lack of information made it difficult to diagnose the cause of the fault.
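The prior art arrangement of filling unused locations with a predetermined pattern might be sketched as follows. The pattern value, the store size, and the function names are hypothetical and chosen only for illustration; the sketch does not reproduce any specific prior system.

```python
# Illustrative sketch only: unused control store locations hold a
# hypothetical trap pattern. All names and values are assumptions.

TRAP_PATTERN = 0xFFFF      # assumed pattern marking an unused location
CONTROL_STORE_SIZE = 16    # assumed number of locations


def build_control_store(microprogram: dict[int, int]) -> list[int]:
    """Fill every location with the trap pattern, then load the
    microprogram words at their assigned addresses."""
    store = [TRAP_PATTERN] * CONTROL_STORE_SIZE
    for address, word in microprogram.items():
        store[address] = word
    return store


def fetch(store: list[int], address: int) -> tuple[str, int]:
    """Read one location and decide what the control unit does next.

    Returns ("report", address) when the trap pattern is read, or
    ("execute", word) for a normal microinstruction.
    """
    word = store[address]
    if word == TRAP_PATTERN:
        # Only the trapped address is known at this point. Nothing
        # records which earlier step produced the bad address.
        return ("report", address)
    return ("execute", word)


store = build_control_store({0: 0x1234, 1: 0x2345})
normal = fetch(store, 0)    # a used location executes normally
trapped = fetch(store, 5)   # an unused location triggers the report
```

The report in this sketch carries only the address of the unused location. It carries no record of the preceding sequence of accesses, which corresponds to the diagnostic gap noted above.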
Accordingly, it is a primary object of the present invention to provide a technique for detecting the occurrence of transient or intermittent faults within a microprogrammed control unit.
It is a more specific object of the present invention to provide a method and apparatus for use in conjunction with a microprogrammed control unit for facilitating the recovery of such control unit from transient or intermittent errors.