The present invention relates to an error detection circuit in a data processing system, and more particularly relates to a circuit which identifies a failed module when that module has propagated an error to other modules in the system.
In many data processing systems, the data handling circuitry is divided into multiple field replaceable units, or FRUs, so that if one FRU becomes defective, a field engineer can easily replace it and minimize the down time of the system. In the present application, the FRUs of the data processing system are all controlled by a system clock and are interconnected such that data output from one FRU is input to one or more other FRUs. The system clock runs at such a high speed that, by the time an error caused by a failed FRU is detected, the error has already propagated to other FRUs, making it difficult to identify which FRU failed.
R. J. Kolvick, Jr., IBM Technical Disclosure Bulletin, Vol. 22, No. 1 (June 1979), pages 255-257, "Algorithms for Increased Resolution of Field Replaceable Units" discusses a system in which parity is checked by an Exclusive-OR function resident on each of the FRUs, together with algorithms for identifying an error condition in either the transmitting FRU or the receiving FRU.
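The Exclusive-OR parity check mentioned above can be illustrated with a short behavioral sketch. It assumes even parity over a word of bits and models only the check itself; it makes no attempt to reproduce the bulletin's algorithms for deciding between the transmitting and the receiving FRU. The functions `parity_bit` and `parity_ok` and the sample word are hypothetical.

```python
from functools import reduce
from operator import xor


def parity_bit(bits: list[int]) -> int:
    """Even-parity bit: the XOR of all data bits (hypothetical helper)."""
    return reduce(xor, bits, 0)


def parity_ok(bits: list[int], parity: int) -> bool:
    """Receiver-side check: data XOR parity is 0 unless an odd number of bits flipped."""
    return (parity_bit(bits) ^ parity) == 0


word = [1, 0, 1, 1, 0, 0, 1, 1]
p = parity_bit(word)            # parity generated on the sending side
print(parity_ok(word, p))       # True
corrupted = word.copy()
corrupted[2] ^= 1               # single-bit error between FRUs
print(parity_ok(corrupted, p))  # False
```

A single XOR parity bit detects any odd number of flipped bits but misses even-numbered flips, which is why it only localizes an error to a transfer, not to a specific bit.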
W. P. Spraul, IBM Technical Disclosure Bulletin, Vol. 26, No. 11 (April 1984), pages 6078-6079, "Error Sequence Tagging" discusses a machine having multiple functional units wherein each functional unit has an error sequence counter and an error detection circuit. When an error is detected, each error sequence counter counts sync pulses until it is inhibited by the error detection circuit. The contents of all of the counters are then interrogated to determine which error occurred first, thereby identifying the responsible functional unit.
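A behavioral sketch of error sequence tagging, under one reading of the description above: all counters start on the same sync pulse, each unit's counter stops once that unit's error detection circuit fires, and the smallest count therefore marks the earliest error. The unit names, pulse numbers, fixed counting window and both functions are hypothetical illustrations, not details taken from the bulletin.

```python
from typing import Optional


def run_error_sequence_counters(
    detect_pulse: dict[str, Optional[int]], window: int
) -> dict[str, int]:
    """Count sync pulses for each unit until that unit's error detector fires.

    detect_pulse maps a hypothetical unit name to the sync pulse on which its
    error detection circuit fires, or None if it never fires.
    """
    counters = {unit: 0 for unit in detect_pulse}
    for pulse in range(window):
        for unit, fired_at in detect_pulse.items():
            # A counter is inhibited from the pulse on which its detector fires.
            inhibited = fired_at is not None and pulse >= fired_at
            if not inhibited:
                counters[unit] += 1
    return counters


def first_failing_unit(counters: dict[str, int], window: int) -> Optional[str]:
    """Interrogate the counters: the smallest count marks the earliest error."""
    if not counters:
        return None
    earliest = min(counters, key=counters.__getitem__)
    # A counter that ran for the whole window was never inhibited.
    return earliest if counters[earliest] < window else None


detect = {"FRU_A": 5, "FRU_B": 2, "FRU_C": None}
counts = run_error_sequence_counters(detect, window=10)
print(counts)                          # {'FRU_A': 5, 'FRU_B': 2, 'FRU_C': 10}
print(first_failing_unit(counts, 10))  # FRU_B
```

Here FRU_A also reports an error, but three sync pulses later than FRU_B, which is the pattern expected when an error propagates from one unit to another.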
R. H. Barsotti et al., IBM Technical Disclosure Bulletin, Vol. 26, No. 11 (April 1984), pages 6187-6188, "First Error Detection Circuit" discusses a latching circuit which latches the first error indication to appear on any of a plurality of input failure lines. Subsequent changes to the error inputs of the latching circuit are ignored, so that the triggering error can be analyzed to determine component failures.
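The first-error latching behavior can be modeled in software as follows. This is a sketch, not the bulletin's circuit: it assumes the input failure lines are sampled once per clock and, as a hypothetical tie-break, that the lowest-numbered line wins if several lines go active on the same clock. The class `FirstErrorLatch` and its `reset` method are illustrative names only.

```python
from typing import Optional, Sequence


class FirstErrorLatch:
    """Behavioral model of a first-error latch over n input failure lines."""

    def __init__(self, n_lines: int) -> None:
        self.n_lines = n_lines
        self.latched: Optional[int] = None  # index of the first line seen active

    def clock(self, lines: Sequence[bool]) -> None:
        """Sample the failure lines once; only the first active line is kept."""
        if len(lines) != self.n_lines:
            raise ValueError(f"expected {self.n_lines} lines, got {len(lines)}")
        if self.latched is not None:
            return  # already latched: later changes on the inputs are ignored
        for index, active in enumerate(lines):
            if active:
                self.latched = index  # assumed tie-break: lowest-numbered line wins
                return

    def reset(self) -> None:
        """Hypothetical reset, clearing the latch after the error is analyzed."""
        self.latched = None


latch = FirstErrorLatch(4)
latch.clock([False, False, False, False])
latch.clock([False, False, True, False])  # line 2 goes active first
latch.clock([True, True, True, False])    # propagated errors arrive later
print(latch.latched)  # 2
```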
U.S. Pat. No. 4,679,195, issued July 7, 1987 to Dewey for "Error Tracking Apparatus in a Data Processing System", discloses a data processing system having a plurality of data locations. Each data location has a counter and an error detector. Upon detection of an error, the counter stops counting, thereby freezing the system cycle count at a value corresponding to the occurrence of the error. The counters at the data locations may then be interrogated to determine an error history for the data processing system.
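A minimal sketch of the counter-freezing idea, assuming every data location's counter advances once per system cycle and stops permanently on the cycle its error detector fires. Sorting the frozen counts then yields an error history in order of occurrence. The `DataLocation` class, the `error_history` function, the location names and the cycle numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataLocation:
    """Hypothetical data location with a cycle counter that freezes on error."""
    name: str
    count: int = 0
    frozen: bool = False

    def tick(self, error_detected: bool) -> None:
        """Advance one system cycle unless an error has frozen the counter."""
        if self.frozen:
            return
        if error_detected:
            self.frozen = True  # count now equals the cycle on which the error was seen
        else:
            self.count += 1


def error_history(locations: list[DataLocation]) -> list[tuple[str, int]]:
    """Interrogate the frozen counters, earliest error first."""
    frozen = [(loc.name, loc.count) for loc in locations if loc.frozen]
    return sorted(frozen, key=lambda item: item[1])


locations = [DataLocation("loc0"), DataLocation("loc1"), DataLocation("loc2")]
error_cycle = {"loc1": 7, "loc2": 3}  # hypothetical cycles on which errors are detected
for cycle in range(10):
    for loc in locations:
        loc.tick(error_cycle.get(loc.name) == cycle)

print(error_history(locations))  # [('loc2', 3), ('loc1', 7)]
```

Unlike a single first-error latch, this keeps a timestamp for every location that saw an error, so the full order of propagation can be reconstructed rather than only its origin.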