Fault detection/isolation is an important internal function found in most currently available processing systems, and in particular in high performance processing systems such as supercomputers. One established method of fault detection/isolation is parity checking. In a parity checking system, each word of data is associated with a parity bit which is set such that the overall word, composed of the data bits and the associated parity bit, has a predetermined odd or even parity. When data is transferred, a check is performed at the destination to determine whether the parity differs from that at the source. If the parity does differ at the destination, then a fault has occurred which has inverted one of the bits (or, more generally, an odd number of the bits) of the word being transferred.
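The parity scheme described above can be illustrated with a minimal sketch. The function names parity_bit and parity_ok and the 8-bit example word are assumptions made for illustration only; they do not appear in the source and are not intended to represent any particular hardware implementation.

```python
def parity_bit(data: int, even: bool = True) -> int:
    """Return the parity bit to attach to `data` (hypothetical helper).

    For even parity the bit is chosen so that data plus parity bit
    contain an even number of ones; for odd parity, an odd number.
    """
    ones = bin(data).count("1")
    bit = ones & 1          # 1 if the data alone has an odd number of ones
    return bit if even else bit ^ 1


def parity_ok(data: int, parity: int, even: bool = True) -> bool:
    """Check at the destination that the word still has the expected parity."""
    total_ones = bin(data).count("1") + parity
    return (total_ones % 2 == 0) if even else (total_ones % 2 == 1)


# Source side: attach an even-parity bit to an 8-bit word (illustrative value).
word = 0b10110010
p = parity_bit(word)

# Destination side: an unchanged word passes, a single inverted bit is detected,
# and two inverted bits cancel out and go undetected.
intact = parity_ok(word, p)
single_flip = parity_ok(word ^ 0b00001000, p)
double_flip = parity_ok(word ^ 0b00011000, p)
```

In this sketch, intact is True, single_flip is False, and double_flip is True, which reflects the known limitation that a single parity bit detects only an odd number of inverted bits.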
Although parity checking is a conceptually straightforward approach to fault detection/isolation, actual implementation of a parity checking scheme raises a number of difficulties. For example, it is not always possible to perform a parity check on a given word of data, as it is transferred through a data path, with sufficient frequency to precisely isolate the source of any faults should they occur. This is especially true in high speed processing systems, where frequent fault checking may either impede data flow or may not be possible at all at certain data flow points. Some of the fastest currently available components, such as gate arrays, are still too slow to provide the fast device-to-device communication required to perform parity checking at high data transfer rates. Without the capability of performing fault detection/isolation at data flow points such as the gate arrays, a given word may travel for a substantial distance along a data path, and consequently be subjected to the possibility of corruption, before fault detection/isolation can be performed. In this case, the fault can only be generally isolated to that part of the data path between a pair of endpoints that do have parity checking.
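The isolation limitation described above can be sketched in software as follows. This is a simplified model under stated assumptions, not a description of any actual hardware: the data path is modeled as a list of stage functions, the set named checked holds the indices of stages that are followed by a parity checker, the word (data bits plus parity bit) is assumed to enter the path with even parity, and the names has_even_parity and isolate_fault are hypothetical.

```python
from typing import Callable, Optional, Sequence, Set, Tuple


def has_even_parity(word: int) -> bool:
    """True if the word (data bits plus parity bit) has an even number of ones."""
    return bin(word).count("1") % 2 == 0


def isolate_fault(
    word: int,
    stages: Sequence[Callable[[int], int]],
    checked: Set[int],
) -> Optional[Tuple[int, int]]:
    """Pass `word` through `stages`, checking parity only after stages in `checked`.

    Returns the inclusive range (first, last) of stage indices in which the
    fault must have occurred, or None if every check passes.
    """
    last_good = -1  # index of the last stage whose output passed a check
    for i, stage in enumerate(stages):
        word = stage(word)
        if i in checked:
            if not has_even_parity(word):
                # The fault lies somewhere after the last passing check.
                return (last_good + 1, i)
            last_good = i
    return None


def ok(w: int) -> int:
    """A fault-free stage: passes the word through unchanged."""
    return w


def faulty(w: int) -> int:
    """A faulty stage: inverts the least significant bit."""
    return w ^ 1


path = [ok, ok, faulty, ok, ok]   # the fault is in stage 2
sparse = isolate_fault(0b1001, path, {0, 4})
dense = isolate_fault(0b1001, path, {0, 1, 2, 3, 4})
```

With checkers only after stages 0 and 4, sparse is (1, 4): the fault can be placed only somewhere in stages 1 through 4. With a checker after every stage, dense is (2, 2), which identifies the faulty stage exactly.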
Thus, the need has arisen for apparatus, systems and methods for isolating faults in high performance processing systems.