Research shows that many computer failures are preceded by bursts of errors induced by intermittent faults, as described, for example, in "Error Log Analysis: Statistical Modeling and Heuristic Trend Analysis" by T. Y. Lin and D. P. Siewiorek, IEEE Transactions on Reliability, Vol. 39, No. 4, 1990, pp. 419-432. Early detection of failure-prone interconnects significantly improves the availability of systems that employ those interconnects (for example, computing systems). Isolating a failing interconnect before a crash occurs can allow spare interconnects to be activated seamlessly or, where no spare interconnects are available, allow the system to degrade gracefully.
Some failure prediction mechanisms rely on detecting and counting errors that occur while control and/or data packets are transferred over the interconnect. A problem with this approach is that it depends on errors in live traffic, and unrecoverable errors detected within control and/or data packets commonly lead to system failure.
Some failure prediction mechanisms test interconnects off-line before the interconnect is included in a system. A test pattern is sent, and errors are detected and counted before the interconnect is put into use in the system. This type of failure prediction mechanism, however, does not address situations in which the interconnect degrades gradually over time, or degrades quickly, while it is operating within the system. Because the testing is performed off-line before the interconnect is included in the system, such degradation would not be identified during ongoing use of the interconnect in the system.