As is known, systems of all types are prone to errors due to any of various conditions. Some conditions may be transient (temporary and possibly self-correcting). Other conditions may be systematic (neither temporary nor self-correcting). Non-systematic conditions may cause problems for the system, but such problems are generally temporary and may not require corrective action. By contrast, systematic conditions may also cause problems for the system but, due to their non-temporary nature, will generally require some corrective action to be taken, either by a human system operator or by an error correction procedure within the system itself. It is important to be able to distinguish between systematic and non-systematic errors to ensure that corrective action is taken only when necessary, thereby reducing inefficiency due to unnecessary (and possibly time-consuming and/or costly) actions.
An example of a system is a data recording device, such as a magnetic tape drive (an optical disc recorder is an example of another system to which the present invention is applicable). Customer data is written to magnetic tape media in logical units, herein referred to as “data sets”. Each data set comprises a fixed number of sub-units, herein referred to as “data segments”. During a write operation, the data segments of a data set are recorded onto the tape media. Following this recording, the data segments are read back by the tape drive to identify any segments which contain errors. If an erroneously written segment is identified, it is re-written, typically to a different location on the tape media. When all erroneous segments have been re-written, they are read back to identify further errors. The read-back/re-write process continues until the tape medium contains at least one error free image of each segment in the data set. As will be understood, the total number of data segments actually recorded onto the tape medium will be larger than the number of data segments in the data set if any erroneously written segments have been identified.
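The read-back/re-write process described above can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the drive's actual procedure: the function name `write_data_set`, the parameters `error_rate` and `max_passes`, and the model in which each written segment is independently erroneous with a fixed probability are all hypothetical and introduced solely for illustration.

```python
import random


def write_data_set(num_segments, error_rate, rng, max_passes=100):
    """Simulate writing one data set with read-back verification.

    Returns the total number of segments recorded onto the medium,
    counting both original writes and re-writes.

    Assumption (illustrative only): every written segment is erroneous
    independently with probability `error_rate`.
    """
    pending = num_segments   # segments that still lack an error-free image
    total_written = 0

    for _ in range(max_passes):
        if pending == 0:
            break
        # Write (or re-write) every pending segment to a new location.
        total_written += pending
        # Read back and count the segments that were written erroneously.
        pending = sum(1 for _ in range(pending) if rng.random() < error_rate)

    if pending:
        raise RuntimeError("no error-free image after max_passes")
    return total_written


if __name__ == "__main__":
    rng = random.Random(0)
    total = write_data_set(num_segments=64, error_rate=0.1, rng=rng)
    print(f"segments in data set: 64, segments recorded: {total}")
```

Under the independence assumption used in this sketch, the expected number of recorded segments is the number of segments in the data set divided by (1 - `error_rate`), which shows how the re-write overhead grows with the error rate.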
If each re-written segment is recorded to a different location on the tape medium from the corresponding originally recorded segment, the total amount of useful data which can be recorded onto the tape medium is reduced. If the number of segments which are re-written is large, the loss of capacity on the tape medium becomes significant. Moreover, the write data rate suffers because of the extra time required to re-write the erroneously written segments and the read data rate similarly suffers because of the extra time required to read all written and re-written segments in order to obtain a complete image of the data set.
There are numerous possible conditions which may cause a data segment to be erroneous and have to be re-written. Such conditions include:
1) random electronic noise;
2) a media defect, such as poor magnetic coating, substrate irregularities or physical damage (creases or distortion); and
3) other causes, such as mismatch among the settings of the read and write channel electronics, the write head and the tape media. Such a mismatch may originate in the manufacturer's system set up, differences in components, normal aging and/or wear of components.
The first condition is always transient; the second is typically transient; and the third is systematic rather than transient.
When a data segment is erroneous due to a transient condition, little if anything may be done to correct the problem because, by definition, it has passed. However, a systematic condition may be correctable if it can be detected and identified as systematic. For example, adjustments may be made to the electronics to offset a mismatch among components or to compensate for aging or wear. It will be appreciated that performing such tuning when the condition is only transient will result in a de-tuned, suboptimal system.
Existing methods fail to adequately and consistently distinguish between systematic and non-systematic conditions. One such method averages input values without distinguishing between many small events, which are indicative of systematic conditions, and a few large events, which are indicative of localized or large random (transient) conditions. Because both patterns can produce the same average, averaging alone cannot consistently analyze the statistics resulting from the input values. A second method relies on a decreasing moving average (using an IIR filter). However, this method is also unable to consistently distinguish between long term (systematic) conditions and sporadic, localized (transient) events.
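The limitation of both methods can be seen in a small numerical sketch. The error counts, the smoothing factor `alpha`, the threshold, and the function names below are hypothetical values chosen for illustration; the first-order IIR filter shown is one common form of such a moving average and is not necessarily the filter used by any particular existing method.

```python
def mean(values):
    """Plain arithmetic average of the input values."""
    return sum(values) / len(values)


def iir_average(values, alpha=0.1):
    """First-order IIR (exponentially weighted) moving average.

    `alpha` is an assumed smoothing factor; older samples decay
    by a factor of (1 - alpha) per step.
    """
    avg = 0.0
    for v in values:
        avg = (1.0 - alpha) * avg + alpha * v
    return avg


# Hypothetical per-data-set error counts (illustrative only).
# Many small events: suggests a systematic condition.
systematic = [2] * 100
# One large, localized event: suggests a transient condition.
transient = [0] * 90 + [200] + [0] * 9

THRESHOLD = 1.5  # assumed trigger level for corrective action

if __name__ == "__main__":
    # Both sequences have the same plain average.
    print(mean(systematic), mean(transient))
    # Shortly after the burst, the IIR output for the transient case
    # is still above the threshold, as is the systematic case.
    print(iir_average(systematic) > THRESHOLD)
    print(iir_average(transient) > THRESHOLD)
```

In this sketch the plain average is identical for the two sequences, and the IIR output exceeds the assumed threshold in both cases, so neither statistic alone indicates which sequence reflects a systematic condition.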
Consequently, a need remains to be able to consistently identify systematic errors in a system, thereby allowing remedial action to be taken only when appropriate.