Estimating the error characteristics of communications systems, such as magnetic recording systems, is essential to designing error correction codes (ECC) to allow such systems to operate with vanishingly small probabilities of error. The term “communications” is used herein in its broad meaning to refer to any system in which information-bearing signals are transferred or transmitted from a source to a target. As is known, ECC is used to identify and correct errors during a transmission. Codes are specifically designed for different systems to correct up to a fixed number of bytes in error within a codeword. In order to design a sufficiently robust code and avoid loss of data from an unacceptably high rate of uncorrectable errors, it is therefore important to accurately analyze and estimate the distribution of errors (that is, the number of codewords in a data stream having one byte in error, the number of codewords having two bytes in error, etc.).
Modern systems employ complex data structures of interleaved ECC codewords. Interleaving alternates bytes from several codewords within a longer codeword structure, spreading each codeword over a greater physical distance. However, because bytes of multiple codewords now alternate, a single error event may affect multiple codewords. Consequently, these structures complicate error analysis.
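To make the interleaving effect concrete, the following Python sketch assumes a simple byte-wise round-robin interleave of equal-length codewords (real tape formats may use other layouts) and counts how many bytes of each codeword a contiguous burst of corrupted bytes touches. The function names `interleave` and `errors_per_codeword` are hypothetical, chosen for illustration only.

```python
def interleave(codewords):
    """Round-robin byte interleave (assumed layout): a0 b0 a1 b1 ..."""
    length = len(codewords[0])
    if any(len(cw) != length for cw in codewords):
        raise ValueError("all codewords must have the same length")
    return [cw[i] for i in range(length) for cw in codewords]


def errors_per_codeword(depth, burst_start, burst_len):
    """Count how many bytes of each of `depth` interleaved codewords fall
    inside a burst of `burst_len` consecutive bad positions."""
    counts = [0] * depth
    for pos in range(burst_start, burst_start + burst_len):
        # With round-robin interleaving, stream position pos belongs
        # to codeword pos % depth.
        counts[pos % depth] += 1
    return counts


a = [f"a{i}" for i in range(4)]
b = [f"b{i}" for i in range(4)]
print(interleave([a, b]))            # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2', 'a3', 'b3']
print(errors_per_codeword(2, 3, 5))  # [2, 3]
```

Under this assumed layout, a single five-byte burst puts two bytes in error in one codeword and three in the other, so the error counts of the two codewords are no longer independent.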
ECC models for magnetic hard disk drive (HDD) systems frequently assume that a bit or byte defect is an event that is independent of, and uncorrelated with, any other event. Typically, errors in HDD systems are caused by white noise and are, in fact, random, independent events. Because hard disks are sealed, new defects generally do not occur after the drive is manufactured. Moreover, the same head assembly that writes data to the disk later reads the data from the disk. Consequently, tracking is generally consistent, and few errors arise from mistracking. Because the errors in an HDD system are independent, one can determine the probability of the occurrence of n errors by obtaining the probability of the occurrence of a single error event and raising it to the nth power.
As noted, the standard error model assumes that each error event is random and uncorrelated with every other error event. Such models begin with either a bit error rate or a byte error rate, then use the three equations below to predict probabilities of various error events.
$$P(n,k,p) = C(n,k)\cdot(1-p)^{n-k}\cdot p^{k} \qquad \text{[Eq. 1]}$$

$$PG(n,k,p) = \sum_{j=k+1}^{n} P(n,j,p) \qquad \text{[Eq. 2]}$$

$$C(n,k) = \prod_{i=1}^{k} \frac{n+1-i}{i} \qquad \text{[Eq. 3]}$$
Eq. 1 predicts the probability of exactly k error events occurring in a sequence of n bits or bytes, given the probability p that any one bit or byte is in error. Eq. 2 predicts the probability that more than k error events occur in such a sequence, given the same probability p. Both Eq. 1 and Eq. 2 use Eq. 3, which is one representation of the number of combinations of n things taken k at a time.
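Eqs. 1 to 3 translate directly into code. The following Python sketch is one possible rendering, not a prescribed implementation; the function names `combinations`, `prob_exactly` and `prob_more_than` are hypothetical.

```python
def combinations(n, k):
    """Eq. 3: C(n, k) as the product of (n + 1 - i) / i for i = 1..k.
    Integer arithmetic keeps the result exact: after step i the running
    value equals C(n, i), so each floor division is exact."""
    if k < 0 or k > n:
        return 0
    c = 1
    for i in range(1, k + 1):
        c = c * (n + 1 - i) // i
    return c


def prob_exactly(n, k, p):
    """Eq. 1: probability of exactly k errors among n independent bits/bytes,
    each in error with probability p."""
    return combinations(n, k) * (1 - p) ** (n - k) * p ** k


def prob_more_than(n, k, p):
    """Eq. 2: probability of more than k errors, summing Eq. 1 for j = k+1..n."""
    return sum(prob_exactly(n, j, p) for j in range(k + 1, n + 1))


# Usage: probability of more than 2 errors in 10 bytes with p = 0.1
print(prob_more_than(10, 2, 0.1))
```

Summing the tail directly, rather than computing one minus the sum of the lower terms, avoids cancellation error when the tail probability is very small. In Python 3.8 and later, `math.comb(n, k)` returns the same value as `combinations(n, k)`.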
For example, assume that an ECC codeword with no interleaving contains n bytes and that the probability that any one byte is in error is p. Further assume that the ECC can correct up to 5 bytes in error. A typical analysis computes the probabilities of codewords having 1, 2, 3, 4 and 5 bytes in error, all of which are correctable error events for this system. In addition, the probability that more than 5 bytes are in error gives the probability that the ECC fails to correct the codeword. The probability that exactly k bytes are in error is P(n, k, p) (Eq. 1), and the probability that more than k bytes are in error is PG(n, k, p) (Eq. 2); the probability of an uncorrectable codeword is therefore PG(n, 5, p).
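The analysis just described can be sketched in Python as follows. The codeword length n = 255, byte error probability p = 1e-4 and correction capability t = 5 are illustrative assumptions, not values from the text, and the helper names are hypothetical.

```python
import math


def prob_exactly(n, k, p):
    # Eq. 1; math.comb(n, k) equals C(n, k) from Eq. 3
    return math.comb(n, k) * (1 - p) ** (n - k) * p ** k


def prob_more_than(n, k, p):
    # Eq. 2: tail summed directly to avoid cancellation in 1 - (...)
    return sum(prob_exactly(n, j, p) for j in range(k + 1, n + 1))


def ecc_outcomes(n, p, t):
    """Return ({k: P(n, k, p)} for correctable k = 1..t, PG(n, t, p))."""
    correctable = {k: prob_exactly(n, k, p) for k in range(1, t + 1)}
    return correctable, prob_more_than(n, t, p)


# Illustrative values only: 255-byte codeword, p = 1e-4, corrects up to 5 bytes
correctable, failure = ecc_outcomes(n=255, p=1e-4, t=5)
for k, pk in correctable.items():
    print(f"P(exactly {k} bytes in error) = {pk:.3e}")
print(f"P(more than 5 bytes in error, uncorrectable) = {failure:.3e}")
```

Because this model treats every byte as an independent trial, the probabilities for 0 through n bytes in error sum to one, which is a convenient sanity check on any implementation.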
FIG. 1 shows the probability, computed from the above equations, that the codeword has exactly k bytes in error, for k = 1 to 5. Importantly, the plot illustrates that these predicted results differ greatly from the measured performance of an actual interleaved system, demonstrating that the simple model is inadequate, particularly as the desired number of correctable bytes increases.
Variations on this model produce, for small values of p, the following relation between the probability of one byte in error and the probability of k bytes in error:

$$P_1 = P(n,1,p) \qquad \text{[Eq. 4]}$$

$$P_k = P_1^{\,k} \qquad \text{[Eq. 5]}$$
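A short Python sketch of this variation, writing P_k for the probability of k bytes in error, shows that every additional byte in error multiplies the probability by the same factor P_1. The values n = 255 and p = 1e-4 and the function names are illustrative assumptions only.

```python
def p_one(n, p):
    # Eq. 4: P1 = P(n, 1, p); with C(n, 1) = n this is n * (1 - p)**(n - 1) * p
    return n * (1 - p) ** (n - 1) * p


def p_k_power_model(n, p, k):
    # Eq. 5: Pk = P1**k, so each extra byte in error multiplies by P1
    return p_one(n, p) ** k


n, p = 255, 1e-4  # illustrative values only, not taken from the text
for k in range(1, 6):
    print(k, f"{p_k_power_model(n, p, k):.3e}")
```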
Unfortunately, such variations also approximate a real-world system poorly. For example, a system with an interleave factor of two (a codeword-pair structure whose bytes alternate between the bytes of two individual codewords) may show a probability of a one-byte error that is predictable from the base probability of error. However, the probabilities of additional bytes in error follow a different slope, determined by another probability P_x that may be substantially larger than P_1:

$$P_1 = P(n,1,p) \qquad \text{[Eq. 6]}$$

$$P_k = P_1 \cdot P_x^{\,k-1}, \quad k > 1 \qquad \text{[Eq. 7]}$$
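This two-slope relation can be sketched in Python as below, writing P_k for the probability of k bytes in error. The value P_x = 0.2 is purely hypothetical, as are n = 255, p = 1e-4 and the function names; in practice P_x would have to be estimated from the system being modeled.

```python
def p_one(n, p):
    # Eq. 6: P1 = P(n, 1, p) = n * (1 - p)**(n - 1) * p, from the base error rate
    return n * (1 - p) ** (n - 1) * p


def p_k_two_slope(n, p, px, k):
    # Eq. 7: Pk = P1 * Px**(k - 1) for k > 1; at k = 1 this reduces to Eq. 6
    if k < 1:
        raise ValueError("k must be at least 1")
    return p_one(n, p) * px ** (k - 1)


# Illustrative values only: n = 255, p = 1e-4, hypothetical Px = 0.2
for k in range(1, 6):
    print(k, f"{p_k_two_slope(255, 1e-4, 0.2, k):.3e}")
```

The first byte in error is governed by the base rate p, while each further byte is governed by P_x.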
It will be observed that, however the standard random-error-rate model is manipulated, the ratio of the probability of one byte in error to that of two bytes, of two bytes to three, and so on, must be the same. Observations of actual systems demonstrate that this is not the case, and the standard model cannot be made to fit them.
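One way to test this property is to compute the successive ratios P_2/P_1, P_3/P_2, and so on, and check whether they agree. The Python sketch below applies such a check to the two-slope model; the helper names, the tolerance and the values P_1 = 2.5e-2 and P_x = 0.2 are illustrative assumptions, not measured data. The same check could be applied to per-codeword error statistics measured on a real system.

```python
def successive_ratios(probs):
    """Return [P2/P1, P3/P2, ...] for a list [P1, P2, P3, ...]."""
    return [b / a for a, b in zip(probs, probs[1:])]


def is_geometric(probs, rel_tol=1e-9):
    """True if all successive ratios agree within rel_tol, i.e. the sequence
    could come from a model of the form Pk = P1 * r**(k - 1)."""
    ratios = successive_ratios(probs)
    if not ratios:
        return True
    return all(abs(r - ratios[0]) <= rel_tol * abs(ratios[0]) for r in ratios)


# Two-slope model with illustrative values P1 = 2.5e-2 and Px = 0.2 (assumed)
p1, px = 2.5e-2, 0.2
model = [p1 * px ** (k - 1) for k in range(1, 6)]
print(successive_ratios(model))  # each ratio is approximately 0.2
print(is_geometric(model))       # True
```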
Thus, the standard model is too simple to model the real system. It uses a single variable to describe a defect: the rate at which defects occur. However, it is understood that defects have size and that not all defects are of the same size. Further, defects are not, in fact, independent events.
In the past, when ECC in a magnetic tape system was designed to identify and correct one or two bytes of a codeword in error, models based on HDD system assumptions may have been adequate. However, although these assumptions make closed-form analysis convenient, tractable and relatively accurate for HDD systems, models based on them are overly simplistic for modern magnetic tape systems. Because tape media is not in a sealed environment as a hard disk is, defects may occur after manufacture and may, in fact, grow over time. Moreover, tapes are frequently loaded into different tape drives whose tracking may differ slightly from one drive to another, increasing the likelihood of errors.
With increased data densities, more complex interleaved codewords and the desire for more robust correction, the old models provide a poor fit to actual, real world systems and a need exists for more accurate modeling.