The importance of error correction coding of data in digital computer systems has increased greatly as the density of the data recorded on mass storage media, more particularly magnetic disks, has increased. With higher recording densities, a tiny imperfection in the recording surface of a disk can corrupt a large amount of data. In order to avoid losing that data, error correction codes ("ECC's") are employed to, as the name implies, correct the erroneous data.
Before a string of data symbols is recorded on a disk, it is mathematically encoded to form ECC, or redundancy, symbols. The redundancy symbols are then appended to the data string to form code words--data symbols plus redundancy symbols--and the code words are then stored on the disk. When the stored data is to be accessed from the disk, the code words containing the data symbols are retrieved from the disk and mathematically decoded. During decoding any errors in the data are detected and, if possible, corrected through manipulation of the redundancy symbols [For a detailed description of decoding see Peterson and Weldon, Error Correcting Codes, 2d Edition, MIT Press, 1972].
Stored digital code words can contain multiple errors. One of the most effective types of ECC used for the correction of multiple errors is a Reed-Solomon code [For a detailed description of Reed-Solomon codes, see Peterson and Weldon, Error Correcting Codes]. Error detection and correction techniques for Reed-Solomon ECC's are well known. Id. One such technique begins with again encoding the code word data to generate a new set of redundancy symbols and then comparing this set of redundancy symbols with the redundancy symbols in the retrieved code word, i.e. the set of redundancy symbols generated by the pre-storage encoding of the data, to detect any errors in the retrieved code word. [For a detailed discussion of this error detection technique, see U.S. Pat. No. 4,413,339 issued to Riggle and Weng].
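The re-encode-and-compare detection step described above can be sketched as follows. This is only an illustration: the `toy_redundancy` function is a hypothetical stand-in encoder (an XOR parity symbol plus a weighted sum), not a Reed-Solomon encoder and not the method of the cited patent, and all names are assumptions.

```python
def toy_redundancy(data):
    """Hypothetical stand-in encoder (NOT Reed-Solomon): two redundancy symbols."""
    parity = 0
    weighted = 0
    for idx, sym in enumerate(data, start=1):
        parity ^= sym                               # XOR of all data symbols
        weighted = (weighted + idx * sym) % 251     # position-weighted sum
    return [parity, weighted]


def encode(data):
    """Form a code word: data symbols followed by redundancy symbols."""
    return list(data) + toy_redundancy(data)


def has_errors(code_word, n_redundancy=2):
    """Re-encode the retrieved data and compare with the retrieved redundancy."""
    data = code_word[:-n_redundancy]
    stored = code_word[-n_redundancy:]
    return toy_redundancy(data) != stored


code_word = encode([10, 20, 30, 40])
clean_result = has_errors(code_word)

corrupted = list(code_word)
corrupted[2] ^= 0x01            # flip one bit in one data symbol
corrupted_result = has_errors(corrupted)

print(clean_result, corrupted_result)
```

A mismatch between the two sets of redundancy symbols only signals that an error is present; locating and correcting it requires the full decoding procedure of the ECC in use.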
Errors in stored data often occur in patterns such as multi-symbol bursts. Such errors may be caused by, for example, an imperfection in the recording medium. Various encoding and decoding schemes have been developed to correct common error patterns. One scheme developed for bursts is interleaving. Interleaving involves separating the data into a number of sections and, using an ECC, separately encoding each section to form a code word. Each code word thus contains a section of the data and a related set of redundancy symbols. The interleaved data and the redundancy symbols from all the sections are then recorded. Typically, the interleaved data symbols are recorded contiguously, followed by the redundancy symbols. Burst errors, which affect a number of contiguous (interleaved) data symbols, cause a small number of errors in each of several code words.
When the data and redundancy symbols are later retrieved, the various sets of redundancy symbols are used to correct errors in the data sections associated with them. Thus a given set of redundancy symbols protects a portion of the data, and presumably, corrects only a portion of any error burst. Accordingly, an ECC which is designed to correct "x" erroneous symbols can correct bursts which are longer than x symbols by correcting them section-by-section. The ECC selected for such a scheme can be less powerful than one which must correct burst errors without partitioning them. The advantages to using a less powerful ECC are simpler encoding/decoding hardware and (typically) faster correction. [For a detailed description of interleaving encoding, see Peterson and Weldon, Error Correcting Codes].
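The way interleaving spreads a burst across code words can be illustrated with a minimal sketch. It models only the data symbols (the redundancy symbols and the ECC itself are omitted), and the interleave depth, section length and burst position are hypothetical values chosen for the example.

```python
def interleave(sections):
    """Record symbol 0 of every section, then symbol 1 of every section, and so on."""
    return [sym for column in zip(*sections) for sym in column]


def deinterleave(stream, depth):
    """Recover the original sections from the interleaved symbol stream."""
    return [stream[i::depth] for i in range(depth)]


depth, k = 4, 6                      # assumed: 4 sections of 6 data symbols each
data = list(range(depth * k))
sections = [data[i * k:(i + 1) * k] for i in range(depth)]
stream = interleave(sections)

# Corrupt a burst of 8 contiguous recorded symbols.
burst_start, burst_len = 5, 8
corrupted = list(stream)
for pos in range(burst_start, burst_start + burst_len):
    corrupted[pos] ^= 0xFF

received = deinterleave(corrupted, depth)
errors_per_section = [
    sum(a != b for a, b in zip(orig, got))
    for orig, got in zip(sections, received)
]
print(errors_per_section)
```

In this example an 8-symbol burst leaves at most two erroneous symbols in any one section, so an ECC able to correct two symbols per code word would suffice under these assumptions.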
Another scheme used to correct common error patterns is multi-level encoding. Multi-level encoding involves encoding data once using an ECC and/or a technique which is designed to correct the most common error patterns and then encoding the data a next time using another ECC and/or technique which is designed to correct the next most common error patterns, and so on. Multi-level decoding involves correcting the data using the first level ECC or technique and, if errors then remain, correcting them using the second level ECC or technique, and so on.
For example, a first level of encoding may consist of encoding the data with a relatively weak ECC. A second level of encoding may consist of again encoding the data with a more powerful ECC. Such a two-level scheme is disclosed in U.S. Pat. Nos. 4,706,250 and 4,525,838 to Patel. The Patel scheme disclosed in the two patents first encodes the data using an ECC which corrects e.sub.a errors. Patel separates a block of data into "i" sections, or sub-blocks, of "k" symbols each and separately encodes each k-symbol section using a first ECC. This level of encoding generates "p" redundancy symbols for each data section, or a total of "ip" redundancy symbols.
Patel next adds together (exclusive OR's) the corresponding data symbols in each sub-block and treats the resulting k symbols as an additional "data" section. Patel then encodes these symbols using a more powerful ECC which can correct up to e.sub.b errors (e.sub.b ≥ e.sub.a). This second encoding generates "s" additional redundancy symbols. Patel stores the data and the ip and s redundancy symbols, that is, the redundancy symbols generated during the two levels of encoding. He does not store the k additional "data" symbols.
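The formation of the additional k-symbol "data" section can be sketched as below. The sketch covers only the sub-block split, the exclusive OR step and the redundancy count; the level-1 and level-2 ECC encoders are not implemented, and the function names and sample values are assumptions made for illustration.

```python
from functools import reduce


def split_into_sub_blocks(block, k):
    """Separate a data block into i sub-blocks of k symbols each."""
    if len(block) % k:
        raise ValueError("block length must be a multiple of k")
    return [block[j:j + k] for j in range(0, len(block), k)]


def xor_section(sub_blocks):
    """Exclusive OR the corresponding symbols of every sub-block."""
    return [reduce(lambda a, b: a ^ b, column) for column in zip(*sub_blocks)]


def redundancy_count(i, p, s):
    """Total stored redundancy symbols: i*p from level 1 plus s from level 2."""
    return i * p + s


block = [0x12, 0x34, 0x56, 0x78,
         0x9A, 0xBC, 0xDE, 0xF0,
         0x0F, 0x1E, 0x2D, 0x3C]
sub_blocks = split_into_sub_blocks(block, k=4)   # i = 3 sub-blocks
extra_section = xor_section(sub_blocks)          # encoded at level 2, not stored
total_redundancy = redundancy_count(i=len(sub_blocks), p=2, s=4)

print([hex(sym) for sym in extra_section], total_redundancy)
```

Because the additional section can be regenerated from the stored sub-blocks, only its s redundancy symbols need to be recorded.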
The Patel scheme can correct up to e.sub.a errors in each sub-block using the first level, or level-1, ECC and up to e.sub.b errors in any one sub-block using the level-2 ECC. Accordingly, Patel attempts first to correct any errors in the data using the level-1 ECC. If all the errors are corrected using this ECC, he stops the error correction decoding operation and does not use the level-2 ECC. If all but one of the sub-blocks are corrected, the level-2 code is used in an attempt to correct the remaining sub-block. To do so, Patel exclusive OR's the corresponding data symbols in each section, which include the symbols corrected using the level-1 ECC and the erroneous symbols which the level-1 code did not correct, to form k "data" symbols which correspond to the k-symbol additional data section generated during encoding. Patel then attempts to correct the errors in these k symbols using the level-2 code.
If there are e.sub.b or fewer errors, the level-2 code determines the locations of the erroneous symbols within the k-symbol "data" section and generates the associated error values, that is, the symbols which must be exclusive OR'd with the erroneous symbols to correct them. Patel translates the error locations to the sub-block which contains the errors and corrects them using the generated error value symbols.
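The reason error locations found in the k-symbol "data" section translate directly to the uncorrected sub-block can be shown with a small sketch. A real decoder derives the error locations and values from the s level-2 redundancy symbols; here, purely as a stand-in, they are obtained by comparing against the error-free exclusive OR section, which is never actually stored. The index `bad` and all sample values are hypothetical.

```python
from functools import reduce


def xor_section(sub_blocks):
    """Exclusive OR the corresponding symbols of every sub-block."""
    return [reduce(lambda a, b: a ^ b, col) for col in zip(*sub_blocks)]


original = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
true_section = xor_section(original)   # stand-in for level-2 decoding; not stored

received = [list(sb) for sb in original]
bad = 1                                # assumed: sub-block level 1 could not correct
received[bad][0] ^= 0x21
received[bad][3] ^= 0x40

# XOR of the retrieved sub-blocks differs from the error-free section only
# at the positions where the uncorrected sub-block is in error.
observed = xor_section(received)
error_values = [o ^ t for o, t in zip(observed, true_section)]
locations = [pos for pos, value in enumerate(error_values) if value]

# Translate each location to the bad sub-block and apply the error value.
for pos in locations:
    received[bad][pos] ^= error_values[pos]

print(locations, received == original)
```

The translation works only when a single sub-block remains in error; errors in two sub-blocks would be superimposed in the exclusive OR section and could not be attributed to either one.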
If any sub-block contains more than e.sub.b errors, or if more than one sub-block contains more than e.sub.a errors, the Patel scheme cannot correct the errors. Thus, as more errors occur because of increased recording densities, it is desirable to employ an error correcting scheme which is capable of correcting a greater number of errors.
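The correction limits stated above can be expressed as a simple predicate. This is a sketch of the condition as described in this section, not code from the Patel patents; the function name is hypothetical, and `e_a` and `e_b` stand for e.sub.a and e.sub.b.

```python
def patel_correctable(errors_per_sub_block, e_a, e_b):
    """Return True if the described two-level scheme can correct the pattern.

    errors_per_sub_block: number of erroneous symbols in each sub-block.
    e_a: errors correctable per sub-block by the level-1 ECC.
    e_b: errors correctable in one sub-block by the level-2 ECC (e_b >= e_a).
    """
    # Sub-blocks that the level-1 ECC cannot correct on its own.
    over = [n for n in errors_per_sub_block if n > e_a]
    if not over:
        return True                      # level 1 corrects everything
    # Level 2 can rescue at most one sub-block, with at most e_b errors.
    return len(over) == 1 and over[0] <= e_b


cases = [
    patel_correctable([1, 0, 1], e_a=1, e_b=3),   # level 1 alone suffices
    patel_correctable([1, 3, 0], e_a=1, e_b=3),   # one sub-block needs level 2
    patel_correctable([1, 4, 0], e_a=1, e_b=3),   # exceeds e_b in one sub-block
    patel_correctable([2, 2, 0], e_a=1, e_b=3),   # two sub-blocks exceed e_a
]
print(cases)
```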
One solution is to use a more powerful ECC at each level. This creates three problems. First, the more powerful ECCs generate additional redundancy symbols. Thus more storage space must be allocated to the redundancy symbols, and less data can be recorded in a given storage space. Second, the more powerful ECCs require more complex, and thus more expensive, encoding and decoding hardware and/or software. Third, the more powerful ECCs require more time to correct errors than ECCs which use fewer redundancy symbols.
As data transfer technologies improve, and the speed with which data may be retrieved from a disk increases, a slower, more complex ECC acts as a limit on the speed with which data may be transferred. Accordingly, what is needed is a more powerful encoding scheme which can quickly correct errors, is easily implemented and does not require excess amounts of storage space.