1. Field of the Invention
The present invention relates generally to encoding and decoding for data transfer and, more particularly, to a method and apparatus for use in encoding and/or decoding data by providing a Hamming weight enhancement to the data.
2. Related Art
Data is frequently stored and/or communicated in binary form, that is, as a sequence of binary digits or bits having zero and one values. When such data is communicated or simply transferred to a storage medium, such as occurs in the operation of a disk drive, for example, there is a substantial risk of transmission errors (e.g., a one-valued bit being received when a zero-valued bit was transmitted or vice-versa) caused by noise in the communication channel and other factors. These transmission errors are especially significant when data compression techniques are used to reduce the number of bits needed to communicate a particular message (e.g., in a disk drive or, more generally, in a communication channel), because a single erroneous bit in a compressed message can result in corruption of a larger amount of information in the message after the message is decompressed. Disk drives, of course, are well-known in the art, but additional information and a general circuit arrangement for one exemplary disk drive apparatus may be found, for example, in Wakamatsu U.S. Pat. No. 6,011,666, issued Jan. 4, 2000, the disclosure of which is hereby incorporated herein by reference as if set forth herein in its entirety.
Many different approaches to overcoming this problem with transmission errors have been developed. FIG. 1 illustrates one example of a prior-art write-channel encoder apparatus for encoding information for communication via a channel (or for storage on any suitable media). As shown, an encoder 18 receives user data 20 and sequentially performs error correction coding or ECC (block 22), run length limit or RLL encoding (block 24), and precoding (block 26) on the user data 20 to thereby produce encoded data which is then written to any suitable media and/or any suitable communication channel (block 28).
An inverse process for decoding the information encoded by the encoder 18 of FIG. 1 is performed by a prior-art read-channel decoding apparatus such as the decoder 29 illustrated in FIG. 2. Initially, the decoder 29 reads or retrieves encoded information from a media source or a communication channel (block 30) and performs a maximum likelihood sequence detection (MLSD) on that encoded information (block 32). As will be appreciated by those of ordinary skill in the art, this MLSD detection compares the retrieved encoded information with a set of all possible information sequences and determines which of the possible sequences most likely represents the encoded information based on which of the possible information sequences is “closest” (e.g., by computing a predetermined error quantification metric for each possible sequence and selecting that sequence which has the smallest or minimum value for the computed error quantification metric).
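By way of illustration, the minimum-metric selection described above can be sketched as follows. This brute-force enumeration over all candidate sequences, with a squared-distance error metric and the function name detect_mlsd (both chosen here for illustration), is not how a practical detector is built, which would typically use a Viterbi-style algorithm, but it shows the principle of selecting the "closest" sequence.

```python
# Illustrative sketch of minimum-metric sequence selection
# (a real MLSD detector uses the Viterbi algorithm rather than
# enumerating every possible sequence).
from itertools import product

def detect_mlsd(received, seq_len):
    """Return the candidate bit sequence closest to `received`.

    `received` is a list of (possibly noisy) sample values; the error
    quantification metric used here is squared Euclidean distance.
    """
    best_seq, best_metric = None, float("inf")
    for candidate in product((0, 1), repeat=seq_len):
        metric = sum((r - c) ** 2 for r, c in zip(received, candidate))
        if metric < best_metric:
            best_seq, best_metric = candidate, metric
    return list(best_seq)

# Example: noisy samples of the transmitted sequence 1, 0, 1
print(detect_mlsd([0.9, 0.1, 0.8], 3))  # -> [1, 0, 1]
```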
The read-channel decoder 29 then performs reversed precoding on the detected data sequence (block 34) followed by RLL decoding (block 36) and ECC decoding (block 38) to thereby derive user data 40, which corresponds to the user data 20 encoded by the encoder 18 (FIG. 1).
In this approach, the RLL decoding step in the data recovery process undesirably causes small errors to be propagated into many symbols of error in the decoded message. As will be appreciated by those of ordinary skill in the art, the number of symbol errors that can be fixed by the error correction coding (ECC) is limited by design. Consequently, the RLL decoding used in this approach degrades the final error rate of the data recovery process.
One solution to this problem has been to reduce the size of the actual encoded word to minimize the maximum possible size for error propagation. In the most extreme case, only one symbol is encoded, and the rest of the symbols in the code word are left unencoded.
By way of example, where a communication scheme employs an eight-bit symbol size and an unencoded code word made up of 4 symbols {u1, u2, u3, u4}, the best error correction is obtained by simply encoding one of the symbols into a 9-bit code word. For example, if u1 is encoded to produce a 9-bit code word E1, a final code with low error propagation can be constructed as follows:

final code = {u2, e1a, u3, e1b, u4}
where e1a is the first (most significant) 4 bits of the code word E1 and e1b is the last (least significant) 5 bits of the code word E1. In such a scheme, following the decoding process, errors of small size (i.e., no more than 2 bits larger than the smaller of e1a or e1b) can corrupt at most two symbols, which may occur when an error encompasses an unencoded symbol and one of the partially encoded symbols.
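A minimal sketch of this construction is given below. The actual 8-bit-to-9-bit code is not specified above, so a hypothetical encoding that prepends an even-parity bit stands in for E1; the function names are likewise chosen for illustration.

```python
# Sketch of the interleaved final code {u2, e1a, u3, e1b, u4}.
def encode_symbol(u1):
    """Hypothetical 8-bit -> 9-bit encoding (even-parity bit prepended)."""
    bits = [(u1 >> i) & 1 for i in range(7, -1, -1)]
    parity = sum(bits) % 2
    return [parity] + bits                # 9-bit code word E1

def build_final_code(u1, u2, u3, u4):
    e1 = encode_symbol(u1)
    e1a, e1b = e1[:4], e1[4:]             # first 4 bits, last 5 bits
    to_bits = lambda s: [(s >> i) & 1 for i in range(7, -1, -1)]
    # final code = {u2, e1a, u3, e1b, u4}
    return to_bits(u2) + e1a + to_bits(u3) + e1b + to_bits(u4)

code = build_final_code(0b10110001, 0xAA, 0x55, 0x0F)
print(len(code))  # 8 + 4 + 8 + 5 + 8 = 33 bits
```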
The advantage of this kind of simple encoding scheme lies in the reduced error propagation after RLL decoding and thus the reduced total number of symbols that can be corrupted by a single error event. As a consequence, this encoding scheme allows for an increase in the total number of error events the error correction coding (ECC) can correct. A significant disadvantage of this scheme is that the smaller encoded symbol makes the minimum Hamming weight of the total code word very small (i.e., equal to the Hamming weight of the single encoded symbol). As will be appreciated by those of ordinary skill in the art, the Hamming weight of a code word or other collection of symbols is defined as the number of non-zero symbols in the code word or collection of symbols in the Interleaved Non-Return to Zero Inverting (INRZI) domain. For binary signaling, the Hamming weight of a bit stream is the number of “1” bits in the bit stream. The read-channel device (i.e., the decoder) performs numerous functions, such as timing and gain recovery, before it passes sampled data to the MLSD detector. The purpose of RLL encoding is not only to remove quasi-catastrophic sequences that can degrade the performance of the MLSD detector, but also to introduce a guaranteed minimum number of signal transitions that allow the gain and timing loops to operate. Codes with high Hamming weight typically provide more signal transitions that can be used by the timing and gain loops. Codes with very low Hamming weight may not provide adequate timing and gain gradients for the channel to maintain correct sampling phase and gain control.
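Under the binary-signaling definition above, a sketch of the Hamming weight computation is straightforward (the function name is chosen here for illustration):

```python
# Hamming weight of a binary bit stream: the count of "1" bits.
def hamming_weight(bits):
    return sum(1 for b in bits if b == 1)

# An 8-bit symbol with four "1" bits has Hamming weight 4.
print(hamming_weight([1, 0, 1, 1, 0, 0, 0, 1]))  # -> 4
```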
This encoding scheme is least effective when every encoded symbol in the entire encoded data field has minimum Hamming weight and all the other unencoded symbols are zeros. The degradation in the timing and gain loops as a result of the channel operating on a sequence of low Hamming weight codes can be so large that it negates the benefit of the reduced error propagation made possible by this encoding scheme.
To mitigate this problem, the storage industry has used a data-scrambling technique to minimize the probability of creating a large number of zeros in the user symbols. Unfortunately, this technique merely makes it more difficult for an end-user to locate a bad data sequence. There is still some finite probability that the bad data sequence can be created accidentally by the scrambling of some random data (e.g., random data created by running a string of zero-valued bits through the same randomizer used in the channel). This randomizer technique is therefore less than satisfactory.
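Such scrambling can be sketched as an additive (XOR) scrambler driven by a linear-feedback shift register. The generator taps, seed, and function names below are illustrative assumptions, not any particular channel's actual choices; the sketch also demonstrates the weakness just noted, namely that user data which happens to match the scrambler sequence is mapped to a run of zeros.

```python
# Illustrative additive scrambler driven by a 7-bit LFSR.
# The feedback taps and seed are assumptions for this sketch only.
def lfsr_stream(seed, n):
    state = seed & 0x7F
    out = []
    for _ in range(n):
        fb = ((state >> 6) ^ (state >> 5)) & 1   # feedback bit
        out.append(state & 1)                    # output bit
        state = ((state << 1) | fb) & 0x7F
    return out

def scramble(data_bits, seed=0x5A):
    # XOR the data with the pseudorandom sequence; applying the same
    # operation twice restores the original data (descrambling).
    return [d ^ s for d, s in zip(data_bits, lfsr_stream(seed, len(data_bits)))]

# The weakness noted above: user data equal to the scrambler sequence
# is scrambled into all zeros.
bad = lfsr_stream(0x5A, 16)
print(scramble(bad))  # -> all zeros
```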
FIG. 3 illustrates an alternative embodiment of a prior-art write-channel encoder apparatus 41, which is similar to the encoder apparatus 18 of FIG. 1, except that the order in which the error correction coding (ECC) and the run length limit (RLL) encoding are performed is reversed in the encoding apparatus 41 of FIG. 3. In other words, as shown, run length limit encoding is performed on user data 42 (block 44), and error correction coding is subsequently performed (block 46). The result then undergoes precoding (block 48) and the encoded information is then written to any suitable media and/or a communication channel (block 50).
Similarly, FIG. 4 illustrates a read-channel decoding apparatus 51 which reads encoded information from media or a communication channel (block 52), performs MLSD detection (block 54), reversed precoding (block 56), error correction code (ECC) decoding (block 58), and run length limit (RLL) decoding (block 60) to generate user data 62, which corresponds to the user data 42 processed by the encoding apparatus 41 (FIG. 3).
Permuting or reversing the ECC and RLL encoders in the “write” process in this fashion serves to prevent or at least reduce error propagation. In such an encoding scheme, the ECC corrects errors in the data before RLL decoding, thereby completely avoiding the error propagation effect of RLL decoding. The output after ECC correction is assumed to be perfect user data, such that the RLL decoding following the ECC will not involve any error propagation. This method of recording and data recovery allows the RLL code to be designed in such a way as to optimize for Hamming weight and thereby improve the performance of the timing and gain loops of the channel. Thus, the error propagation effect of the previously described encoding scheme is obviated.
While this latter encoding scheme does provide at least a theoretical improvement in terms of reduced error propagation compared to the encoding scheme described above, it also has certain drawbacks due to compatibility issues that arise from the way in which storage devices that employ these encoding schemes are traditionally certified. Traditionally, technical customers of mass storage devices often test the ECC capability of a given mass storage device before deciding to purchase the device. A common method of performing such a test involves so-called “read-long” and “write-long” processes.
In a read-long process, the drive or other device is instructed to read a certain data sector and return all the user symbols in that sector, plus the included ECC symbols, to the host. The host then runs a program that deliberately corrupts some number of bits in the received data (i.e., the user symbols and ECC symbols) and returns the corrupted data back to the drive using a write-long command. The write-long command instructs the drive to write all the long data back to the disk without additional ECC. In a subsequent normal read, the data is read back to the host and the host checks whether the ECC during a normal read has corrected the error that the host had deliberately introduced. This process is done repeatedly using long data corrupted at different locations with different numbers of groups of corrupted bits and different numbers of corrupted bits per group. These read-long and write-long instructions provide a means for the technical user to verify that the drive indeed has the specified error correction capability before deciding to purchase the drive.
A problem with the permuted ECC/RLL scheme is that one corrupted symbol can generate a large number of encoded symbols that are corrupted, assuming that the RLL encoder used is the one that is optimized for the timing and gain loops of the channel. Thus, the conventional read-long and write-long test method will produce a result that appears to suggest that the ECC performance of the drive is vastly inferior to what is claimed by the manufacturer, and a technical customer may decline to purchase the drive because of this adverse test result. This, in turn, will cause the manufacturer of the drive to lose market share to other manufacturers that do not incorporate ECC/RLL permutation. For this reason, the permuted ECC/RLL scheme may not be readily accepted in the storage industry.