The present invention relates to a method and apparatus for encoding data to be recorded in a data storage device (e.g. a disk drive) according to a run length limited (RLL) code. More particularly, the present invention relates to a method and apparatus for encoding data using a 0,k,m/m recording code. The present invention is particularly well suited for disk drives and other digital data storage devices, but is not necessarily limited to such devices (e.g. it might be used for digital data transmission).
Background for the invention will be provided in connection with a disk drive system. It should be noted, however, that the present invention is not intended to be limited to such systems.
FIG. 1 illustrates a conventional disk drive system 100. The disk drive system 100 is operative for performing data storage and retrieval functions for an external host computer 102. The disk drive system 100 includes: a disk 104, a transducer 106, an actuator assembly 108, a voice coil motor (VCM) 110, a read/write channel 112, an encoder/decoder (ENDEC) 114, an error correction coding (ECC) unit 116, a data buffer memory 118, an interface unit 120, a servo unit 122, and a disk controller/microprocessor 124.
In general, disk 104 includes a pair of disk surfaces (not shown) which are coated with a magnetic material that is capable of changing its magnetic orientation in response to an applied magnetic field. Data is stored digitally in the form of magnetic polarity transitions (frequently referred to as pulses) within concentric tracks on one or more of the disk surfaces. The disk 104 is rotated at a substantially constant spin rate by a spin motor (not shown) that is speed-controlled by a closed loop feedback system. Instead of the single disk 104 shown in FIG. 1, the system 100 can include a plurality of disks all mounted on a single spindle and each serviced by one or more separate transducers.
The transducer 106 is a device that transfers information from/to the disk 104 during read and write operations. The transducer 106 is positioned over the disk 104, typically, by a rotary actuator assembly 108 that pivots about an axis under the power of the VCM 110. During a write operation, a polarity-switchable write current is delivered to the transducer 106 from the read/write channel 112 to induce magnetic polarity transitions onto a desired track of the disk 104. During a read operation, the transducer 106 senses magnetic polarity transitions on a desired track of the disk 104 to create an analog read signal that is indicative of the data stored thereon. Commonly, the transducer 106 is a dual element head having a magnetoresistive read element and an inductive write element.
The VCM 110 receives movement commands from the servo unit 122 for properly positioning the transducer 106 above a desired track of the disk 104 during read and write operations. The servo unit 122 is part of a feedback loop that uses servo information from the surface of the disk 104 to control the movement of the transducer 106 and the actuator assembly 108 in response to commands from the controller/microprocessor 124.
During a read operation, the channel 112 receives the analog read signal from the transducer 106 and processes the signal to create a digital read signal representative of the data stored on the disk 104. Typically, detection circuitry is included in the channel 112. The channel 112 may also include means for deriving timing information, such as a read clock, from the analog signal.
The ENDEC 114 is operative for: (1) encoding data being transferred from the host 102 to the disk 104, and (2) decoding data being transferred from the disk 104 to the host 102. Data being written to the disk 104 is encoded for a number of reasons, including those relating to timing and detection concerns. The ENDEC generally imparts a run length limited (RLL) code on the data being written to the disk 104 to ensure that the frequency of transitions in the bit stream does not exceed or fall below predetermined limits. Such coding ensures that, among other things, enough transitions exist in the read data to maintain an accurate read clock. Other coding schemes may also be employed in the ENDEC 114.
The ECC unit 116 is operative for adding redundant information to the data from the host 102 before that data is encoded in the ENDEC 114 and written to the disk 104. This redundant information is used during subsequent read operations to permit discovery of error locations and values within the decoded read data. Errors in the read data detected by the ECC unit 116 can result from any number of mechanisms, such as: (1) media noise due to media anomalies, (2) random noise from the transducer, cabling and electronics, (3) poor transducer placement reducing signal amplitude and/or increasing adjacent track noise during the read operation, (4) poorly written data due to media defects or poor transducer placement, and/or (5) foreign matter on the media or media damage. ECC units are generally capable of correcting up to a predetermined number of errors in a data block. If more than the predetermined number of errors exist, then the code will not be able to correct the errors but may still be able to identify that errors exist within the block. ECC functionality is generally implemented in a combination of hardware and software.
The data buffer memory 118 is used to temporarily store data for several purposes:
(1) to permit data rates that are different between the disk drive and the host interface bus,
(2) to allow time for the ECC system to correct data errors before data is sent to the host 102,
(3) to provide temporary parameter storage for the controller/microprocessor 124, and
(4) to cache data.
The interface 120 is used to establish and maintain communication between the host 102 and the disk drive system 100. In this regard, all transfer of information into and out of the disk drive 100 takes place through the interface 120.
The disk controller/microprocessor 124 is operative for controlling the operation and timing of the other elements of the system 100. In addition, the controller/microprocessor 124 may perform the functions of some of the elements of the system. For example, the controller/microprocessor 124 may perform the correction computation function of the ECC unit 116 if errors exceed the capability of the hardware based unit.
With this background, certain drawbacks associated with conventional disk drive encoding and decoding schemes may now be considered.
As alluded to above, clock information is typically embedded into data stored onto the disk 104. In order to ensure that an adequate and timely supply of clock information is provided for the clock extraction process (which is performed by the channel 112), and perhaps for other reasons, run length limited (RLL) codes are employed. As is understood by those skilled in the art (and therefore will not be described herein), detected data includes clock phase error information that is used in the clock extraction process.
RLL codes are traditionally described as d,k codes, where d is the minimum run length and k is the maximum run length between magnetic transitions. Note that two data representation conventions are frequently used, NRZ (non-return to zero) and NRZI (non-return to zero, change on ones). If NRZ, a magnetic polarity transition occurs when a sequence (one or more) of 0's changes to a sequence of 1's, or vice versa. If NRZI, a magnetic polarity transition occurs each time a 1 appears and 0's appear otherwise. While either convention is acceptable and supportable by this invention, NRZ will be used herein to describe the encode and decode processes. Using either convention, d represents the minimum number of bits that must exist between magnetic polarity transitions, while k represents the maximum number of bits that may exist between magnetic polarity transitions. The constraint d is used to control pulse crowding effects, while k is used to ensure the aforementioned self-clocking capability and to facilitate error event length control in certain sequence detector systems. Present-day detectors (e.g., Viterbi detectors and the like) usually permit the minimum run length constraint, d, to be 0.
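As a rough illustration of the two conventions, the following Python sketch converts an NRZ bit sequence to NRZI and measures the longest run of identical NRZ bits, which is the quantity limited by k in the NRZ description used herein. The function names and the assumed initial level of 0 are illustrative choices and are not taken from this specification.

```python
def nrz_to_nrzi(nrz_bits, initial_level=0):
    """Return NRZI bits: 1 where the NRZ level changes, 0 otherwise.

    initial_level is an assumed level preceding the first bit.
    """
    nrzi = []
    previous = initial_level
    for bit in nrz_bits:
        nrzi.append(1 if bit != previous else 0)
        previous = bit
    return nrzi


def max_run_nrz(nrz_bits):
    """Length of the longest run of identical consecutive NRZ bits."""
    longest = 0
    current = 0
    previous = None
    for bit in nrz_bits:
        current = current + 1 if bit == previous else 1
        longest = max(longest, current)
        previous = bit
    return longest


nrz = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]
print(nrz_to_nrzi(nrz))  # a 1 marks each polarity transition
print(max_run_nrz(nrz))  # longest stretch without a transition
```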
In order to ensure that the maximum run length limitation, k, is appropriately met (and, if necessary, to ensure that the minimum run length, d, is met as well), present-day ENDECs 114 implement a run length limited code by performing a logically complete, immutable and unambiguous mapping between uncoded words (i.e., words of user data, or decoded words) and encoded words (words that are to be stored on the disk surface), wherein the encoded words meet the run length constraints. In order for the run length constraints to be met, the encoded words must necessarily include more bits than the uncoded words, since words that do not satisfy the run length constraints must be discarded from the set of all possible words having the encoded word length.
The number of bits in the uncoded words may be represented by the integer m, while the number of bits in encoded words may be represented by the integer n, with m less than n. The code rate of an encoder is then defined by m/n and is, therefore, less than one in conventional systems. Encoders exhibiting code rates of 8/9, 16/17, 24/25 and perhaps higher rates are typical for present-day disk systems.
With respect to an encoder having a code rate of 8/9, for example, one of 2^8 = 256 possible uncoded words may be mapped to one of 2^9 = 512 possible encoded words. However, of the set of 512 possible encoded words, at least those words that fail to meet the run length constraints are discarded and not used (other excess words are also discarded). Accordingly, only 256 of the 512 possible encoded words are used in the encoding process.
Because the encoding process requires uncoded words having m bits to be mapped to encoded words having n bits (where, in the described examples, n=m+1), overhead is added to the disk drive system 100. Specifically, for a disk drive system 100 with an 8/9 code rate, 1/9th of the user data space on the disk 104 is occupied by unproductive overhead. Similarly, for a disk drive system 100 with a 24/25 code rate, 1/25th of the user data space on the disk 104 is occupied by unproductive overhead. Thus, in an effort to minimize the amount of RLL code overhead, there has been a movement towards designing encoders having higher code rates, which implies larger values of the integers m and n, so that the code rate asymptotically approaches, but never quite achieves, the value 1 (i.e., zero code overhead).
However, this never-ending quest for higher code rates by increasing m and n values induces a penalty, i.e., increased decoder error propagation that degrades ECC performance. More particularly, when errors occur in detecting encoded words, there is an average increase in errors associated with mapping encoded words into uncoded words. This average increase in errors occurs because any one encoded error bit may map into one or more decoded error bits, thus potentially into multiple symbols (recognized by the ECC unit 116). Thus, there is a strong correlation between the amount of error propagation and the size of m and n. That is, the larger m and n are, the greater the degree of average error propagation seen by the ECC unit 116.
To compensate for the increased error propagation, more error correction (ECC) symbols may be used. However, using additional ECC symbols translates to additional ECC overhead, thereby degrading (lowering) the ECC system code rate, defined as data symbols/(data symbols+ECC symbols). Decreasing the ECC system code rate negatively affects the combined code rate, defined as the product RLL code rate × ECC code rate. Thus, the advantage of continuing to increase the m and n values reaches a maximum at some point, beyond which the combined code rate starts to decrease.
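The trade-off described above can be sketched numerically. The following Python fragment computes the combined code rate from the two definitions given in the text. The sector size of 512 data symbols and the ECC symbol counts are hypothetical values chosen only to show the effect; they are not measurements and do not come from this specification.

```python
def ecc_code_rate(data_symbols, ecc_symbols):
    """ECC code rate: data symbols / (data symbols + ECC symbols)."""
    return data_symbols / (data_symbols + ecc_symbols)


def combined_code_rate(m, n, data_symbols, ecc_symbols):
    """Combined code rate: RLL code rate (m/n) times ECC code rate."""
    return (m / n) * ecc_code_rate(data_symbols, ecc_symbols)


# Hypothetical configurations: the ECC symbol counts below are invented
# to illustrate that extra ECC overhead can outweigh a higher RLL rate.
mid = combined_code_rate(16, 17, data_symbols=512, ecc_symbols=40)
high = combined_code_rate(24, 25, data_symbols=512, ecc_symbols=60)
print(mid, high)
```

With these assumed numbers, the 24/25 configuration has the lower combined rate because its additional ECC symbols cost more than the higher RLL rate gains.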
Another potential issue may arise if m is not an exact multiple of ECC unit symbol sizes, or if ECC symbols and m bit words do not share the same boundaries whenever possible. In this case, additional error propagation is incurred because certain m bit word errors may affect more symbols than necessary due to poor mapping. For most ENDECs in use today, m is chosen to be a multiple of 8 which is not necessarily an ideal choice for the ECC unit symbol size.
Still another lesser problem with using large m and n relates to the inflexibility of sector sizes. Format efficiency is greatest when the sum of data bytes, CRC bytes, and ECC bytes is an exact multiple of m. As m becomes larger, it becomes more constraining and difficult to achieve this goal. If not achieved, format efficiency suffers which has the same effect as negatively affecting the overall system code rate.
Certain developments in read channel technology utilize concatenated recording codes that provide both run length constraints and information redundancy. Information redundancy (e.g., parity codes, turbo codes, etc.) is used to permit signal extraction at reasonable error rates when signals are present with very poor signal-to-noise. It is important that the RLL portion of the concatenated code imposes no harmful constraints on the choice of the redundancy portion of the concatenated code.
Accordingly, there is a strong need to develop a method and apparatus for encoding data to be recorded in a data storage device (e.g. a disk drive) such that the following set of properties are simultaneously achieved: (1) run length limits (RLL) are constrained as needed, (2) code rate of the RLL code equals 1, (3) there is little or no additional error propagation or error rate increase due to the encoding/decoding process, (4) additional errors per sector induced by the encode/decode process never exceed a preset limit, (5) there is no increased error propagation due to selection of ECC unit symbol size that may differ from the uncoded (decoded) word size m, (6) there are minimal constraints that affect sector format efficiency, and (7) no abnormal constraints are imposed when the recording code is concatenated with other codes that may be used to enhance read channel performance, such as parity codes, turbo codes, etc.
The present invention is designed to minimize the aforementioned problems and meet the aforementioned, and other, needs.
It is an object of the present invention to provide a method and apparatus for encoding data to be recorded in a data storage device (e.g. a disk drive) according to a run length limited (RLL) code that exhibits the following properties: (1) while the d constraint must be 0 (the most common choice today), the k constraint may be arbitrarily chosen to meet the desired maximum run length, although performance is best if maximum run lengths are greater than or equal to 12 for today's standard sector size of 512 bytes, (2) code rate of the RLL code equals 1, (3) there is no added error propagation due to the decoding process (error propagation is only due to the read channel), but there is a small increase in average error rate due to the encoding process (for modest to large k, this increase is significantly less than that incurred due to error propagation of typical RLL codes in use today), (4) additional errors induced by the encode/decode process are limited to a preset limit, (5) the uncoded (decoded) word size m may be arbitrarily selected, for example to match the ECC symbol size, which means that no additional error propagation is incurred because of mismatched m and ECC symbol size, (6) there is zero to minimal sector format inefficiency since m can be chosen to be relatively small, and (7) no abnormal constraints are imposed when the recording code is concatenated with other codes that may be used to enhance read channel performance, such as parity codes, turbo codes, etc.
Unlike conventional systems, the present invention does not use a mapping function that forces n greater than m (hence, eliminating most of the difficulties of present day RLL codes). Instead, running strings of data bits to be recorded are analyzed logically to determine the number of consecutive 0's and 1's. (Note that this statement and all of the following descriptions assume the NRZ convention. If desired, it is easily convertible into the NRZI convention by those skilled in the art.) If any tested string of bits contains a substring of either 0's or 1's that exceeds the maximum run length k, the (k+1)th bit is inverted (complemented) to force the maximum run length back to k. The test then continues starting from the inverted bit. This bit inversion forces a single bit error since the same bit string, when read back from the disk, is assumed to be uncoded. This single bit error will cause exactly one ECC symbol error, never more, whereas multiple symbol errors are quite typical for conventional RLL codes. If data is effectively randomized and a reasonable k is selected, the average frequency of intentionally-introduced symbol errors is low relative to symbol errors from other causes. Using random data, the model for the average number of single-bit, intentionally-introduced errors per sector is (tb − k)/2^(k+1), where tb is the total number of encoded bits in the sector and k is as previously defined. The ECC unit will be required to correct all such errors as well as errors from other causes. Given the additional decoder and remapping error propagation incurred using present-day RLL codes, a significant net gain in ECC unit code rate can be realized since the symbol errors introduced using the present invention are statistically much lower than symbol errors introduced using present-day RLL codes.
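The bit-inversion rule just described can be sketched in a few lines of Python. This is a minimal illustration under the NRZ convention, not a definitive implementation: the function names are hypothetical, and the second function simply evaluates the error model from the text, read here as (tb − k)/2^(k+1).

```python
def encode_rate_one(bits, k):
    """Limit runs of identical NRZ bits to k by inverting the (k+1)th bit.

    Returns (encoded_bits, positions_of_inverted_bits). The output has
    the same length as the input, so the code rate is 1.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    encoded = []
    inverted = []
    run = 0
    previous = None
    for index, bit in enumerate(bits):
        run = run + 1 if bit == previous else 1
        if run > k:
            bit ^= 1                 # complement the (k+1)th bit
            inverted.append(index)   # this becomes one intentional bit error
            run = 1                  # the test continues from the inverted bit
        encoded.append(bit)
        previous = bit
    return encoded, inverted


def expected_inversions(total_bits, k):
    """Average intentional errors per sector for random data, per the model."""
    return (total_bits - k) / 2 ** (k + 1)


encoded, inverted = encode_rate_one([0] * 7, k=3)
print(encoded, inverted)
print(expected_inversions(4096, 12))
```

No decoder is needed in this sketch: the read-back bits are treated as uncoded data, and each inverted position appears to the ECC unit as an ordinary single-bit error. For a sector of 4096 bits (512 bytes, ignoring CRC and ECC bits for simplicity) and k = 12, the model gives 4084/8192, or roughly 0.5 intentionally introduced errors per sector.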
In order to minimize the number of intentionally-introduced errors by the encoder, a randomizer is used to randomize the data before encoding. While any effective randomizer may be used, it is usually possible to select one that can improve run lengths relative to those exhibited by truly random data.
After randomization, there may still be a need to ensure that no more than a certain number of intentionally-introduced errors exist within any one sector. If so, the following method may be used to limit the number. A default seed is chosen for the randomizer that is always used to initially write a sector on to the disk. A counter is used to count the number of intentionally-introduced errors. If the number of intentionally-introduced errors exceeds the predetermined limit (a rarity with random data, a reasonable k, and a reasonable error limit), the default seed value is replaced by an alternate seed value and the data to be recorded is rerandomized and rewritten on to the disk. This process may be repeated, if needed, until the number of intentionally-introduced errors is less than or equal to the predetermined maximum.
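The seed-retry procedure above might be sketched as follows. The specification does not define the randomizer, so this sketch stands in Python's `random.Random` as an assumed pseudo-random bit source that is XORed with the data. The function names, the candidate seed list, and the error limit are likewise hypothetical.

```python
import random


def randomize(bits, seed):
    """XOR bits with a seeded pseudo-random sequence.

    Applying the same seed again restores the original bits.
    """
    rng = random.Random(seed)
    return [bit ^ rng.getrandbits(1) for bit in bits]


def count_inversions(bits, k):
    """Count the bits a run-length-k encoder would have to invert."""
    count = 0
    run = 0
    previous = None
    for bit in bits:
        run = run + 1 if bit == previous else 1
        if run > k:
            bit ^= 1      # the inverted bit starts a new run
            count += 1
            run = 1
        previous = bit
    return count


def choose_seed(bits, k, max_errors, seeds):
    """Return the first seed (default first) that meets the error limit."""
    for seed in seeds:
        if count_inversions(randomize(bits, seed), k) <= max_errors:
            return seed
    raise RuntimeError("no candidate seed met the error limit")


sector = [0] * 4096                      # worst-case user data: all zeros
seed = choose_seed(sector, k=12, max_errors=4, seeds=range(32))
print(seed)
```

In this sketch the first entry of `seeds` plays the role of the default seed, and later entries are tried only when the count of intentionally introduced errors exceeds the limit.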
When an alternate seed is used in the randomization process, one alternative is to store the sector number and randomizer seed in fast, volatile memory and in a non-volatile memory (e.g., on the disk in a specially reserved sector) so that this alternate seed may subsequently be used when derandomizing data read from such sector. Another alternative is to discover the alternate seed by trial and error, e.g., first try the default seed, which almost always works, then if needed, try alternate seeds in sequential order until one is found that works.
A generalization of the above concept of limiting run-length could be made by defining a set of specific undesirable patterns which are problematic to the detector system (timing loop, detector, equalization, etc). The encoder would be designed to detect the presence of any of these specific patterns, invert the last bit of the pattern when detected, thus eliminating the pattern from the data sequence, but introducing a single bit error. For example, in addition to the two patterns for limiting run length (all 0's and all 1's), perhaps a long string of alternating 1's and 0's may also be problematic. In this case two patterns would be added to the list: the alternating pattern starting with a 1, and the alternating pattern starting with a 0. As long as the list of specific patterns is sufficiently short, and the specific patterns themselves are sufficiently long (e.g., greater than 12 bits), the probability of introducing errors in a randomized data stream should remain acceptably low.
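A minimal Python sketch of this generalization follows, assuming the four example patterns named above, all of one length. The function names are hypothetical. The sketch does not re-check the tail after an inversion; for these four patterns with a length of 3 or more, an inversion cannot itself complete another listed pattern, but an arbitrary pattern list would need that extra check.

```python
def make_patterns(length):
    """Four illustrative forbidden patterns of the given length."""
    alternating = [i % 2 for i in range(length)]
    return [
        (0,) * length,                      # all 0's
        (1,) * length,                      # all 1's
        tuple(alternating),                 # alternating, starting with 0
        tuple(1 - b for b in alternating),  # alternating, starting with 1
    ]


def remove_patterns(bits, patterns):
    """Invert the last bit of any listed pattern as soon as it appears.

    Returns (encoded_bits, positions_of_inverted_bits).
    """
    encoded = []
    inverted = []
    for index, bit in enumerate(bits):
        encoded.append(bit)
        for pattern in patterns:
            n = len(pattern)
            if len(encoded) >= n and tuple(encoded[-n:]) == pattern:
                encoded[-1] ^= 1        # break the pattern: one bit error
                inverted.append(index)
                break
    return encoded, inverted


patterns = make_patterns(5)
data = [0] * 12 + [0, 1] * 6
encoded, inverted = remove_patterns(data, patterns)
print(encoded, inverted)
```

A pattern length of 5 is used here only to keep the example short; the text suggests patterns longer than 12 bits in practice. An all-0's or all-1's pattern of length k+1 reproduces the run length limit k.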
Other objects, features and advantages of the invention will be apparent from the following specification taken in conjunction with the following drawings.