The present invention relates generally to the encoding of Cyclic Codes. More specifically, the present invention relates to encoding Cyclic Codes in data storage systems whose storage medium, such as flash memory, holds a default value, such as one, in all bits after the medium has been erased but before it is written with data.
The storage medium in a flash memory is segmented into units such as blocks and pages, and a block usually contains multiple pages. The block is the unit of data erasure, while the page is the unit of data read/write. Erasing a block forces all bits in the block to a default value such as one. To overwrite an already written page with new data, the host must first erase the block containing the page and then write the new data to the page. Hereinafter, an “unwritten” area refers to an area that has been erased but has not yet been written with data. If the host writes only portions of an erased area, all bits in the unwritten portions of the area retain the default value.
Error correction coding (ECC) is often used in data storage systems to protect the integrity of the data stored in the storage medium against data-corrupting conditions such as storage medium defects and random read errors. Cyclic Codes, a class of linear block codes, are often favored over other classes of error correction codes because their algebraic properties lead to very practical implementations of the encoding and decoding algorithms.
FIG. 1 illustrates a generalized error correction coding scheme in a data storage system (101). On write, the data generated by the data source (102) are sent to the Encoder (104) to be encoded. The encoded data are then sent to the storage medium interface (106), which converts the encoded data into a format suitable for storage in the storage medium (107). The storage medium interface (106) varies widely among systems according to the specific characteristics of the storage medium. Conversely, on read, the storage medium interface (106) reads the stored data, which may contain errors, from the storage medium (107) and recovers the encoded data format, which is sent to the Decoder (105), where the original data from the data source (102) are recovered and sent to the data destination (103). The present invention focuses on the Encoder (104) of the data storage system (101).
With an (n, k) linear block code, encoding maps a sequence of k message symbols to a sequence of n symbols, where n > k. The resulting sequence of n symbols is commonly referred to as a “codeword”, which is sent to the storage medium interface to be stored in the storage medium. Encoding methods generally fall into two categories: systematic and non-systematic encoding. With systematic encoding, the message appears unchanged in the codeword itself, occupying its first k symbols. Systematic encoding therefore amounts to calculating n−k parity symbols from the k message symbols and appending the parity symbols to the message symbols to form the codeword. With non-systematic encoding, by contrast, the message does not necessarily appear in its corresponding codeword. The present invention focuses on systematic encoding. The message in a systematic codeword is also referred to as the “message section”.
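For illustration only, systematic encoding of a binary cyclic code can be sketched as polynomial arithmetic: the parity is the remainder of x^(n−k)·u(x) divided by the generator polynomial g(x), where u(x) is the message polynomial. The sketch below assumes binary symbols, represents each polynomial as a Python integer whose bit i holds the coefficient of x^i, and uses the (7, 4) cyclic Hamming code with g(x) = x^3 + x + 1 as an example; the function names are hypothetical.

```python
def poly_mod(a: int, g: int) -> int:
    """Remainder of a(x) / g(x) over GF(2); bit i of an int is the x^i coefficient."""
    dg = g.bit_length() - 1
    while a and a.bit_length() - 1 >= dg:
        a ^= g << (a.bit_length() - 1 - dg)  # cancel the current leading term
    return a


def bits_to_int(bits):
    """The first list element becomes the highest-degree coefficient."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def int_to_bits(value, width):
    """Inverse of bits_to_int for a fixed number of bits."""
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def systematic_encode(message, n, g):
    """Codeword = k message bits followed by the n-k parity bits of x^(n-k) u(x) mod g(x)."""
    r = n - len(message)
    assert g.bit_length() - 1 == r, "deg g(x) must equal n - k"
    parity = poly_mod(bits_to_int(message) << r, g)
    return list(message) + int_to_bits(parity, r)


G = 0b1011  # g(x) = x^3 + x + 1, generator of the binary (7, 4) cyclic Hamming code
codeword = systematic_encode([1, 0, 0, 1], 7, G)
print(codeword)  # [1, 0, 0, 1, 1, 1, 0]
assert poly_mod(bits_to_int(codeword), G) == 0  # every codeword is a multiple of g(x)
```

The first four bits of the codeword are the message itself, which is what makes the encoding systematic.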
When ECC is applied to flash memory, the host system typically segments the data to be written to the flash memory into “messages” and employs an ECC encoder to encode each message into a codeword, which is physically stored in the flash memory. In the present invention, the area in the flash memory reserved for storing each codeword is referred to as the “flash codeword area”. If systematic encoding is used, each flash codeword area contains two sub-areas, the “flash message area” and the “flash parity area”, which store the “message” and the “parity” of the codeword, respectively.
“Error Control Coding: Fundamentals and Applications” by Lin & Costello, Copyright 1983 by Prentice-Hall, Inc., ISBN 0-13-283796-X (hereinafter “Lin & Costello”), is incorporated by reference herein. Lin & Costello discloses in Section 4.3, pp. 95-98, a systematic encoder for an (n, k) cyclic code based on the generator polynomial of the code, which is hereinafter referred to as the prior-art “base encoder”.
As illustrated in FIG. 2, the prior-art base encoder is a feedback shift register (FSR). Before encoding, the FSR is cleared to zero. To encode a message, the message symbols are input (205) at a rate of one symbol per cycle while “output-enable” (206) is zero, which causes the multiplexer (204) to pass each message symbol into the adder (202) and also to the output (203) as a codeword symbol. After all k message symbols have entered the encoder, the registers (201) hold the n−k parity symbols. Output-enable is then set to one, which causes the parity symbols (207) to be shifted out (203) at a rate of one symbol per cycle. As a result, the systematically encoded codeword appears at the output (203) at a rate of one symbol per cycle.
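A bit-level software model of the FSR of FIG. 2 may help make the two phases concrete. This is a sketch under the assumption of a binary code, so each symbol is one bit, the adder (202) is an XOR, and the feedback taps are the nonzero low-order coefficients of g(x). The names fsr_encode, reg and g_low are hypothetical; reg stands in for the registers (201), with its most significant bit as the stage that feeds back.

```python
def fsr_encode(message, n, g):
    """Bit-serial model of the generator-polynomial FSR encoder (binary case).

    Returns (codeword_bits, final_register)."""
    r = n - len(message)
    mask = (1 << r) - 1
    g_low = g & mask          # feedback taps: g(x) without its leading x^r term
    reg = 0                   # registers cleared to zero before encoding
    output = []
    for bit in message:       # output-enable = 0: message bit goes straight to the output
        output.append(bit)
        feedback = ((reg >> (r - 1)) & 1) ^ bit
        reg = (reg << 1) & mask
        if feedback:
            reg ^= g_low
    for i in range(r):        # output-enable = 1: parity leaves one bit per cycle, MSB first
        output.append((reg >> (r - 1 - i)) & 1)
    return output, reg


codeword, _ = fsr_encode([1, 0, 0, 1], 7, 0b1011)  # g(x) = x^3 + x + 1
print(codeword)  # [1, 0, 0, 1, 1, 1, 0]
```

The result agrees with dividing x^(n−k)·u(x) by g(x) directly, which is the point of the circuit: it performs that division one bit per cycle.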
In principle, the FSR obtains the n−k parity symbols after all k message symbols have been input. If the first j of the k message symbols are zeros, the FSR remains all zeros while those j symbols are input and only begins to take nonzero values after the first nonzero message symbol is input; that is, inputting the first j zero symbols does not change the FSR from its reset state. Therefore, to encode a k-symbol message whose first j symbols are zeros, the host only needs to input the last (k−j) message symbols to the encoder. This property of the prior-art base encoder is useful because, for various application-specific reasons such as the size of a flash memory page, the host may choose to encode a message of fewer than k symbols, e.g. (k−j) symbols, into a codeword. Mathematically, the n-symbol codeword obtained this way consists of j zero symbols, followed by the (k−j) message symbols, followed by the (n−k) parity symbols. If the decoder's syndrome computer has a similar property, i.e., reading zero symbols does not change the syndrome computer from its reset state, then the first j zero symbols of the codeword need not be physically stored in the flash memory, which saves both storage space and data transfer time.
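The leading-zero property can be checked with a small model of the same binary FSR. As before, this is a sketch with hypothetical names and an illustrative generator, g(x) = x^3 + x + 1: feeding j zeros ahead of the (k−j) message bits leaves the parity register exactly as if only the (k−j) bits had been input.

```python
def fsr_parity(message, r, g):
    """Parity register after feeding `message` bit-serially (binary FSR, reset to zero)."""
    mask = (1 << r) - 1
    reg = 0
    for bit in message:
        feedback = ((reg >> (r - 1)) & 1) ^ bit  # a 0 input into an all-zero FSR gives 0
        reg = (reg << 1) & mask
        if feedback:
            reg ^= g & mask
    return reg


G, R = 0b1011, 3          # g(x) = x^3 + x + 1, so n - k = 3 (illustrative choice)
short_message = [1, 1, 0]  # the (k - j) symbols actually supplied by the host
j = 1                      # leading zeros that are neither input nor stored
full = fsr_parity([0] * j + short_message, R, G)
short = fsr_parity(short_message, R, G)
assert full == short
print(full)  # 1
```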
Although the prior-art base encoder allows the host to input a shorter message of (k−j) symbols, the host is still required to input the (k−j) message symbols to the encoder contiguously, without any gap. Further, in the codeword the input message symbols are immediately followed by the parity, again without any gap. In certain applications, the host may need to write to one or a plurality of discontinuous fragments in the flash message area, leaving the gaps between these fragments unwritten and/or leaving the gap between the message and the parity unwritten. In these applications, there are two scenarios in which the prior-art base encoder may be used.
In the first scenario, the host inputs only the message symbols to the encoder, even though the message is written to physically discontinuous fragments in the flash message area. In other words, the parity of a codeword is calculated from the written fragments only, not from the gaps between them. On read back, since the gaps between the written fragments are not part of the codeword, the host must have prior knowledge of the exact pattern of the written fragments in the flash message area and provide only the data read from the written fragments to the decoder. Thus, the exact pattern of the written fragments needs to be recorded for each codeword written to the flash memory, which may not be practical because recording such information after writing each codeword requires considerable complexity and resources.
In the second scenario, the host provides to the encoder not only the data fragments to be written to the flash memory, but also the unwritten gaps, whose bits have the default value. In other words, a continuous data sequence that mimics the flash message area, including both the written fragments and the unwritten gaps, is input to the encoder as the message section of the codeword. On read back, the host reads the entire flash message area for the decoder without having to distinguish between the written fragments and the unwritten gaps. However, for a large flash message area containing only a small amount of written data, this method wastes both time and power, since the encoder must iterate through the large unwritten gaps to calculate the parity.
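The difference between the two scenarios can be sketched with the same kind of binary FSR model. The (15, 11) cyclic Hamming code with g(x) = x^4 + x + 1, the default value of one, and the fragment layout, with each data fragment given as a hypothetical (offset, bits) pair, are illustrative assumptions. In this example the first scenario feeds 8 bits and the second feeds all 11 positions of the flash message area, and the two parities differ, so the full message area read back together with the first scenario's parity does not form a valid codeword.

```python
def fsr_parity(message, r, g):
    """Parity register after feeding `message` bit-serially (binary FSR, reset to zero)."""
    mask = (1 << r) - 1
    reg = 0
    for bit in message:
        feedback = ((reg >> (r - 1)) & 1) ^ bit
        reg = (reg << 1) & mask
        if feedback:
            reg ^= g & mask
    return reg


def flash_message_image(k, fragments, default=1):
    """Message-area contents as stored: written fragments, gaps left at the default value."""
    image = [default] * k
    for offset, bits in fragments:
        image[offset:offset + len(bits)] = bits
    return image


G, R, K = 0b10011, 4, 11  # (15, 11) cyclic Hamming code, g(x) = x^4 + x + 1 (illustrative)
fragments = [(0, [1, 0, 1, 0]), (7, [0, 0, 1, 1])]  # hypothetical (offset, bits) fragments

# First scenario: only the fragments are fed to the encoder, back to back.
scenario1_input = [b for _, bits in fragments for b in bits]
parity1 = fsr_parity(scenario1_input, R, G)

# Second scenario: the whole message area, gaps included, is fed to the encoder.
scenario2_input = flash_message_image(K, fragments)
parity2 = fsr_parity(scenario2_input, R, G)

print(len(scenario1_input), bin(parity1))  # 8 0b1
print(len(scenario2_input), bin(parity2))  # 11 0b1011
```

The cycle count of the second scenario grows with the size of the flash message area, not with the amount of data actually written, which is the inefficiency noted above.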
Hereinafter, a “data fragment” refers to a section of the message that the host writes to a single fragment in the flash message area, such that the data are written contiguously within that fragment but discontiguously with respect to all other fragments. Further, the term “gaps” refers not only to the physical space between the fragments in the flash message area, but also to the data stored in that space, all bits of which have the default value, such as one, in the absence of random errors.
Therefore, it would be advantageous to have an encoder that computes the parity of a codeword using only the data fragments as input, by assuming that all bits in the gaps have the unwritten default value. With such an encoder, the host would only need to input the data fragments to the encoder as the message section of the codeword. On read back, the entire flash codeword area is read and input to the decoder without the need to distinguish the written fragments from the gaps.
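The result required of such an encoder can be pinned down by a slow reference model: given only the data fragments and their offsets, it must return the same parity that the base encoder would produce for the full flash message area with every gap bit at the default value. The sketch below (hypothetical names, binary (15, 11) cyclic Hamming code with g(x) = x^4 + x + 1, default value one) is only such a reference model; it still visits every position and is not the encoder of the present invention. Its last lines check that reading back the entire flash codeword area, gaps included, yields a zero remainder.

```python
def fsr_parity(bits, r, g):
    """Binary base-encoder FSR model: register contents after feeding `bits`."""
    mask = (1 << r) - 1
    reg = 0
    for bit in bits:
        feedback = ((reg >> (r - 1)) & 1) ^ bit
        reg = (reg << 1) & mask
        if feedback:
            reg ^= g & mask
    return reg


def message_area_image(k, fragments, default=1):
    """Flash message area as read back: fragments written, gaps at the default value."""
    image = [default] * k
    for offset, bits in fragments:
        image[offset:offset + len(bits)] = bits
    return image


def reference_fragment_encoder(k, fragments, r, g, default=1):
    """Hypothetical reference model: accepts only (offset, bits) fragments, but
    still walks all k positions, so it fixes the required result, not the method."""
    return fsr_parity(message_area_image(k, fragments, default), r, g)


G, R, K = 0b10011, 4, 11  # (15, 11) cyclic Hamming code, g(x) = x^4 + x + 1
fragments = [(2, [0, 1]), (9, [0, 0])]  # hypothetical data fragments
parity = reference_fragment_encoder(K, fragments, R, G)

# Read back the whole flash codeword area, gaps included, without locating fragments.
read_back = message_area_image(K, fragments) + [(parity >> (R - 1 - i)) & 1 for i in range(R)]
remainder = fsr_parity(read_back, R, G)  # zero means the area holds a valid codeword
print(remainder)  # 0
```

A faster encoder that skips the gaps could be tested against this model on random fragment layouts.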
In addition to the prior-art base encoder, the present invention references two other prior-art encoders, described below.
Lin & Costello further discloses in Section 4.3, pp. 95-98, a systematic encoder for an (n, k) cyclic code based on the parity polynomial of the code, which is referenced in the present invention and is hereinafter referred to as the prior-art “parity-polynomial-based encoder”.
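As a sketch of the parity-polynomial approach, assume a binary code and the coefficient convention in which the message occupies the high-order positions v_(n−k) through v_(n−1) of the codeword polynomial v(x). The parity polynomial is h(x) = (x^n + 1)/g(x), and because h(x) is monic of degree k, each parity digit follows from the k digits above it: v_(n−k−j) = sum over i = 0..k−1 of h_i·v_(n−i−j), for j = 1, ..., n−k. The index convention and function names are assumptions for illustration; this is not a transcription of Lin & Costello's circuit.

```python
def gf2_divmod(a: int, b: int):
    """Quotient and remainder of a(x) / b(x) over GF(2); bit i is the x^i coefficient."""
    q = 0
    db = b.bit_length() - 1
    while a and a.bit_length() - 1 >= db:
        shift = a.bit_length() - 1 - db
        q |= 1 << shift
        a ^= b << shift
    return q, a


def parity_polynomial_encode(message, n, g):
    """Systematic encoding driven by h(x) = (x^n + 1) / g(x) instead of by g(x)."""
    k = len(message)
    h, rem = gf2_divmod((1 << n) | 1, g)
    assert rem == 0, "g(x) must divide x^n + 1"
    v = [0] * n                         # v[d] holds the coefficient of x^d
    for i, bit in enumerate(message):   # message fills the high-order positions
        v[n - 1 - i] = bit
    for j in range(1, n - k + 1):       # parity digits, highest degree first
        acc = 0
        for i in range(k):
            acc ^= ((h >> i) & 1) & v[n - i - j]
        v[n - k - j] = acc
    return [v[d] for d in range(n - 1, -1, -1)]  # output order: x^(n-1) first


print(parity_polynomial_encode([1, 0, 0, 1], 7, 0b1011))  # [1, 0, 0, 1, 1, 1, 0]
```

For the (7, 4) code, h(x) = x^4 + x^2 + x + 1, and the result matches the generator-polynomial encoding of the same message.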
U.S. Published Patent Application No. 20090077449 by Lee, entitled “Methods and Apparatus for Encoding and Decoding Cyclic Codes by Processing Multiple Symbols Per Cycle” and incorporated by reference herein, discloses a systematic cyclic code encoder that is based on the generator polynomial of the cyclic code and may be configured to process M symbols per cycle, where M is an integer greater than or equal to one. This encoder is hereinafter referred to as the prior-art “M-symbol-per-cycle encoder”.
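Lee's specific architecture is not reproduced here. As a generic illustration of processing M symbols per cycle, the sketch below uses the well-known table-driven (lookahead) technique familiar from CRC computation, assuming binary symbols and 1 ≤ M ≤ n−k: the effect of M single-bit FSR cycles is precomputed into a table indexed by M bits. This is an assumed textbook technique, not necessarily the method of the cited application, and all names are hypothetical. Messages whose length is not a multiple of M are padded with leading zeros, which, as noted for the base encoder, leaves the FSR state unchanged.

```python
def fsr_parity_serial(bits, r, g):
    """Reference: one bit per cycle, binary generator-polynomial FSR."""
    mask = (1 << r) - 1
    reg = 0
    for bit in bits:
        feedback = ((reg >> (r - 1)) & 1) ^ bit
        reg = (reg << 1) & mask
        if feedback:
            reg ^= g & mask
    return reg


def build_table(r, g, m):
    """table[v]: register after m zero-input cycles, starting with v in the top m stages."""
    mask = (1 << r) - 1
    table = []
    for v in range(1 << m):
        reg = v << (r - m)
        for _ in range(m):
            top = (reg >> (r - 1)) & 1
            reg = (reg << 1) & mask
            if top:
                reg ^= g & mask
        table.append(reg)
    return table


def fsr_parity_m_per_cycle(bits, r, g, m):
    """m bits per cycle; yields the same register contents as the serial model."""
    assert 1 <= m <= r
    mask = (1 << r) - 1
    table = build_table(r, g, m)
    padded = [0] * (-len(bits) % m) + list(bits)  # leading zeros do not change the state
    reg = 0
    for start in range(0, len(padded), m):
        chunk = 0
        for b in padded[start:start + m]:  # earliest bit ends up most significant
            chunk = (chunk << 1) | b
        # Low stages shift up by m; the top m stages combine with the input via the table.
        reg = ((reg << m) & mask) ^ table[(reg >> (r - m)) ^ chunk]
    return reg


G, R = 0b10011, 4  # g(x) = x^4 + x + 1 (illustrative)
message = [1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1]
assert fsr_parity_m_per_cycle(message, R, G, 4) == fsr_parity_serial(message, R, G)
print(bin(fsr_parity_m_per_cycle(message, R, G, 4)))  # 0b1011
```

With M = 4 here, each cycle consumes four message bits at the cost of a 16-entry table; in general the table has 2^M entries, which is the usual trade-off between throughput and table size in this technique.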