Computer memory systems may be of either the persistent or the non-persistent type. Examples of persistent memory types are magnetic cores, disk drives, tape drives and semiconductor Flash memories. Non-persistent memory types may be semiconductor memories such as DRAM or the like. Non-persistent memory types typically have rapid access times for both reading and writing of data and are used as computer main memory or cache memory. The data is retained in such memories by means which require a supply of power, and the information stored therein may be lost if the power is interrupted. Systems of non-persistent memory usually have a back-up power supply, which may be a capacitive storage device for short-duration power interruptions, or a back-up power supply using batteries, generators, or the like for longer-term data retention.
Persistent storage devices, such as disk, tape or Flash memory, retain stored data even if the power source is removed from the device. They are often used to back up the non-persistent data storage devices, and for longer-term data storage where providing continuous power is impractical on grounds of cost or reliability.
Flash memory, amongst the presently used persistent memory technologies, has a lower latency than mechanical devices such as disks, but a higher latency than the non-persistent memory types in current use. The price of Flash memory and similar solid state technologies has traditionally been governed by a principle known as Moore's Law, which expresses the general tendency for the capacity of a device to double, and the price to halve, during an 18-month period. As such, the cost of storing data in Flash memory rather than on, for example, a disk is expected to reach parity soon.
Flash memory is a generic term, and a variety of types of solid state devices may be considered to be Flash memory. Originally there was the electrically erasable programmable read only memory (EEPROM), followed by other developments, which are known as NOR-Flash, NAND-Flash, and the like. Each of the technologies has a different design and organization and differing attributes with respect to the reading and writing of data. That is, there may be a restriction on the minimum size of a block of data that may be either read or written (e.g., data word, page, or data sector), or a difference in the time necessary to read or to write data. In many memory applications, the time for reading or writing data is not deterministic, and may vary over a wide range. In addition, the lifetime of a Flash memory device is considered to be subject to a wear-out mechanism, and is measured in read, write (also called “program” when referring to Flash memories) or erase cycles. Erase is the process of setting all of the memory cells of a block of pages of Flash memory to a state where new data can be programmed into the pages of the block. Herein, the term “write” may be used to mean “program” when a Flash memory is being used.
The failure mechanisms of Flash memories may be broadly divided into two kinds. The first is a conventional failure mode, associated with a defect in the construction which is either latent or which develops with the passage of time, and which is considered typical of electronic components. The second is a wear-out mechanism. Typically, the wear-out mechanism is related to the number of times a Flash memory block is accessed for an erase operation and a subsequent write operation. Although this is an electronic failure mechanism, one may consider it as more typical of a mechanical failure mechanism, such as one that depends on the number of miles that a car has been driven rather than on the calendar time, and it is manifest in a gradually increasing raw bit error rate. Both mechanisms may need to be considered in system design, and the overuse of a particular memory location avoided by hardware or software management of the system operation, so as to avoid premature wear out.
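One common way to avoid overusing a particular memory location is to steer each new write toward the block that has been erased the fewest times. The following is a minimal sketch of that idea only, not a description of any particular controller. The function name `pick_block`, the dictionary of erase counts, and the sample values are all hypothetical.

```python
def pick_block(erase_counts, free_blocks):
    """Return the free block that has seen the fewest erase cycles."""
    return min(free_blocks, key=lambda block: erase_counts[block])


# Hypothetical erase counters for four blocks; block 1 is currently in use.
erase_counts = {0: 120, 1: 35, 2: 980, 3: 36}
free_blocks = [0, 2, 3]

chosen = pick_block(erase_counts, free_blocks)
erase_counts[chosen] += 1  # the chosen block is erased before it is reused
```

With these sample values the least-worn free block is selected, even though a block with a lower count exists but is not free.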
NAND Flash memory is programmed by applying a voltage to individual cells of the memory so as to store a charge value representing a value of a stored data bit. Presently, there are three varieties of such memories available in commercial quantities, termed SLC (single-level cell), MLC (multi-level cell, generally meaning that there are three voltage levels in addition to the unprogrammed state so as to store two bits of data) and TLC (triple-level cell, storing three bits of data in eight voltage levels). So, there are memory cells where one, two, or three bits of information may be stored. In practice, as the number of bits stored per cell increases, the lifetime decreases, the read, write and erase times increase, and the cost decreases. There is also a trend to smaller feature sizes in the semiconductor process so as to increase the density of the storage, and this also tends to degrade performance, including lifetime, but reduce cost.
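The relationship between bits per cell and voltage levels described above is a power of two: a cell storing n bits must distinguish 2^n levels, the unprogrammed state included. A short sketch of this arithmetic follows. The helper names `voltage_levels` and `cells_needed` are illustrative only.

```python
def voltage_levels(bits_per_cell):
    """Distinct levels a cell must resolve, including the unprogrammed state."""
    return 2 ** bits_per_cell


def cells_needed(num_bits, bits_per_cell):
    """Cells required to hold num_bits, rounding up to a whole cell."""
    return -(-num_bits // bits_per_cell)  # ceiling division


for name, bits in (("SLC", 1), ("MLC", 2), ("TLC", 3)):
    print(name, voltage_levels(bits), "levels,", cells_needed(8, bits), "cells per byte")
```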
Another salient characteristic of NAND Flash is that the data is written (programmed) in contiguous pages to fill an area of memory known as a block. Individual pages of a block cannot be re-written or modified unless the entire block of data has been “erased.” Erasing the block comprises the operation of setting all of the memory cells of the block to the same state, which is usually the lowest voltage. The lifetime of each memory cell is a function, amongst other things, of the number of times that the cell has been programmed and erased. While the entire block is subject to an erase operation, the effect on the lifetime of an unprogrammed cell may be much less than the effect on a cell that is programmed to the highest voltage level.
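The program and erase constraints just described can be illustrated with a toy model. This is a simplified sketch under stated assumptions: the class name `NandBlock`, the default of four pages per block, and the use of `None` to mark an erased page are all hypothetical, and real devices track far more state.

```python
class NandBlock:
    """Toy NAND block: pages are programmed in order and erased only together."""

    def __init__(self, pages_per_block=4):
        self.pages = [None] * pages_per_block  # None marks an erased page
        self.next_page = 0                     # next page available to program
        self.erase_count = 0                   # wear indicator for the block

    def program(self, data):
        """Program the next contiguous page and return its index."""
        if self.next_page >= len(self.pages):
            raise RuntimeError("block full: erase before programming again")
        self.pages[self.next_page] = data
        self.next_page += 1
        return self.next_page - 1

    def erase(self):
        """Return every page of the block to the erased state."""
        self.pages = [None] * len(self.pages)
        self.next_page = 0
        self.erase_count += 1
```

In this model there is deliberately no method for rewriting a single page in place. Updating stored data requires either programming a fresh page elsewhere or erasing the whole block, which is what consumes an erase cycle.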
Conventionally, the lowest voltage state need not be programmed, and represents a logical “1” in an SLC, a logical “11” in an MLC and a logical “111” in a TLC. Such cells would have the longest lifetime in a block. The highest voltage state would be associated with “0”, “00” and “000”, respectively. For illustrative purposes an SLC example is used herein as this is more straightforward to describe. Since the blocks are filled with data having varying characteristics over the lifetime of the cell, the lifetime of a cell would depend on the relative number of times a “1” is written rather than a “0”, and this is data dependent. MLC and TLC may be addressed with alterations to the coding schemes described herein, as would be apparent to a person of skill in the art having the benefit of this disclosure.
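Under the SLC convention above, where a logical “1” is the unprogrammed state, the number of cells that must be programmed to store a piece of data is simply the number of “0” bits it contains. The following sketch counts them. The function name `slc_programmed_cells` is illustrative, and one cell per bit is assumed.

```python
def slc_programmed_cells(data: bytes) -> int:
    """Count the SLC cells that need programming, i.e. the zero bits in data."""
    total_bits = 8 * len(data)
    one_bits = sum(bin(byte).count("1") for byte in data)
    return total_bits - one_bits


# Data made up mostly of 1 bits leaves most cells in the erased state.
mostly_ones = bytes([0xFF, 0xFE, 0xFF])
mostly_zeros = bytes([0x00, 0x01, 0x00])
print(slc_programmed_cells(mostly_ones), slc_programmed_cells(mostly_zeros))
```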
Where, for example, MLC Flash memory is used, preferentially storing “11” leaves the cell in the base state, whereas the other combinations “10”, “01” and “00” require programming to various voltages. So, when performing substitute encoding, on either a learned or a predetermined basis, the substitute codes for the most frequently observed symbols should preferentially align the occurrences of “11” with the even bit positions of a byte, so that each “11” pair falls within a single cell. Other optimizations may consider the combinations of characters that are found in ASCII plain text or in the compressed data so as to achieve this result. A similar approach may be used for memory cells that store three or more states per cell.
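To make the idea of a learned substitute encoding for MLC concrete, the sketch below ranks the byte values in a sample by frequency and assigns the most frequent ones to the codes that contain the most aligned “11” pairs. It is an illustration under stated assumptions, not the coding scheme of this disclosure: it assumes each cell holds one bit pair starting at an even bit position, it works on whole bytes, and the names `mlc_programmed_cells`, `build_substitution` and `encode` are hypothetical.

```python
from collections import Counter


def mlc_programmed_cells(byte):
    """Count the aligned bit pairs in a byte that differ from the base state 11."""
    return sum(1 for shift in (0, 2, 4, 6) if (byte >> shift) & 0b11 != 0b11)


def build_substitution(data):
    """Map the most frequent byte values to the codes needing the fewest programmed cells."""
    by_frequency = [symbol for symbol, _ in Counter(data).most_common()]
    codes = sorted(range(256), key=mlc_programmed_cells)  # cheapest codes first
    return dict(zip(by_frequency, codes))


def encode(data, table):
    """Apply the substitution table byte by byte."""
    return bytes(table[b] for b in data)


sample = b"aaaabbc"
table = build_substitution(sample)
before = sum(mlc_programmed_cells(b) for b in sample)
after = sum(mlc_programmed_cells(b) for b in encode(sample, table))
```

Reading the data back requires the inverse of `table`, so a practical scheme would need to store the table or derive it from a predetermined one.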
In another aspect, Flash memory cells are known to lose charge over a period of time, such that the voltage state of the cell is reduced. Over time this may lead to a bit error if the data is not periodically refreshed. By reducing the number of cells that need to be programmed in order to store a fixed amount of user data, the number of cells that are subject to error by this process is reduced, thus increasing the reliability of the memory.
NAND Flash memory is used in the examples herein as it is the most commonly available solid state memory technology. However, this system and method may be adapted to offer advantages when used with other solid state memories that are known to be in development, such as phase change memory (PCM) and ReRAM, and with solid state memories yet to be developed. Managing the probability of bit occurrence in a word or block may be used to reduce the number of memory cells that are programmed or that may need to be reprogrammed. Changing the state of a bit or bits of a memory cell requires power, and this may be a significant consideration in design. In addition, the wear characteristics of newly developed memory technologies are not as yet well characterized, and specific bit patterns or bit probabilities may be beneficial. This may also have an effect on programming or read-dependent errors. Also, the data compression achieved may be beneficial in overall memory system management and economics.
A memory system may store data received from many sources, and these sources include a variety of data types including, for example, encrypted data using various encryption techniques, compressed data using various compression techniques, and data having the characteristics of, for example, ASCII text, or other alphabet sets. The data may be structured or unstructured, and the memory system may receive the data to be stored without any hint as to the type of data that is involved. In some instances a header or other data characteristic may identify the data.