The present invention, in some embodiments thereof, relates to error correction in copy-back memory operations and, more particularly, but not exclusively, to flash and like memory devices where copy-back operations are common and wherein error correction is desirable to avoid accumulation of errors over a series of copy-back operations.
Copy-back is the operation in which a page of data is copied from a first physical address to a second physical address without sending the data out of the flash die. Page copying operations are quite common in flash management systems, for example when doing garbage collection and moving a page of still-valid user data to a new location. Many flash dies on the market support commands for doing a copy-back operation, that is, moving the data between two physical locations without spending time on sending the data out of the flash die. One would expect that flash management systems that manage flash devices having such capability would use the flash die's internal copy-back operation for their garbage collection data movement. However, in most flash management systems this is not the case, and page copy operations are typically carried out by the following steps:
a. Reading the data from the original physical page in the flash array into the flash die data register
b. Moving the data out of the flash die over the bus connecting the flash die and the flash controller
c. Checking the data for errors, and correcting the errors if necessary
d. If correction was necessary, sending the corrected data from the controller over the bus to the flash die data register
e. Programming the data from the data register into the flash array.
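By way of non-limiting illustration, steps a through e may be sketched in Python as follows, alongside a die-internal copy-back for contrast. The `FlashDie` class, its method names, the `bus_bytes` counter and the triple-repetition code used as a stand-in for real error correction parity are all assumptions made for this sketch and do not correspond to any particular flash device or controller.

```python
def ecc_encode(data: bytes) -> bytes:
    """Toy ECC (assumption): store three copies of the data."""
    return data * 3


def ecc_decode(codeword: bytes):
    """Bitwise majority vote over the three copies; returns (data, had_errors)."""
    n = len(codeword) // 3
    a, b, c = codeword[:n], codeword[n:2 * n], codeword[2 * n:]
    data = bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))
    return data, not (a == b == c)


class FlashDie:
    """Hypothetical model of a flash die with one data register."""

    def __init__(self):
        self.array = {}      # physical page address -> stored codeword
        self.register = b""  # the die's internal data register
        self.bus_bytes = 0   # bytes moved over the controller-flash bus

    def read_to_register(self, addr):
        self.register = self.array[addr]

    def transfer_out(self):
        self.bus_bytes += len(self.register)
        return self.register

    def transfer_in(self, codeword):
        self.bus_bytes += len(codeword)
        self.register = codeword

    def program_from_register(self, addr):
        self.array[addr] = self.register


def controller_copy(die, src, dst):
    die.read_to_register(src)            # step a
    raw = die.transfer_out()             # step b
    data, had_errors = ecc_decode(raw)   # step c
    if had_errors:
        die.transfer_in(ecc_encode(data))  # step d
    die.program_from_register(dst)       # step e


def copy_back(die, src, dst):
    """Die-internal copy: no bus traffic and no error correction."""
    die.read_to_register(src)
    die.program_from_register(dst)
```

In this model, `controller_copy` always pays for at least one full page transfer over the bus, and for two when a correction is needed, whereas `copy_back` leaves `bus_bytes` unchanged but programs whatever the register holds, errors included.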
The above procedure is inefficient and wastes much time. In particular, it takes up precious bus cycles by moving the data over the controller-flash bus. The reason for copying the data in such an inefficient way is the problem of the accumulation of errors. Whenever a page of data is read from a flash die, one must be aware of the possibility that errors have accumulated in the data since it was programmed.
For this reason, user data stored in a flash page is accompanied by error correction parity bits that allow errors to be corrected once they are detected. When copying a page according to the above procedure of steps a-e, any errors accumulated in the original location of the data are corrected in step “c” and the data is restored to its original, correct version. However, if the die's internal copy-back method is used for copying the data, no error correction occurs. If the copied data has already accumulated errors prior to being copied, then the version of the data in the new location starts its life with those errors included. Later, when the data is moved again by the flash management software, the process repeats itself: the data now contains both the errors with which it was first programmed and any new errors that might have accumulated in the second location. There is no limit to this accumulation of errors as long as the data remains valid, that is, as long as the data is not deleted or overwritten. At some point the number of errors may exceed the capability of the error correction mechanism to correct them, at which point the data becomes corrupted and may be lost to its owner.
The consequence of the above is that relying on internal die copy-back operations when moving data between physical locations within the flash array is dangerous and may result in irretrievable data loss. This is the reason flash management systems usually do not utilize the internal copy-back option.
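The accumulation described above may be illustrated by a deliberately simplified Python model. The function name `simulate`, the correction limit `ECC_LIMIT` and the fixed number of new bit errors per stay are assumptions chosen for illustration only; in a real device the error count per stay is random and depends on the device and its wear.

```python
ECC_LIMIT = 8  # hypothetical: maximum bit errors per page the ECC can correct


def simulate(moves, errors_per_stay, use_copy_back, ecc_limit=ECC_LIMIT):
    """Return (bit errors at the final read, whether they are correctable).

    Each stay in a physical location adds `errors_per_stay` bit errors.
    A controller-mediated copy corrects the page before reprogramming it;
    a die-internal copy-back carries the errors to the new location.
    """
    errors = 0
    for _ in range(moves):
        errors += errors_per_stay      # errors accumulated in the current location
        if not use_copy_back:
            if errors > ecc_limit:     # already beyond correction at copy time
                return errors, False
            errors = 0                 # step c restores the correct data
    errors += errors_per_stay          # stay in the final location before the read
    return errors, errors <= ecc_limit


# Five moves with two new bit errors per stay:
print(simulate(5, 2, use_copy_back=False))  # (2, True)
print(simulate(5, 2, use_copy_back=True))   # (12, False)
```

Under these assumed numbers, the controller-mediated copy never presents more than two bit errors to the error correction mechanism, while five consecutive copy-back moves leave twelve, which exceeds the assumed limit of eight.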
It would thus be beneficial if one could find a way of taking advantage of die internal copy-back capability during flash management data copying in a way that provides the inherent time saving of copy-back but without risking reliability and integrity of the data.
We note that the above discussion ignores the problem of control fields associated with user data and stored with it in the same page. Such control fields sometimes depend on the exact physical address of the data and therefore change when the data is moved between two physical addresses. This complication means that in such flash management systems a simple copy-back implementation of the data move is impossible, not only because of the accumulation of errors but also because changing the control fields may be part of the process, which a simple copy-back cannot do. However, if updating the control fields had been the only issue with using copy-back, it could have been resolved by adding a step in which the flash controller executing the flash management algorithms updates the control fields while the data remains in the data register of the flash die. Most flash dies that support copy-back also support such in-register updating before programming the data to its destination. Additionally, in many flash management systems, even though some control fields depend on the physical address of the data, many, if not most, control fields do not. Therefore it is possible to use efficient copy-back for many page copy operations, even though some page copy operations require the less efficient copy procedure described above. An example of a flash management algorithm that would allow copy-back operations to be used for some of its page copying operations, if not for the error accumulation problem, is disclosed in U.S. Pat. No. 6,678,785.
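The in-register update of control fields mentioned above may be sketched in Python as follows. The page layout (a two-byte control field at offset 0 holding the physical address), the `FlashDie` class and its `update_register` method are hypothetical and serve only to show that the bus carries the changed bytes alone; actual command sets differ between devices.

```python
CONTROL_OFFSET = 0  # hypothetical layout: control field at the start of the page
CONTROL_LEN = 2     # hypothetical: two bytes holding the page's physical address


class FlashDie:
    """Hypothetical die model supporting in-register updates."""

    def __init__(self, pages):
        self.array = dict(pages)     # physical page address -> page contents
        self.register = bytearray()  # the die's internal data register
        self.bus_bytes = 0           # bytes moved over the controller-flash bus

    def read_to_register(self, addr):
        self.register = bytearray(self.array[addr])

    def update_register(self, offset, new_bytes):
        # Only the changed bytes travel over the bus.
        self.bus_bytes += len(new_bytes)
        self.register[offset:offset + len(new_bytes)] = new_bytes

    def program_from_register(self, addr):
        self.array[addr] = bytes(self.register)


def copy_back_with_control_update(die, src, dst):
    die.read_to_register(src)
    # Rewrite the address-dependent control field while the page is in the register.
    die.update_register(CONTROL_OFFSET, dst.to_bytes(CONTROL_LEN, "big"))
    die.program_from_register(dst)


die = FlashDie({5: (5).to_bytes(CONTROL_LEN, "big") + b"user data"})
copy_back_with_control_update(die, 5, 9)
print(die.array[9][:CONTROL_LEN], die.bus_bytes)  # b'\x00\t' 2
```

The user data never leaves the die in this sketch, and no error correction is performed on it, so the sketch addresses the control field issue only.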
In the following, we ignore the issue of updating the control fields, as it can be solved as described above.
Prior art flash management systems thus offer a choice between only two options: not using copy-back commands and wasting bus transfer time, or using copy-back commands and risking data loss.