Flash memory is one type of non-volatile, rewritable memory commonly used in many types of electronic devices, such as USB drives, digital cameras, mobile phones, and memory cards. Flash memory stores information in an array of memory cells made from floating-gate transistors. In traditional single-level cell (SLC) devices, each cell stores only one bit of information. Some newer flash memory, known as multi-level cell (MLC) devices, can store more than one bit per cell by choosing between multiple levels of electrical charge to apply to the floating gates of its cells.
A NAND memory is accessed by a host system much like a block device such as a hard disk or a memory card. Typically, the host system performs reads and writes to logical block addresses. The NAND memory is divided into blocks, and each block is organized into pages or sectors of cells. Blocks are typically 16 KB in size, while pages are typically 512, 2,048, or 4,096 bytes in size. Multi-level NAND cells make management of NAND devices more difficult, particularly in multithreaded real-time run-time environments.
In response, manufacturers have encapsulated NAND flash as memory devices in which a controller is placed in front of a raw NAND memory. The purpose of the controller is to manage the underlying physical characteristics of the NAND memory and to provide a logical-to-physical mapping between the logical block numbers accessed by a host system and physical locations in the NAND memory.
Reading and writing are asymmetric behaviors in NAND memories. To read a particular physical block, the address is programmed and the operation is started. After an access time, the data is available. This process of reading blocks can be repeated indefinitely (ignoring certain NAND disturb phenomena). Writing is different: a given block can be written with data essentially only once, so a write is not repeatable in the way a read is.
The initial condition of a NAND cell is to store a logical ‘1’. To write a data value, the cells that are to hold a ‘0’ are programmed and the cells that are to hold a ‘1’ are left alone. While it may be possible to continue to overwrite ‘1’ states with ‘0’ states, this is not generally useful. To fully enable the overwriting of a block, the initial condition must be re-established. This operation is referred to as an erase cycle.
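The write and erase behavior described above can be modeled with a bitwise AND. The following is a minimal sketch, not a device model: it treats a single byte as the unit of storage (real erases act on whole blocks), and the names `program` and `erase` are hypothetical.

```python
ERASED = 0xFF  # assumption: model one byte; every erased bit reads as logical 1

def program(cell: int, data: int) -> int:
    """Programming can only change bits from 1 to 0; it never restores a 1."""
    return cell & data

def erase() -> int:
    """An erase cycle re-establishes the initial all-ones condition."""
    return ERASED

cell = erase()
cell = program(cell, 0b10110100)   # first write after an erase stores the data as given
first = cell
cell = program(cell, 0b11110000)   # second write can only clear more bits
second = cell                      # bit 2 stays 0 even though the new data has no 0 there... 
                                   # and bit 2 of the new data is 0 anyway; bits 7..4 keep 1011
```

In this model the second write does not store the new value; it stores the AND of the old and new values, which is why overwriting without an intervening erase is not generally useful.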
Using currently available NAND devices as an example, typical read access times are in the range of 25-50 microseconds, write cycle times are in the range of 200-700 microseconds, and erase cycle times are in the range of 2,000-3,000 microseconds. Clearly there is a tremendous variance in performance depending on the operation: an erase can take on the order of 40 to 120 times as long as a read.
In order to mitigate the vast difference between erase and read cycle times, write blocks are grouped together into erase blocks so that the time to erase is amortized over many write blocks, effectively reducing the erase time on a per-page basis. In addition, generally more read operations can be performed on a block than erase/write cycle pairs. While there are technological subtleties, generally reads are non-destructive. Because of the nature of the charge storage on the floating gates, erase/write cycle pairs tend to damage the storage cells due to trapped charge in the oxide. For this reason, erase/write cycle pairs should be algorithmically avoided or, when inevitable, should be balanced across all blocks. This latter mechanism is referred to as “wear leveling”.
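The amortization argument can be made concrete with the typical figures quoted in this section (16 KB blocks, 512 to 4,096 byte pages, 2,000-3,000 microsecond erases). The sketch below assumes one erase is spread evenly over every page in the block; the function name is hypothetical.

```python
def amortized_erase_us(erase_us: float, block_bytes: int, page_bytes: int) -> float:
    """Erase time attributed to each page when one erase covers a whole block."""
    pages_per_block = block_bytes // page_bytes
    return erase_us / pages_per_block

BLOCK_BYTES = 16 * 1024  # typical block size given in the text

for page_bytes in (512, 2048, 4096):
    low = amortized_erase_us(2000, BLOCK_BYTES, page_bytes)
    high = amortized_erase_us(3000, BLOCK_BYTES, page_bytes)
    print(f"{page_bytes}-byte pages: {low:.1f} to {high:.1f} us of erase time per page")
```

With 512-byte pages a block holds 32 pages, so the per-page share of the erase falls below the quoted write cycle time; with larger pages the share is correspondingly larger.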
Because of the impracticality of overwriting data (both because of the wear mechanism and erase block grouping), various techniques are used to virtualize the location of any given logical block. The current state of the art includes what is called a flash translation layer (FTL). This is a driver-level software layer that maintains temporary and permanent tables of the mapping between a given logical block number and its physical location in the media. By presenting a logical block device to upper layers of software, any number of file systems may be implemented. Alternatively, a journaling file system may be implemented using the linear array of blocks. Here the blocks are allocated in order of need and the device block allocation is managed as (essentially) a large circular buffer.
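As an illustration of such a mapping table, the following minimal sketch keeps a logical-to-physical dictionary and allocates physical locations in order of need. The class `TinyFTL` and its fields are hypothetical, the table lives only in memory, and the log does not wrap around as a real circular allocation would.

```python
class TinyFTL:
    """Illustrative mapping layer: logical block number -> physical location."""

    def __init__(self, num_physical: int) -> None:
        self.media = [None] * num_physical    # None models an erased location
        self.stale = [False] * num_physical   # True once the contents are superseded
        self.l2p = {}                         # logical block number -> physical index
        self.next_free = 0                    # locations are handed out in order of need

    def write(self, lbn: int, data: bytes) -> int:
        if self.next_free >= len(self.media):
            raise RuntimeError("no free locations left; garbage collection is needed")
        old = self.l2p.get(lbn)
        if old is not None:
            self.stale[old] = True            # the old copy is marked, not overwritten
        phys = self.next_free
        self.media[phys] = data
        self.l2p[lbn] = phys
        self.next_free += 1
        return phys

    def read(self, lbn: int) -> bytes:
        return self.media[self.l2p[lbn]]

ftl = TinyFTL(num_physical=8)
ftl.write(5, b"v1")
ftl.write(9, b"other")
ftl.write(5, b"v2")   # rewriting logical block 5 lands in a new physical location
```

After the third write, logical block 5 resolves to physical location 2, while location 0 still holds the old data and is only flagged as stale.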
As alluded to above, data on NAND devices can be written in units of one page, but an erase is performed in units of one block. A page can be written only if it is erased, and a block erase clears the data on all of its pages. Because a NAND device is write-once, pages are allocated in a block until all the pages in the block are used. Regardless of the specific implementation, obsolete or “overwritten” data in the NAND array is not truly overwritten but is instead marked, by any of a number of mechanisms, as being obsolete or stale. Logically, a block that contains live data is called a valid block, and an “obsolete” block is one that contains obsolete or stale data. If a file is rewritten many times, for example, it may result in many obsolete blocks in the NAND array.
When all (or nearly all) blocks contain data, blocks that were written earlier may contain stale, and therefore invalid, data. When the NAND device is full or almost full, it becomes necessary to remove the stale data and efficiently pack the remaining valid data to make room in the NAND device. This process is referred to as “garbage collection”.
FIG. 1 is a block diagram illustrating a conventional garbage collection on a NAND device. The garbage collection process on a NAND device 10 includes a pre-collection phase 12 and a post-collection phase 14. During the pre-collection phase 12, all the blocks to be erased, called erase blocks, are examined. Blocks that are stale are available. Blocks that are not stale must be made stale by moving their data, i.e., rewriting the data into a new area. Erase blocks to be erased in a group comprise an erase cluster 16. In this example, the erase cluster 16 includes three valid blocks and one obsolete block 18. The valid blocks are being moved to respective blocks in free cluster 20. Because this relocation requires free blocks, garbage collection is not done when the NAND device 10 is truly full, but is instead done when the block allocation crosses some threshold determined by FTL management requirements. After all blocks are made stale in the erase cluster 16, the blocks are erased and made available during the post-collection phase 14, resulting in free cluster 22. The new beginning of the log 24 is the end of the free cluster 22, and the new end of the log 26 is the last block that was moved.
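The two phases can be sketched in a few lines. This is an illustrative model only, assuming the device is a flat list of (state, data) pairs and that the caller names the erase cluster and the free cluster explicitly; the function `collect` and the state labels are hypothetical.

```python
FREE, VALID, STALE = "free", "valid", "stale"

def collect(blocks: list, cluster: range, free: range) -> int:
    """Move valid blocks out of `cluster` into `free`, then erase the cluster.

    Returns the number of blocks that had to be moved.
    """
    targets = iter(free)
    moved = 0
    # Pre-collection: every block in the erase cluster must become stale.
    for i in cluster:
        state, data = blocks[i]
        if state == VALID:
            dest = next(targets)
            if blocks[dest][0] != FREE:
                raise RuntimeError("destination block is not free")
            blocks[dest] = (VALID, data)   # rewrite the live data elsewhere
            blocks[i] = (STALE, data)      # the original copy is now stale
            moved += 1
    # Post-collection: erase the whole cluster and make it available.
    for i in cluster:
        blocks[i] = (FREE, None)
    return moved

# Erase cluster of three valid blocks and one obsolete block, as in the example above.
device = [(VALID, "a"), (STALE, "x"), (VALID, "b"), (VALID, "c"),
          (VALID, "d"), (VALID, "e"),
          (FREE, None), (FREE, None), (FREE, None), (FREE, None)]
moved = collect(device, cluster=range(0, 4), free=range(6, 10))
```

In this run three blocks are moved, and the sketch only works because four free blocks were still available, which is the reason collection is triggered at a threshold rather than when the device is completely full.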
Garbage collecting an erase block involves read-then-write operations: the block must first be read to determine its current state, and data may have to be moved (i.e., good data written elsewhere to make the current block stale). It can therefore be quite time consuming to perform. The garbage collection time is the sum of the erase time, the time to rewrite each moved block, and the time for the other reads necessary to determine the block state. If erase blocks are garbage collected in groups/clusters as shown in FIG. 1, this time increases further in proportion to the number of blocks being garbage collected.
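The sum described above can be written out directly. The sketch below uses the upper ends of the timing ranges quoted earlier in this section as worst-case figures; the operation counts are assumptions chosen to resemble FIG. 1, and the function name is hypothetical.

```python
# Upper ends of the ranges quoted earlier in this section, in microseconds.
READ_US, WRITE_US, ERASE_US = 50, 700, 3000

def gc_time_us(n_state_reads: int, n_rewrites: int, n_erases: int = 1) -> int:
    """Erase time + time to rewrite moved data + time for state-determining reads.

    Assumption: any read of the data being moved is counted in n_state_reads.
    """
    return n_erases * ERASE_US + n_rewrites * WRITE_US + n_state_reads * READ_US

# Assumed counts: four blocks examined, three of them moved, one erase.
stall = gc_time_us(n_state_reads=4, n_rewrites=3)
ratio = stall / WRITE_US   # the stall expressed in worst-case ordinary writes
```

Under these assumptions a single collection costs several times the worst-case time of an ordinary write, and the figure grows with every additional block in the cluster.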
Because an application, operating system (OS), or file system cannot necessarily predict when a block driver will need to perform garbage collection, any throughput analysis must be able to tolerate a reasonably large asynchronous interruption in performance for the above-described garbage collection. This is particularly true because in conventional systems, garbage collection is likely to be delayed until it becomes necessary.
For a single-threaded application, such as a digital still camera, NAND performance can be optimized according to the usage model, and in currently available memory products (e.g., Compact Flash and SD Card) it often is. The camera usage model is to: 1) format a flash card; 2) take a picture, writing the data to the card as fast as possible (to minimize click-to-click time); 3) view random pictures to perform edits (e.g., deletion of unwanted pictures); and 4) transfer picture files in bulk to another host (such as a desktop or laptop computer). Only steps 2) and 4) have real-time performance requirements, and the usage of the storage is highly focused. When writing a new picture to the NAND device, all the NAND device has to do is sustain a sufficiently high write bandwidth. Conversely, when the NAND device has to read picture files to transfer to a host, all the NAND device is required to do is sustain a sufficiently high read bandwidth.
However, on more complex platforms, multiple streams may be read from and written to the NAND device, and each stream may have its own characteristics, including real-time requirements. Optimization is therefore not nearly so simple, because the requirements conflict.
Consider, as an example, a multithreaded environment in which two software applications are processing three file streams. One application is recording a real-time media stream (either video or audio) onto the NAND device, while the same application is also playing back either the same or a different media stream. (If it is playing back the same media stream, it is playing back at an earlier time point in the stream.) Assume that the second application is an e-mail client that is receiving e-mail updates over an internet connection and synchronizing the in-box.
In this example, these two applications have different real-time requirements. The media streaming performed by the first application cannot be halted, whereas the e-mail synchronization performed by the second application has no a priori timing requirement. If the media stream write overflows, data will be lost. If the media stream read underflows, there will be annoying gaps in the video or audio playback. If there are delays in the e-mail synchronization, however, the performance will be affected, but since this is demand driven, there is no loss of data.
Typically, media streams are taken from some kind of media source (e.g., over-the-air modem or stored media) at a constant packet rate. These packets may be stored in a ping-pong buffer to make the system resilient to variable latencies in some operations. Media stream data is written into the ping buffer until it is full; subsequent data is then written into the pong buffer. Once the ping buffer is full, it is read out and passed along to the next stage in the processing pipeline (e.g., the buffer is emptied by software that stores the data onto the NAND device). If the pong buffer has not been emptied by the consumer by the time the producer has finished loading the ping buffer, there is an overflow situation. If the consumer needs the ping buffer before it has been filled, there is an underflow situation.
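The ping-pong scheme and its two failure cases can be sketched as follows. This is a single-threaded illustration under simplifying assumptions: buffers are Python lists with a fixed packet capacity, and the class and exception names are hypothetical.

```python
class Overflow(Exception):
    """The producer filled one buffer before the other was drained."""

class Underflow(Exception):
    """The consumer asked for a buffer that is not yet full."""

class PingPong:
    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.bufs = [[], []]          # index 0 is ping, index 1 is pong
        self.ready = [False, False]   # True while a full buffer awaits the consumer
        self.fill = 0                 # buffer the producer is currently loading

    def produce(self, packet) -> None:
        buf = self.bufs[self.fill]
        buf.append(packet)
        if len(buf) == self.capacity:
            self.ready[self.fill] = True
            other = 1 - self.fill
            if self.ready[other]:
                raise Overflow("other buffer has not been emptied yet")
            self.fill = other         # switch to the drained buffer

    def consume(self) -> list:
        drain = 1 - self.fill
        if not self.ready[drain]:
            raise Underflow("no full buffer is available yet")
        data, self.bufs[drain] = self.bufs[drain], []
        self.ready[drain] = False
        return data

pp = PingPong(capacity=2)
pp.produce("p0")
pp.produce("p1")              # ping fills; the producer moves on to pong
first = pp.consume()          # the consumer drains ping in time
pp.produce("p2")
pp.produce("p3")              # pong fills; the producer moves back to ping
try:
    pp.produce("p4")
    pp.produce("p5")          # ping fills again while pong is still undrained
    overflowed = False
except Overflow:
    overflowed = True
```

The overflow at the end models a consumer that stalled, for example on a slow write to the NAND device, for longer than the producer needs to fill one buffer.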
Large asynchronous garbage collection operations of memory devices may complicate meeting the needs of real-time applications, such as in the media stream example. Garbage collection represents a worst-case deviation from the typical write access times to memory devices, and this deviation can be extreme when compared to the typical result. The above scheme of using ping/pong buffers can accommodate large and variable latencies only if these latencies are bounded, and the buffers can do so only at the expense of becoming very large. This places an additional burden on the platform in that it now requires very large media buffers in order to accommodate an operating condition that is rare.
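The link between a latency bound and buffer size can be estimated with simple arithmetic. The sketch below assumes a constant-rate stream and treats the worst-case time to drain one half of the ping/pong pair as a single bounded figure; the stream rate and the garbage collection stall are hypothetical values, not measurements.

```python
def min_half_buffer_bytes(rate_bytes_per_s: int, worst_drain_us: int) -> int:
    """Smallest half-buffer that cannot fill before the other half is drained."""
    # Ceiling division on integers keeps the result an exact byte count.
    return -(-rate_bytes_per_s * worst_drain_us // 1_000_000)

RATE = 1_000_000                                   # hypothetical 1 MB/s media stream
typical = min_half_buffer_bytes(RATE, 700)         # drain bounded by an ordinary write
with_gc = min_half_buffer_bytes(RATE, 50_000)      # hypothetical 50 ms collection stall
total_with_gc = 2 * with_gc                        # ping and pong halves together
```

The buffer has to be sized for the rare stall rather than the typical write, so under these assumed numbers it ends up roughly seventy times larger than the common case would need.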
Memory devices lack an overall context to globally optimize the garbage collection process because memory devices do not have knowledge of the semantics of a given block operation. Accordingly, what would be desirable is a solution which balances the need for NAND management of garbage collection with the needs of applications having different real-time media requirements.