The present invention relates generally to the field of flash memory devices. More particularly, the present invention relates to managing the operation of copying data stored temporarily in a cache of a flash memory device into a new location in the flash memory device, an operation commonly known as “cache flushing”.
Herein the terms “memory” and “storage” are used interchangeably and have the same meaning. Consequently, compound phrases containing those two terms (such as “memory device” and “storage device”, or “memory system” and “storage system”) also have the same meaning.
In the present invention the terms “controlling mechanism” and “controller” are used interchangeably and have the same meaning.
Herein, the term “block” is defined as the smallest unit of a flash memory that can be erased in a single operation. The term “page” is defined herein as the smallest unit of a flash memory that can be written (also called “programmed”) in a single operation. Typically, a block contains many pages.
Herein, the terms “flash management system” and “flash file system” are synonyms and are used interchangeably. Each of these terms refers to a software module that manages the storage of data in a flash memory device, regardless of whether the interface exported by the module is file-oriented (with commands like “open file” or “write file”) or block-oriented (with commands like “read block” or “write block”), and regardless of whether the software module runs on a controller dedicated solely to flash management or on the same host computer as that on which the applications that use the storage system are running.
A flash memory system implemented as a Multi-Level Cell (MLC) flash memory is provided for storing more than one bit of data in each memory cell. The writing of data into an MLC flash memory is typically slower than the writing of data into a Single-Level Cell (SLC) flash memory that stores only one bit of data per cell. Therefore, a storage system based on an MLC flash memory might not be capable of recording a stream of incoming data transmitted to the storage system at a high writing rate.
Typically, in cases in which data are produced at a rate too high to be directly stored, a cache memory mechanism is provided that is designed to operate fast enough to handle the incoming data stream. A cache memory utilizing a second (and faster) memory is implemented between the input data source and the main (and slower) memory of the flash memory device. The input data stream is first written into the faster cache memory, and at a later stage is copied from this faster cache memory into the main memory. As the copying operation between the cache memory and the main memory is typically performed in the background, this operation does not have to meet the strict performance conditions imposed by the input data stream rate, and therefore the lower write performance of the main memory is no longer an obstacle.
However, the implementation of a second memory for caching has its drawbacks. Such an implementation requires additional components for the cache memory and its control, thereby complicating the design and management of the memory system.
The prior art includes U.S. Pat. No. 5,930,167 to Lee et al., which teaches a method and system for caching write operations in a flash memory storage system that achieves the benefits of caching in MLC flash memories but with fewer of the disadvantages. The MLC flash memory medium of the Lee patent is configured to operate as its own cache memory. This is possible because memory cells that store multiple bits can be configured to also operate similarly to SLC memory cells and store only a single bit each, which is an easier task from a technological point of view. As a result, the MLC memory cells can be implemented to achieve the faster write performance that characterizes an SLC flash memory. This Lee et al. patent is incorporated by reference for all purposes as if fully set forth herein.
Techniques known in the art, such as that disclosed in the Lee et al. patent, provide a “built-in” faster cache memory embedded within the MLC flash memory storage system. When data bits are received for storage, these bits are first written into memory cells that are set to operate in SLC mode. This first writing operation can be done relatively fast. Following this operation, in the background and when time permits, the data bits are copied from the SLC cells into memory cells that are set to operate in MLC mode. Thus, the system retains the higher storage density of the MLC flash memory storage system while also handling a faster input stream that could not be handled without the cache memory mechanism.
There are two ways to configure a flash memory system that utilizes such an SLC caching scheme:
A. A dedicated cache method: a specific portion of the memory cells is always allocated to operate in SLC mode, while other cells are allocated to operate in MLC mode only. In other words, while memory cells operating in SLC mode (SLC cells) and memory cells operating in MLC mode (MLC cells) co-exist within the storage system at the same time, each specific memory cell is allocated to either always operate in SLC mode or to always operate in MLC mode, and cannot be alternately allocated to operate in SLC mode at one point in time and in MLC mode at another point in time.
B. A mixed cache method: at least some of the memory cells change modes during the system's operation. That is, a specific memory cell is allocated to operate in SLC mode at one point in time and utilized for caching data, while at another point in time the same memory cell is allocated to operate in MLC mode and utilized for high density data storage in the main memory.
Whenever there is a reference herein to a cache, the cache can be either a mixed cache or a dedicated cache.
The dedicated cache method is much simpler to manage in flash memory systems than the mixed cache method. Each portion of the memory cells is pre-allocated to operate either in SLC mode or in MLC mode. Therefore, no real-time mode switching is required. Furthermore, there is no need to provide an information management module for storing and detecting the current operation mode of any memory portion.
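By way of illustration only, the difference in management burden between the two methods may be sketched as follows. The class names, the per-block granularity and the layout in which the first blocks form the cache are assumptions made for this sketch and are not taken from the Lee et al. patent or the Lasser application. In the dedicated case the mode follows from a fixed boundary, whereas in the mixed case a table of current modes must be stored and consulted.

```python
from enum import Enum


class CellMode(Enum):
    SLC = 1  # one bit per cell, faster to write, used for caching
    MLC = 2  # more than one bit per cell, denser, used for main storage


class DedicatedCacheMap:
    """Dedicated cache method: the mode of every block is fixed in advance."""

    def __init__(self, num_blocks, num_cache_blocks):
        # Assumed layout: blocks [0, num_cache_blocks) always operate in SLC mode.
        self._num_blocks = num_blocks
        self._num_cache_blocks = num_cache_blocks

    def mode_of(self, block):
        if not 0 <= block < self._num_blocks:
            raise IndexError(block)
        # No stored state is needed: the mode is derived from the block number.
        return CellMode.SLC if block < self._num_cache_blocks else CellMode.MLC


class MixedCacheMap:
    """Mixed cache method: a block may change mode, so its mode is tracked."""

    def __init__(self, num_blocks):
        # Per-block mode table; this is the extra bookkeeping the
        # dedicated cache method avoids.
        self._modes = [CellMode.MLC] * num_blocks

    def mode_of(self, block):
        return self._modes[block]

    def set_mode(self, block, mode):
        # Real-time mode switching (hypothetical helper for this sketch).
        self._modes[block] = mode
```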
The Lee et al. patent discloses a cache implementation that uses the mixed cache method. US Patent Application Publication No. 2007/0061502 to Lasser discloses a cache implementation that uses the dedicated cache method. The Lasser Application is incorporated by reference for all purposes as if fully set forth herein.
However, both cache implementation methods (i.e. the mixed cache method and the dedicated cache method) suffer from disadvantages explained below.
As explained above, a cache memory in a flash memory operates as follows: incoming data are written into the faster-to-write cache storage locations. Later, either when there is idle time or when the cache memory is full and free space must be cleared, the data are read out of the cache memory and written into the slower-to-write main storage locations. After the data have been copied, the data no longer need to be stored in the cache memory and can be deleted so as to make the space occupied by the data available for new incoming data.
The operation of copying data from the cache memory into the main storage area and then clearing the copied area in the cache memory is referred to herein as “cache flushing”. Cache flushing is typically a relatively time-consuming operation, as this operation includes both the writing of data into the slow-to-write MLC main memory area and the erasing of the copied data from the cache memory area, both operations typically being much slower than reading or even writing data into the SLC cache. But there is an even more important reason for any occurrence of cache flushing to be a long operation—for reasons having to do with the efficiency of the algorithms of cache management it is most cost-effective for the overall performance and efficiency of the storage device to carry out cache flushing tasks a full block at a time. In other words, in prior art systems whenever the flash management system decides to allocate time to cache flushing (moving data from cache to non-cache storage area) the flash management system fills at least a full block of non-cache storage at a time.
The operation of cache flushing in prior art flash management systems will be better understood by referring to FIG. 1, which is a high level flow-chart of flash management operation according to the prior art. In block 10, the flash management system checks whether a service request from the host is pending. If such a request is pending, the flash management system checks in block 12 whether the service can be provided without cache flushing. For example, a service request to store new data can be serviced only if there is room in the cache to receive the new data. If the service can be provided without cache flushing, the service is provided in block 14 and the operation returns to block 10. If such a request is not pending, the flash management system checks in block 16 whether there is useful cache flushing to be done. If there is no useful cache flushing to be done, the operation returns to block 10. If there is useful cache flushing to be done (e.g. moving data from cache to main storage during idle time) or if the service requested by the host requires prior cache flushing (e.g. if the cache is too full to receive new data), a memory block is selected for cache flushing in block 18 and cache flushing is performed for the whole block in block 20. As can be seen from FIG. 1, once cache flushing has been started for a block, the cache flushing is brought to completion before any further host request is serviced, “starving” the host as long as the cache flushing operation of the current block is in progress.
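The flow of FIG. 1 may be sketched, purely for illustration, as the following simulation. The class name, the page-granular cache model, the treatment of every host request as a one-page write, and the rule that any non-empty cache counts as “useful cache flushing to be done” are assumptions of this sketch; only the block numbers in the comments come from the description above.

```python
from collections import deque


class PriorArtFlashManager:
    """Toy model of the prior art loop of FIG. 1 (illustrative only)."""

    def __init__(self, cache_capacity_pages, pages_per_block):
        self.cache_capacity = cache_capacity_pages
        self.pages_per_block = pages_per_block
        self.cache = deque()  # pages held in the fast cache
        self.main = []        # pages already copied into main storage
        self.log = []         # record of what each pass did

    def _flush_one_block(self):
        # Blocks 18 and 20: select a block and flush it completely.
        # Nothing else is serviced until this loop finishes.
        count = min(self.pages_per_block, len(self.cache))
        for _ in range(count):
            self.main.append(self.cache.popleft())
        self.log.append(("flush", count))

    def step(self, pending):
        """One pass through the loop; pending is a deque of pages to write."""
        if pending:                                      # block 10
            if len(self.cache) >= self.cache_capacity:   # block 12
                self._flush_one_block()                  # blocks 18 and 20
                return                                   # host request still waits
            page = pending.popleft()
            self.cache.append(page)                      # block 14
            self.log.append(("write", page))
        elif self.cache:                                 # block 16
            self._flush_one_block()                      # idle-time flushing


manager = PriorArtFlashManager(cache_capacity_pages=2, pages_per_block=2)
requests = deque(["a", "b", "c"])
for _ in range(6):
    manager.step(requests)
```

In this run the third write finds the cache full, so page "c" is accepted only after a whole-block flush has completed, which is the “starving” behavior described above.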
The advantage of doing cache flushing using large “quanta” the size of a block is easy to understand—during the filling of a block by copying data into the block, the memory system is in a vulnerable state—the same data appear more than once within the storage space, data of adjacent pages are located in different types of storage cells, management tables pointing to where each page of data is to be found are not updated yet and do not correctly reflect the instantaneous state of the storage system, etc. Therefore, designers of prior art flash memory systems always have preferred to complete such vulnerable transitions atomically, and not allow other tasks to interfere with them by running concurrently. For example, new input data arriving from the host are not accepted and are not stored in the flash memory while a cache flushing task is taking place. Either the new data are held in a RAM buffer or the host is signaled to wait for “permission” to send in new data, or some other similar arrangement suitable for the architecture of the specific flash memory system is implemented.
This design decision, of treating the cache flushing into a block (or in some systems some integral number of blocks) as an atomic uninterruptible operation, has its price in the behavior of the memory system. As indicated above, the filling of a block of flash memory is slow compared to reading out of such memory. If for example a block contains 128 pages, each page having an average programming (writing) time of 750 microseconds, then writing a full block consumes at least 96 milliseconds. Taking also into account the time for reading those 128 pages from memory (either from the cache or from other storage area) and some overhead of the managing software, one gets to a cache flushing time for a block of well over 100 milliseconds.
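The arithmetic of the above example may be checked with the following short sketch. The function name and its parameters are chosen for illustration; the read time and software overhead are left as optional inputs because the example above gives no figures for them.

```python
def block_flush_time_ms(pages_per_block, page_program_time_us,
                        page_read_time_us=0.0, overhead_ms=0.0):
    """Estimate the time to flush one block, in milliseconds.

    With the optional arguments left at zero, the result is the lower
    bound given by page programming alone.
    """
    per_page_us = page_program_time_us + page_read_time_us
    return pages_per_block * per_page_us / 1000.0 + overhead_ms


# 128 pages at 750 microseconds each: programming time only.
print(block_flush_time_ms(128, 750))  # prints 96.0
```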
The effect of this is that in systems such as the system of the above example the storage system might occasionally have a “hiccup” of over 100 milliseconds while completing a cache flushing task. The host computer accessing the storage system is held back for that period of time, until allowed to continue sending a flow of new data into storage. Depending on the application utilizing the storage, such hiccups may be observable to a human user or may go unnoticed. Most appliances or host computers are well-prepared for such hiccups and know how to handle them without causing a noticeable disturbance. One common technique is the use of data buffering in the host memory, accumulating new data until the hiccup ends and the buffered data may be sent out into the storage device. This is why most users are not aware of the cache flushing issues: the appliances they operate are designed to hide these issues from the users. But even if the appliance does not do a good job in hiding the hiccup, a hiccup of 100 milliseconds is short enough to be almost unnoticed by a human user.
However, in recent years the problem of cache flushing hiccups in flash management systems has become more severe and more difficult to dismiss as, at worst, a slight inconvenience. This is caused by two developments:
A. Cache flushing time has become longer than it used to be. One reason is that in new generations of flash devices the size of a block has become larger. Whereas just a few years ago a typical block of NAND flash memory was 16 KBytes, now one can find flash devices having blocks of 512 KBytes and even more. Additionally, the programming time of a page has become longer. This is caused by the effects of shrinking device technology to smaller geometries, and also by moving to devices storing multiple bits per cell. The more bits stored per cell, the longer the programming time becomes. Therefore the time to fill a block of flash memory in modern-day flash memory devices is longer than it used to be.
B. Some of the most popular communication protocols employed in memory cards impose a strict upper limit on response time to some commands. For example, the popular SecureDigital (SD) standard requires an SD-compliant card to always respond to a host write command within no more than 250 milliseconds. If a card does not meet this strict timeout, a host might terminate the communication session with the card and abort the data storage operation.
Obviously, the combined effect of the above developments creates a difficulty for designers of flash memory systems. A cache flushing hiccup must not be longer than a protocol-imposed timeout limit, while the minimum time for a cache flushing task gets dangerously close to the limit. The problem is faced mainly by designers of flash management software of memory card controllers, but it is also of importance for designers of flash controllers that do not necessarily operate inside memory cards, and even for designers of flash memory management software running on a host computer as part of a software driver for a flash memory device. In those cases in which there is no hard limit imposed by a communication protocol, the problem is not a malfunction of the system but a disturbing effect that is noticeable to the user.
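The relation between block fill time and a protocol timeout may be illustrated by the following sketch. The function name is an assumption, and the second set of parameters (256 pages at 1000 microseconds per page) is a hypothetical device chosen only to show the margin becoming negative; it does not describe any particular product. Only page programming time is counted, so the real margin would be smaller.

```python
SD_WRITE_TIMEOUT_MS = 250.0  # write response limit cited above for SD cards


def timeout_margin_ms(pages_per_block, page_program_time_us,
                      timeout_ms=SD_WRITE_TIMEOUT_MS):
    """Time left before the timeout after an uninterrupted block fill.

    A negative result means that an atomic whole-block flush cannot
    complete within the protocol limit.
    """
    fill_time_ms = pages_per_block * page_program_time_us / 1000.0
    return timeout_ms - fill_time_ms


# Example from the text: 128 pages at 750 microseconds each.
print(timeout_margin_ms(128, 750))    # prints 154.0

# Hypothetical larger and slower block (assumed figures).
print(timeout_margin_ms(256, 1000))   # prints -6.0
```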
Therefore, it is desirable to provide a storage system whose flash management system performs efficient cache flushing, while avoiding the hiccup problems resulting from the relatively long time of cache flushing common in the prior art techniques.