1. Field of the Invention
The present invention relates to a system and method for improving the reliability of storing data on non-volatile memory devices.
2. Description of the Related Art
Almost all computer systems, whether large mainframes or tiny embedded microcontrollers, need to store data so that it is not lost when the system is powered down. Therefore those computers usually include some kind of Non-Volatile Memory (NVM), in addition to any volatile memory they may use for running their programs. The NVM may be a magnetic disk, a flash memory chip, or any other non-volatile storage element.
FIG. 1 shows the general structure of accessing each such storage device. At the bottom of the figure, we see the physical storage media 10, which is the hardware layer implementing the physical storage. As each storage device may have its own unique interface and peculiarities which make it very inconvenient to work with, it is common practice to have a software device driver included in the operating system running on the computer (or running on the bare hardware, if no operating system is used), with this driver providing a simplified and standardized interface for other software components wishing to access the device. For storage devices used for storing files (e.g. disks, diskettes, etc.), but not only for them, the interface provided by their device drivers is usually of the type known as “block device driver”. Such device drivers interact with their clients using blocks of data rather than single bytes. This applies to both input and output operations, that is, to both reading and writing. The most common example of a block device is the magnetic disk, whose hardware interface is commonly configured for transferring only complete blocks (usually called “sectors” in this context), such as 512 bytes or more. It should be emphasized that the physical storage device need not be physically limited to block operations in order to have a device driver presenting a block device interface. For example, a battery-backed RAM disk is not physically limited to blocks and may physically read and write each of its memory bytes. Still, its device driver typically presents a block device interface to the rest of the system, so as to be compatible and interchangeable with magnetic disks. Therefore, for the purpose of the present invention, a block device is any device whose driver presents a block device interface, regardless of its actual physical structure.
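The RAM disk example above can be illustrated with a minimal sketch. The class name RamBlockDevice and the method names read_block and write_block are hypothetical and chosen only for illustration; they are not taken from any actual driver. The sketch assumes a block size of 512 bytes and shows a byte-addressable backing store hidden behind an interface that accepts only complete blocks.

```python
class RamBlockDevice:
    """Toy driver: byte-addressable RAM behind a block-only interface."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.num_blocks = num_blocks
        self.block_size = block_size
        # The backing store is plain bytes; nothing physical forces blocks.
        self._ram = bytearray(num_blocks * block_size)

    def _check(self, index: int) -> None:
        if not 0 <= index < self.num_blocks:
            raise IndexError(f"block {index} out of range")

    def read_block(self, index: int) -> bytes:
        """Return the complete contents of one block."""
        self._check(index)
        start = index * self.block_size
        return bytes(self._ram[start:start + self.block_size])

    def write_block(self, index: int, data: bytes) -> None:
        """Replace one block; partial blocks are rejected by the interface."""
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("clients must transfer complete blocks")
        start = index * self.block_size
        self._ram[start:start + self.block_size] = data


dev = RamBlockDevice(num_blocks=8)
dev.write_block(3, b"\xab" * 512)
```

Any client written against this interface could, in principle, be pointed at a driver for a magnetic disk exposing the same two operations, which is the interchangeability the text describes.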
A block device appears to its users as a linear array of blocks of a certain fixed size. Each one of these blocks can be read or written independently of the other blocks using its index in the array, as shown in FIG. 2. The common practice (which is also used here) is to number the blocks starting from block number 0 (21), and ending in block number (N−1) (22), where N is the number of blocks exported by the device driver. Again it should be emphasized that this linear array structure does not necessarily exist at the physical device level. For example, a flash disk block device driver also presents this linear array image, but internally the physical blocks on the media are usually scattered in a random order (such that block number 0 may physically be located in the middle or at the end) due to the writing limitations of flash memory and the possible existence of bad blocks. It should also be understood that the block device driver has no knowledge of the contents put into its blocks by the upper software layers.
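The distinction between the linear array seen by users and the physical layout can be sketched as follows. This is an illustrative toy only: the class ScatteredBlockDevice, its parameters, and the use of a shuffled mapping table are assumptions made for the example and do not describe how any actual flash driver assigns physical blocks.

```python
import random


class ScatteredBlockDevice:
    """Toy driver: logical blocks 0..N-1 mapped onto scattered physical blocks."""

    def __init__(self, num_blocks, block_size=512, bad_blocks=(), seed=0):
        self.num_blocks = num_blocks
        self.block_size = block_size
        bad = set(bad_blocks)
        # Provide spare physical blocks so bad ones can be skipped.
        physical_count = num_blocks + len(bad)
        usable = [p for p in range(physical_count) if p not in bad]
        # Hypothetical placement policy: an arbitrary (shuffled) order.
        random.Random(seed).shuffle(usable)
        self._map = usable[:num_blocks]  # logical index -> physical index
        self._physical = [bytes(block_size) for _ in range(physical_count)]

    def read_block(self, index):
        return self._physical[self._map[index]]

    def write_block(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("clients must transfer complete blocks")
        # The driver stores the bytes without interpreting their contents.
        self._physical[self._map[index]] = bytes(data)


flash = ScatteredBlockDevice(num_blocks=4, bad_blocks=(1,))
for i in range(flash.num_blocks):
    flash.write_block(i, bytes([i]) * 512)
```

Clients address blocks 0 through N−1 and never see the mapping table, so the physical position of block number 0 is irrelevant to them.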
Referring back to FIG. 1, we see there is usually a File System (FS) software layer on top of the device driver. A FS is a software component which provides further insulation from the physical device, by enabling the application programs to interact with the storage device using only the concept of files, a concept which is much more natural and convenient to the typical programmer or user. The FS achieves this abstraction by organizing the user data on the block device into some logical structure, and associating the blocks containing a file's data with the file's attributes (e.g. file name, creation time, access permissions, etc.). For that purpose the FS stores meta-data on the device; this meta-data is not directly visible to the user, and contains the FS internal book-keeping information with which it is able to trace and access the user files. For example, the Microsoft DOS FAT12 file system, which is one of the simplest file systems commercially available, stores on the storage device a boot sector (which must reside in the first device block) containing some basic parameters that allow the other meta-data structures to be located, one or more copies of the File Allocation Table (FAT), which is the allocation map of the device, and a root directory structure for locating files by name. The application programs interact with the FS on the file level, by issuing commands such as “open file”, “delete file”, “write file”, etc., being completely ignorant of the underlying block structure. There are many file systems in use today, greatly differing in their internal structures and characteristics. In many cases (such as with the Linux operating system) an operating system even provides several file systems to its users and they may choose the one most suitable for their needs.
A FS may or may not be “ruggedized”. For the purpose of this invention, a ruggedized software component is defined as any component having the capability of staying in a certain known consistent state (for file systems and device drivers, “state” refers to data contents of the storage device) until a sequence of operations is completed and a new known consistent state is reached. A ruggedized component guarantees that any interruption (such as a sudden power-loss) before reaching the new consistent state will cause a “roll-back” of the operations which occurred after the previous consistent state, leaving the component in this first state. In other words, a user session may end in a new consistent state or in a previous consistent state, but never in between. In still other words, a ruggedized component is a component that can survive any sudden power loss without losing its consistency, always waking up into a consistent state.
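The definition above can be illustrated with a toy model. The class RuggedizedStore and its methods are hypothetical, and the model simply assumes that the commit step itself is atomic; it illustrates the behavior required by the definition and does not represent the mechanism of any particular file system or driver.

```python
class RuggedizedStore:
    """Toy model: changes take effect only when a new consistent state is reached."""

    def __init__(self, initial: dict) -> None:
        self._committed = dict(initial)  # last known consistent state
        self._pending = dict(initial)    # state being built by ongoing operations

    def write(self, key, value) -> None:
        # Operations after the last consistent state touch only the pending copy.
        self._pending[key] = value

    def commit(self) -> None:
        # Assumed atomic in this sketch: the pending state becomes the new
        # consistent state in a single step.
        self._committed = dict(self._pending)

    def power_loss(self) -> dict:
        # Roll-back: everything since the last commit is discarded, and the
        # component wakes up in the previous consistent state.
        self._pending = dict(self._committed)
        return dict(self._committed)


store = RuggedizedStore({"a": 1})
store.write("a", 2)
store.write("b", 3)
after_first_crash = store.power_loss()   # interrupted before commit

store.write("a", 2)
store.commit()
after_second_crash = store.power_loss()  # interrupted after commit
```

In this model the state observed after an interruption is always either the previous consistent state or the new one, never a mixture of the two.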
In non-ruggedized systems, a power loss occurring in the middle of any FS operation may easily destroy the FS consistency, unless special measures are taken to protect against this. The loss of consistency can occur at two levels:
a. Inconsistency at the block device level: Let us assume the FS issues a command to the device driver to overwrite an existing block with new data. The block write operation of that driver might not be atomic. That is, a power loss in the middle of writing the block might leave the block half written, with part of it containing old data and part of it containing new data. Both fully old and fully new data are considered to be consistent states, but the mix of the two is not consistent, as it leaves the FS in a state which is neither the old one (before the write) nor the new one (after a successful write).
Methods for solving this type of inconsistency are well known in the prior art, and are not part of the present invention. For example, the TrueFFS driver of M-Systems Flash Disk Pioneers Ltd. for its DiskOnChip family of products offers its users protection against such inconsistencies at the block device level.
b. Inconsistency at the FS level: Let us assume that the user issues a command to the FS to write a new file. Because the FS must update its own meta-data to reflect the change, the FS will most probably have to issue several commands to the device driver: a first one to actually write the new data, a second one to update the allocation map, and a third one to update the corresponding directory entry. In many file systems there might be even more commands, such as for updating backup copies of the FS structures. This sequence of calls is not atomic even if each single call is. That is, a power loss within the sequence of calls might allow only some of them to be completed, while the others never take place. For example, the file might actually be written into the device, but its directory entry may not be written, so that it might now be impossible to locate it. A more severe danger occurs when overwriting an existing file with new contents, where it might happen that the previous contents are already lost while the new contents have not yet been written, a situation which is highly undesirable. An even more severe danger occurs when the integrity of the FS structures themselves is damaged, as we might even lose the whole device contents if the FS designer did not anticipate such cases.
It is this type of inconsistency which is the object of the present invention.
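The FS-level inconsistency described in item b can be reproduced with a small simulation. Everything in this sketch is hypothetical: the block layout (allocation map in block 0, directory in block 1), the directory entry format, and the names CountingDevice, PowerLoss, write_new_file and run are invented for illustration, and blocks are left unpadded for brevity. Each individual block write is treated as atomic, so that only the sequence can be interrupted.

```python
class PowerLoss(Exception):
    """Raised to simulate the system losing power."""


class CountingDevice:
    """Toy block device with atomic writes that fails after a set number of them."""

    def __init__(self, num_blocks, writes_before_failure=None):
        self.blocks = [b""] * num_blocks
        self.writes_left = writes_before_failure  # None means never fail

    def write_block(self, index, data):
        if self.writes_left is not None:
            if self.writes_left == 0:
                raise PowerLoss()
            self.writes_left -= 1
        self.blocks[index] = data  # each single write is all-or-nothing


ALLOC_MAP_BLOCK, DIRECTORY_BLOCK, FIRST_DATA_BLOCK = 0, 1, 2


def write_new_file(dev, name, data, data_block=FIRST_DATA_BLOCK):
    """Non-ruggedized 'write file': three separate driver calls."""
    dev.write_block(data_block, data)                      # 1. file contents
    dev.write_block(ALLOC_MAP_BLOCK, bytes([data_block]))  # 2. allocation map
    dev.write_block(DIRECTORY_BLOCK, name + b"@" + bytes([data_block]))  # 3. directory


def run(writes_before_failure):
    """Write one file and return the device as found after the (possible) power loss."""
    dev = CountingDevice(4, writes_before_failure)
    try:
        write_new_file(dev, b"LOG.TXT", b"hello")
    except PowerLoss:
        pass
    return dev


interrupted = run(writes_before_failure=1)
completed = run(writes_before_failure=None)
```

After the interrupted run, the data block holds the file contents but neither the allocation map nor the directory refers to it, so the file cannot be located even though no individual block is corrupted.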
As there are many systems in which losing the FS consistency is unacceptable, there have been many attempts to find defenses against power loss inconsistencies, or in other words, to provide ruggedized file systems. Traditional solutions have been to have the system from time to time copy the FS or portions of it onto offline media such as backup tape. If a failure occurs, the FS can be restored from the backup copy into a consistent state. This method usually requires manual intervention and/or bringing the whole system offline when making the backup or restoring from it. A better solution was implemented by some file systems (for example the Episode file system disclosed in USENIX, Winter 1992, pages 43-59), which do not back up the whole device contents but rather only the meta-data of the FS, and can do so online without halting system operation and without using offline storage, using just some additional storage space on the protected device. An even more efficient solution is disclosed by Hitz et al. in U.S. Pat. Nos. 5,819,292 and 5,963,962. The methods of these patents allow achieving the same result without having to duplicate all meta-data of the FS. Those methods are described in great detail in the patents in the context of a FS called “Write Anywhere File-system Layout” (WAFL), and are quite efficient in achieving the goal of having a ruggedized FS with a relatively low penalty in storage space and performance.
However, all methods known in the prior art for achieving file system ruggedness are based on implementing special data structures and algorithms at the file system level. For example, the Hitz et al. methods require the implementation of special “snapshot” bits for each block entry in the FS allocation table (called “blkmap” there), and also require changing the logic of file deletion, so that blocks used for storing the file are not necessarily cleared, as they might still be needed for a previous “snapshot”. The disadvantages resulting from this are:
a. A ruggedized FS is currently always a specially designed one, having unique algorithms and data structures. Thus a user requiring the ruggedness property has no choice but to stay with that FS, even if another FS better suits his/her needs in other respects. Making this other preferred FS ruggedized will typically require a total redesign of its internal workings, something which is usually not practical.
b. Because of the unique meta-data structures employed by prior art ruggedized file systems, there might be no compatibility with any non-ruggedized FS, in the sense that if a storage device is moved from a system with the ruggedized FS into a system with a non-ruggedized FS or vice versa, the device contents might be interpreted differently, or even not be readable at all. Such compatibility is highly desirable, as in many cases storage devices which must operate under ruggedized conditions have to be set up or maintained on factory-level development systems which have no need for ruggedness and consequently do not support it.
c. Prior art ruggedized file systems typically employ their special algorithms all the time. As there is always some penalty in space and performance when using a ruggedized FS, an application might sometimes prefer to give up ruggedness for certain operations or for certain periods of time. As such an option is typically not available, an application requiring ruggedness only for a short period of time must pay the penalty all the time.
There is thus a widely recognized need for, and it would be highly advantageous to have, a ruggedized File System that is not limited by requiring special design, unique meta-data structures and special algorithms.