1. Field of the Invention
The present invention relates to a system and method for improving the reliability of storing data on non-volatile memory devices.
2. Description of the Related Art
Almost all computer systems, whether large mainframes or tiny embedded microcontrollers, need to store data so that it is not lost when the system is powered down. Therefore those computers usually include some kind of non-volatile memory (NVM), in addition to any volatile memory they may use for running their programs. The NVM may be a magnetic disk, a flash memory chip, or any other non-volatile storage element.
FIG. 1 shows the general structure of accessing each such storage device. At the bottom of the figure, we see the physical storage media 10, which is the hardware layer implementing the physical storage. As each storage device may have its own unique interface and peculiarities which make it very inconvenient to work with, it is common practice to have a software device driver included in the operating system running on the computer (or running on the bare hardware, if no operating system is used), with this driver providing a simplified and standardized interface for other software components wishing to access the device. For storage devices used for storing files (e.g. disks, diskettes, etc.), but not only for them, the interface provided by their device drivers is usually of the type known as a "block device driver". Such device drivers interact with their clients using blocks of data rather than single bytes. This applies to both input and output operations, that is, to both reading and writing. The most common example of a block device is the magnetic disk, whose hardware interface is commonly configured for transferring only complete blocks (usually called "sectors" in this context), such as 512 bytes or more. It should be emphasized that it is not necessary for the physical storage device to be physically limited to block operations in order to have a device driver presenting a block device interface. For example, a battery-backed RAM disk is not physically limited to blocks and may physically read and write each of its memory bytes. Still, typically its device driver presents a block device interface to the rest of the system, so as to be compatible and interchangeable with magnetic disks. Therefore, for the purpose of the present invention, a block device is any device whose driver presents a block device interface, regardless of its actual physical structure.
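The point that a byte-addressable device can still be exposed as a block device may be illustrated by a minimal sketch. The class `RamBlockDevice` and its methods `read_block` and `write_block` are hypothetical names chosen for this illustration only and do not come from any actual driver.

```python
class RamBlockDevice:
    """Hypothetical RAM disk: byte-addressable inside, block interface outside."""

    def __init__(self, num_blocks, block_size=512):
        self.num_blocks = num_blocks
        self.block_size = block_size
        # The underlying store could be read or written one byte at a time.
        self._mem = bytearray(num_blocks * block_size)

    def _check(self, index):
        if not 0 <= index < self.num_blocks:
            raise IndexError("block index out of range")

    def read_block(self, index):
        """Return one complete block as immutable bytes."""
        self._check(index)
        start = index * self.block_size
        return bytes(self._mem[start:start + self.block_size])

    def write_block(self, index, data):
        """Accept only complete blocks, as a magnetic disk driver would."""
        self._check(index)
        if len(data) != self.block_size:
            raise ValueError("clients must transfer exactly one full block")
        start = index * self.block_size
        self._mem[start:start + self.block_size] = data


dev = RamBlockDevice(num_blocks=4)
dev.write_block(2, b"\xAB" * 512)
```

Because clients see only whole-block reads and writes, such a driver could in principle be swapped for a magnetic disk driver without changing the software above it.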
A block device appears to its users as a linear array of blocks of a certain fixed size. Each one of these blocks can be read or written independently of the other blocks using its index in the array, as shown in FIG. 2. The common practice (which is also used here) is to number the blocks starting from block number 0 (21), and ending in block number (N-1) 22, where N is the number of blocks exported by the device driver. Again it should be emphasized that this linear array structure does not necessarily exist at the physical device level. For example, a flash disk block device driver also presents this linear array image, but internally the physical blocks on the media are usually scattered in a random order (such that block number 0 may physically be located in the middle or the end) due to the writing limitations in flash memory and the possible existence of bad blocks. It should also be understood that the block device driver has no knowledge of the contents put into its blocks by the upper software layers.
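The distinction between the linear array seen by clients and the scattered physical layout may be sketched as follows. This is an illustration under stated assumptions: `MappedFlashDevice` is a hypothetical name, and a fixed shuffled table stands in for whatever mapping a real flash driver would maintain and update over time.

```python
import random


class MappedFlashDevice:
    """Hypothetical driver exporting blocks 0..N-1 over scattered physical blocks."""

    def __init__(self, physical_blocks, bad_blocks=(), block_size=512, seed=0):
        bad = set(bad_blocks)
        usable = [p for p in range(physical_blocks) if p not in bad]
        # Assumption: a shuffled table models the scattered physical placement.
        random.Random(seed).shuffle(usable)
        self.block_size = block_size
        self._logical_to_physical = usable
        self._media = {p: bytes(block_size) for p in usable}
        self.num_blocks = len(usable)  # N, the number of exported blocks

    def _physical(self, logical):
        if not 0 <= logical < self.num_blocks:
            raise IndexError("block index out of range")
        return self._logical_to_physical[logical]

    def read_block(self, logical):
        return self._media[self._physical(logical)]

    def write_block(self, logical, data):
        if len(data) != self.block_size:
            raise ValueError("clients must transfer exactly one full block")
        self._media[self._physical(logical)] = bytes(data)


dev = MappedFlashDevice(physical_blocks=8, bad_blocks=[3, 5])
for i in range(dev.num_blocks):
    dev.write_block(i, bytes([i]) * dev.block_size)
```

The client addresses blocks 0 through N-1 only; the bad physical blocks 3 and 5 never appear in the mapping, and the driver never inspects the data it stores.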
Referring back to FIG. 1, we see there is usually a File System (FS) software layer on top of the device driver. A FS is a software component which provides further insulation from the physical device, by enabling the application programs to interact with the storage device using only the concept of files, a concept which is much more natural and convenient to the typical programmer or user. The FS achieves this abstraction by organizing the user data on the block device into some logical structure, and associating the blocks containing a file's data with the file's attributes (e.g. file name, creation time, access permissions, etc.). For that purpose the FS stores meta-data on the device. This meta-data is not directly visible to the user, and contains the FS internal book-keeping information with which the FS is able to trace and access the user files. For example, the Microsoft DOS FAT12 file system, which is one of the simplest FS commercially available, stores on the storage device a boot sector (which must be in the first device block) containing some basic parameters that allow the other meta-data structures to be located, one or more copies of the File Allocation Table (FAT), which is the allocation map of the device, and a root directory structure for locating files by name. The application programs interact with the FS on the file level, by issuing commands such as "open file", "delete file", "write file", etc., being completely ignorant of the underlying block structure. There are many file systems in use today, greatly differing in their internal structures and characteristics. In many cases (such as with the Linux operating system) an operating system even provides several file systems to its users and they may choose the one most suitable for their needs.
A FS may or may not be "ruggedized". For the purpose of this invention, a ruggedized software component is defined as any component having the capability of staying in a certain known consistent state (for file systems and device drivers, "state" refers to data contents of the storage device) until a sequence of operations is completed and a new known consistent state is reached. A ruggedized component guarantees that any interruption (such as a sudden power loss) before reaching the new consistent state will cause a "roll-back" of the operations which occurred after the previous consistent state, leaving the component in this first state. In other words, a user session may end in a new consistent state or in a previous consistent state, but never in between. In still other words, a ruggedized component is a component that can survive any sudden power loss without losing its consistency, always waking up into a consistent state.
In non-ruggedized systems, a power loss occurring in the middle of any FS operation may easily destroy the FS consistency, unless special measures are taken to protect against this. The loss of consistency can occur at two levels:
a. Inconsistency at the block device level: Let us assume the FS issues a command to the device driver to overwrite an existing block with new data. The block write operation of that driver might not be atomic. That is, it might be the case that a power loss in the middle of writing the block will leave the block half written, with part of it containing old data and part of it containing new data. Both fully old or fully new data are considered to be consistent states, but the mix of the two is not consistent, as it leaves the FS in a state which is neither the old one (before the write) nor the new one (after a successful write).
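A half-written block of this kind can be simulated in a few lines. The sketch below is only an illustration: the function `write_block_non_atomic`, the `PowerLoss` exception and the `fail_after` parameter are hypothetical, and an 8-byte block is used to keep the result readable.

```python
class PowerLoss(Exception):
    """Simulated sudden loss of power."""


def write_block_non_atomic(media, index, data, fail_after=None):
    """Copy data into a block byte by byte.

    If fail_after is given, power is lost just before that byte is written.
    """
    block = media[index]
    for i, byte in enumerate(data):
        if fail_after is not None and i == fail_after:
            raise PowerLoss()
        block[i] = byte


media = [bytearray(b"O" * 8)]  # one block holding "old" data
try:
    write_block_non_atomic(media, 0, b"N" * 8, fail_after=3)
except PowerLoss:
    pass
torn = bytes(media[0])  # b"NNNOOOOO": neither fully old nor fully new
```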
Methods for solving this type of inconsistency are well known in the prior art, and are not part of the present invention. For example, the TrueFFS driver of M-Systems Flash Disk Pioneers Ltd. for its DiskOnChip family of products offers its users protection against such inconsistencies at the block device level.
b. Inconsistency at the FS level: Let us assume that the user issues a command to the FS to write a new file. Because of the need of the FS to update its own meta-data to reflect the change, the FS will most probably have to issue several commands to the device driver: a first one to actually write the new data, a second one to update the allocation map, and a third one to update the corresponding directory entry. In many file systems there might be even more commands, such as for updating backup copies of the FS structures. This sequence of calls is not atomic even if each single call is. That is, it might be possible that a power loss within the sequence of calls will enable only a few of them to be completed, while others will not take place. For example, the file might actually be written into the device, but its directory entry may not be written, so that it might now be impossible to locate it. A more severe danger occurs when overwriting an existing file with new contents, where it might happen that the previous contents are already lost while the new contents have not yet been written, a situation which is highly undesirable. An even more severe danger occurs when the integrity of the FS structures themselves is damaged, as we might even lose the whole device contents if the FS designer did not anticipate such cases. It is this type of inconsistency which is the object of the present invention.
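The following toy model illustrates how a sequence of individually atomic block writes can still leave the FS inconsistent. All names here (`FlakyDevice`, `write_new_file`, the block numbers, and the use of Python objects as block contents) are assumptions made for the illustration and do not correspond to any real file system.

```python
class PowerLoss(Exception):
    """Simulated sudden loss of power."""


class FlakyDevice:
    """Each write is atomic, but power is lost after a set number of writes."""

    def __init__(self, num_blocks, writes_before_failure=None):
        self.blocks = [None] * num_blocks
        self.remaining = writes_before_failure

    def write_block(self, index, data):
        if self.remaining is not None:
            if self.remaining == 0:
                raise PowerLoss()
            self.remaining -= 1
        self.blocks[index] = data  # all-or-nothing for a single block


ALLOC_MAP, DIRECTORY, FIRST_DATA = 0, 1, 2  # hypothetical block layout


def write_new_file(dev, name, data_block, contents):
    dev.write_block(data_block, contents)           # 1. the file data
    dev.write_block(ALLOC_MAP, {data_block})        # 2. the allocation map
    dev.write_block(DIRECTORY, {name: data_block})  # 3. the directory entry


dev = FlakyDevice(num_blocks=4, writes_before_failure=2)
try:
    write_new_file(dev, "log.txt", FIRST_DATA, b"hello")
except PowerLoss:
    pass
```

After the simulated power loss the data block is written and marked as allocated, but the directory block was never updated, so the file cannot be located by name.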
As there are many systems in which losing the FS consistency is unacceptable, there have been many attempts to find defenses against power loss inconsistencies, or in other words, to provide ruggedized file systems. Traditional solutions have been to have the system from time to time copy the FS or portions of it onto offline media such as backup tape. If a failure occurs, the FS can be retrieved from the backup copy into a consistent state. This method usually requires manual intervention and/or bringing the whole system offline when making the backup or restoring from it. A better solution was implemented by some file systems (for example the Episode file system disclosed in USENIX, Winter 1992, pages 43-59), which do not back up the whole device contents but rather only the meta-data of the FS, and can do it online without halting system operation and without using offline storage, just some additional storage space on the protected device. An even more efficient solution is disclosed by Hitz et al in U.S. Pat. Nos. 5,819,292 and 5,963,962. The methods of these patents allow achieving the same result without having to duplicate all meta-data of the FS. Those methods are described in great detail in the patents in the context of a FS called "Write Anywhere File-system Layout" (WAFL), and are quite efficient in achieving the goal of having a ruggedized FS with a relatively low penalty in storage space and performance.
However, all methods known in the prior art for achieving file system ruggedness are based on implementing special data structures and algorithms at the file system level. For example, the Hitz et al methods require the implementation of special "snapshot" bits for each block entry in the FS allocation table (called "blkmap" there), and also require changing the logic of file deletion, so that blocks used for storing the file are not necessarily cleared, as they might still be needed for a previous "snapshot". The disadvantages resulting from this are:
a. A ruggedized FS is currently always a specially designed one, having unique algorithms and data structures. Thus a user requiring the ruggedness property has no choice but to stay with that FS, even if another FS better suits his/her needs in other respects. Making this other preferred FS ruggedized will typically require a total redesign of its internal workings, something which is usually not practical.
b. Because of the unique meta-data structures employed by prior art ruggedized FS, there might be no compatibility with any non-ruggedized FS, in the sense that if a storage device is moved from a system with the ruggedized FS into a system with a non-ruggedized FS or vice versa, the device contents might be interpreted differently, or even not be readable at all. Such compatibility is highly desired as in many cases storage devices which must operate under ruggedized conditions have to be set-up or maintained on factory-level development systems which have no need for ruggedness and consequently do not support it.
c. Prior art ruggedized file systems typically employ their special algorithms all the time. As there is always some penalty in space and performance when using a ruggedized FS, it might happen that an application will sometimes prefer to give up ruggedness for certain operations or for certain periods of time. As such an option is typically not available, an application requiring ruggedness only for a short period of time must pay the penalty all the time.
There is thus a widely recognized need for, and it would be highly advantageous to have, a ruggedized File System that is not limited by requiring special design, unique meta-data structures and special algorithms.
According to the present invention there is provided a ruggedized file system, which provides ruggedness to the file system at the device driver level.
The present invention overcomes all the above listed disadvantages of prior art ruggedized file systems by solving the problem on a different level. Instead of being provided on the file system level, the ruggedness is provided at the device driver level. Thus any operation that the FS wants to make atomic (e.g. the "write new file" command described above) is executed by the FS in the following manner:
a. Tell the device driver that the current state is a "fall-back" state, into which the system should wake up if interrupted prior to completion of the sequence.
b. Do any operations required by the FS, including erasing, writing or overwriting blocks.
c. Tell the device driver that the atomic sequence is complete, and that a new consistent state is hereby defined.
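Steps a to c can be sketched from the point of view of the driver interface. The sketch below is an assumption-laden illustration, not the implementation of the present invention: the names `begin_atomic`, `end_atomic` and `power_cycle` are hypothetical, and a simple in-memory overlay of pending writes is used as one possible way of obtaining roll-back behavior. A real driver would also have to make the commit step itself survive a power loss.

```python
class RuggedizedDriver:
    """Hypothetical block driver offering a fall-back state and a commit."""

    def __init__(self, num_blocks, block_size=512):
        self._committed = [bytes(block_size)] * num_blocks  # last consistent state
        self._pending = {}          # writes made since the fall-back state
        self._in_sequence = False

    def begin_atomic(self):
        """Step a: declare the current state to be the fall-back state."""
        self._pending.clear()
        self._in_sequence = True

    def write_block(self, index, data):
        """Step b: writes inside a sequence are held apart from the fall-back state."""
        if self._in_sequence:
            self._pending[index] = bytes(data)
        else:
            self._committed[index] = bytes(data)

    def read_block(self, index):
        return self._pending.get(index, self._committed[index])

    def end_atomic(self):
        """Step c: the sequence is complete; define a new consistent state."""
        for index, data in self._pending.items():
            self._committed[index] = data
        self._pending.clear()
        self._in_sequence = False

    def power_cycle(self):
        """Simulated power loss and restart: uncommitted writes are discarded."""
        self._pending.clear()
        self._in_sequence = False


drv = RuggedizedDriver(num_blocks=4)
drv.begin_atomic()
drv.write_block(0, b"A" * 512)
drv.write_block(1, b"B" * 512)
drv.power_cycle()                    # interrupted before step c
after_failure = drv.read_block(0)    # the fall-back contents (all zero bytes)

drv.begin_atomic()
drv.write_block(0, b"A" * 512)
drv.end_atomic()                     # step c reached
drv.power_cycle()
after_commit = drv.read_block(0)     # the new contents survive
```

The FS code calling such a driver needs only the two bracketing calls; it does not need to know how the driver preserves the fall-back state.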
It is, accordingly, the responsibility of the device driver to ensure that either the sequence completes and the FS reaches the target consistent state, or (if power is lost or any other failure occurs in the middle) the FS will wake up in the initial fall-back state. Except for activating this "atomicity" feature as described above, the FS does not have to be aware of any implementation detail of the ruggedness solution. This novel approach addresses all of the disadvantages listed above:
a. Any FS running on top of the ruggedized device driver can utilize its ruggedness capability, regardless of its unique structures and algorithms. There is no longer a need for specially designed ruggedized file systems. The only adaptation that should take place in the file system code is the insertion of the "atomicity" calls described above, around the sequences of driver calls to be "atomized".
b. More than that, if the adaptation of the FS in use is either impossible or not desired, the ruggedness can even be achieved at the application level by the insertion of the "atomicity" calls described above around the calls to the FS that are to be "atomized".
c. As will be readily understood from the description of the methods of the present invention, a FS utilizing these methods can be made compatible with the non-ruggedized FS from which it was adapted, so that exchanging storage devices between the two file systems is made possible.
d. The methods of the present invention allow the dynamic activation of the "atomicity" feature according to changing needs. The ruggedized driver makes it possible for the FS to switch the ruggedness feature on and off, so that it does not have to be used when it is not required, thus avoiding unnecessary storage space and performance penalties. A FS implementation may add an option to the FS interface, enabling a user application to dynamically switch the feature on and off.
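Items b and d above can be sketched together as a small wrapper that brackets a group of FS calls with the "atomicity" calls, and that can be switched off when ruggedness is not needed. The names `atomic`, `StubDriver`, `begin_atomic` and `end_atomic` are hypothetical; the stub driver merely records the calls it receives so that the behavior of the wrapper can be observed.

```python
from contextlib import contextmanager


class StubDriver:
    """Hypothetical stand-in that only records the atomicity calls it receives."""

    def __init__(self):
        self.calls = []

    def begin_atomic(self):
        self.calls.append("begin")

    def end_atomic(self):
        self.calls.append("end")


@contextmanager
def atomic(driver, enabled=True):
    """Bracket the enclosed operations with the atomicity calls when enabled.

    If the enclosed code raises, end_atomic is deliberately not called,
    because the sequence did not complete.
    """
    if enabled:
        driver.begin_atomic()
    yield
    if enabled:
        driver.end_atomic()


driver = StubDriver()

with atomic(driver):
    pass  # calls to the FS that must be "atomized" would go here

with atomic(driver, enabled=False):
    pass  # calls for which the ruggedness penalty is not worth paying
```

Only the first group of operations produces the bracketing calls; the second runs without any ruggedness overhead.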