The present invention relates to file systems and, more particularly, to a file system that manages files according to the respective content of the files.
Almost all computer systems, whether large mainframes or tiny embedded microcontrollers, need to store data such that the data are not lost when the system is powered down. Therefore, those computers usually include some kind of Non-Volatile Memory (NVM), in addition to any volatile memory they may use for running their programs. The NVM may be a magnetic disk, a flash memory chip, or any other non-volatile storage element.
FIG. 1 shows the general structure of how such a storage device is accessed. At the bottom of the figure is a physical storage medium 10, which is the hardware layer implementing the physical storage. As each storage device may have its own unique interface and peculiarities which make it very inconvenient to work with, it is common practice to have a software device driver 12 included in the operating system running on the computer (or running on the bare hardware, if no operating system is used), with this device driver 12 providing a simplified and standardized interface for other software components wishing to access the device. For storage devices 10 used for storing files (e.g. disks, diskettes, etc.), but not only for them, the interface provided by their device drivers 12 is usually of the type known as “block device driver”. Such device drivers 12 interact with their clients using blocks of data rather than single bytes. This applies to both input and output operations, that is, to both reading and writing. The most common example of a block device 10 is the magnetic disk, whose hardware interface is commonly configured for transferring only complete blocks (usually called “sectors” in this context) of 512 bytes or more. It should be emphasized that it is not necessary for physical storage device 10 to be physically limited to block operations in order to have a device driver 12 presenting a block device interface. For example, a battery-backed RAM disk is not physically limited to blocks and may physically read and write each of its memory bytes. Still, its device driver 12 typically presents a block device interface to the rest of the system, so as to be compatible and interchangeable with magnetic disks.
Therefore, for the purpose of the present invention, a block device is any device whose driver 12 presents a block device interface, regardless of its actual physical structure.
A block device appears to its users as a linear array of blocks of a certain fixed size. Each one of these blocks can be read or written independently of the other blocks using its index in the array, as shown in FIG. 2. The common practice (which is also used here) is to number the blocks starting from block number 0 (21) and ending with block number N−1 (22), where N is the number of blocks exported by device driver 12. Again it should be emphasized that this linear array structure does not necessarily exist at the physical device level. For example, a flash disk block device driver 12 also presents this linear array image, but internally the physical blocks on a flash medium 10 are usually scattered in a random order (such that block number 0 may physically be located in the middle or at the end of flash medium 10) due to the writing limitations of flash memory and the possible existence of bad blocks. It should also be understood that block device driver 12 has no knowledge of the contents put into its blocks by the upper software layers 14 and 16.
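The linear-array view described above can be illustrated with a minimal sketch. The class name `RamBlockDevice` and its methods `read_block` and `write_block` are hypothetical names chosen for illustration only, and the 512-byte block size is an assumed default; the sketch models a RAM-backed device whose driver exposes only whole-block operations, and is not an actual device driver.

```python
class RamBlockDevice:
    """Hypothetical RAM-backed device that exposes a block interface only."""

    def __init__(self, num_blocks: int, block_size: int = 512) -> None:
        self.num_blocks = num_blocks          # N, the number of exported blocks
        self.block_size = block_size          # fixed size of every block
        # The RAM itself is byte-addressable; the interface below hides that.
        self._storage = bytearray(num_blocks * block_size)

    def _check_index(self, index: int) -> None:
        # Valid block numbers run from 0 to N-1.
        if not 0 <= index < self.num_blocks:
            raise IndexError(f"block {index} outside 0..{self.num_blocks - 1}")

    def read_block(self, index: int) -> bytes:
        """Return the complete contents of one block."""
        self._check_index(index)
        start = index * self.block_size
        return bytes(self._storage[start:start + self.block_size])

    def write_block(self, index: int, data: bytes) -> None:
        """Overwrite one block; partial blocks are rejected."""
        self._check_index(index)
        if len(data) != self.block_size:
            raise ValueError("only complete blocks may be written")
        start = index * self.block_size
        self._storage[start:start + self.block_size] = data


dev = RamBlockDevice(num_blocks=8)
dev.write_block(7, b"\xAA" * 512)   # block N-1 is written independently
```

Note that the sketch never inspects the bytes it is given, which mirrors the point that the driver has no knowledge of the contents placed in its blocks.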
Referring again to FIG. 1, it is seen that there is usually a File System (FS) software layer 14 on top of device driver 12. An FS 14 is a software component which provides further insulation from physical device 10, by enabling the application programs 16 to interact with storage device 10 using only the concept of files, a concept which is much more natural and convenient to the typical programmer or user. FS 14 achieves this abstraction by organizing the user data on block device 10 into some logical structure, and associating the blocks containing a file's data with the file's attributes (e.g. file name, creation time, access permissions, etc.). For that purpose FS 14 stores into device 10 meta-data, which are not directly visible to the user, and which include the FS 14 internal book-keeping information with which FS 14 is able to trace and access the user files. For example, the Microsoft DOS FAT12 file system, which is one of the simplest file systems commercially available, stores on storage device 10 a boot sector (which must be in the first block of device 10) containing some basic parameters that allow the other meta-data structures to be located, one or more copies of the File Allocation Table (FAT), which is the allocation map of device 10, and a root directory structure for locating files by name. Application programs 16 interact with FS 14 on the file level, by issuing commands such as “open file”, “delete file”, “write file”, etc. Application programs 16 thus are completely ignorant of the underlying block structure. There are many file systems 14 in use today, greatly differing in their internal structures and characteristics. In many cases (such as with the Linux operating system) an operating system even provides several file systems 14 to its users, who may choose the one most suitable for their needs.
Just as a storage device driver 12 uses no knowledge about the content and use of the data stored in the sectors it is handling and treats all such sectors the same, so also every prior art file system 14 uses no knowledge about the content and use of the data stored in the files it is handling and treats all such files the same. It should be noted that, unlike a driver 12, a file system 14 does have access to some knowledge about the data. For example, it is common practice that files containing compressed still digital pictures (such as those generated by still digital cameras) have an extension to their names identifying the fact that they contain pictures and also the type of compression used. For example, one such file can be called “My_Son.jpg”, indicating to the users that this is a “JPEG”-type picture file. Another file might be called “Agreement.txt”, indicating that it contains text generated by some word processing program. As file system 14 “knows” the names of the files it is handling, it thus in many cases also “knows” the types of those files. However, as noted above, no prior art file system 14 makes any use of such knowledge.
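The kind of knowledge a file system could derive from a file name can be sketched as follows. The mapping `EXTENSION_TO_TYPE` and the function `guess_file_type` are hypothetical names introduced for illustration; the table contains only the two extensions mentioned above and is not meant to be exhaustive.

```python
import os

# Illustrative mapping only, limited to the two examples given in the text.
EXTENSION_TO_TYPE = {
    ".jpg": "JPEG picture",
    ".txt": "text",
}


def guess_file_type(file_name: str) -> str:
    """Guess a file's type from the extension of its name."""
    _, extension = os.path.splitext(file_name)
    # Extensions are compared case-insensitively; unknown ones are reported as such.
    return EXTENSION_TO_TYPE.get(extension.lower(), "unknown")


picture_type = guess_file_type("My_Son.jpg")      # "JPEG picture"
text_type = guess_file_type("Agreement.txt")      # "text"
```

A name without a recognized extension yields "unknown", which reflects that such knowledge is available in many cases but not in all.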
It will next be explained why there is a benefit for file system 14 to make use of such knowledge. One must understand that the algorithms employed by a file system when handling a file (e.g. writing or updating the file) involve trade-offs between several desirable characteristics, and the designer of file system 14 must make choices here.
As a first example, consider the trade-off between performance and ruggedness. When writing new files or when updating existing files with new data, it is highly desirable that the operation be completed as quickly as possible. This is especially important in real-time systems where the operation must be completed before additional events take place. However, it is also usually desirable that the system be resistant to sudden power loss in the sense that data written prior to the power loss will not be lost. See for example U.S. Pat. No. 6,668,336 which discusses these issues at length. It is a well-known fact that performance and ruggedness are contradictory requirements, and one can be improved at the cost of the other. For example, in a Microsoft FAT file system (like the one used by the DOS operating system and by many Microsoft Windows operating systems) when a file is updated by extending it, the length of the file is recorded in the file's directory entry only at the end of the process, when the file is closed by the user (indicating there is no more data to write). This decision of the FAT file system designer is understandable when one considers that the alternative would be to update the file's directory entry with the current length whenever the space allocated to the file is increased. In a large file of a few megabytes this could mean thousands of directory update operations, a load that would certainly impact performance. However, by giving up those directory updates the designer gave up some of the ruggedness of the file system. If power is lost in the middle of writing a long file but after a lot of data has already been written, the directory entry of the file will show it to be much shorter, and eventually (after running file system recovery tools such as Microsoft's ScanDisk) the file will be truncated to the length recorded in the directory, with all the data written beyond this length being lost forever. (See also U.S. patent application Ser. No. 10/397,378, filed Mar. 3, 2003, which suggests a method that, for this specific conflict between performance and ruggedness, provides both. However, this case is presented only as an example of the trade-offs involved, and in any event most file systems do not employ the methods of U.S. Ser. No. 10/397,378.)
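The trade-off in the FAT example can be made concrete with a toy simulation. The function `simulate_extend`, the 2048-byte cluster size and the 4-megabyte file are assumptions made for illustration; the model considers a new file written from length zero and is not an implementation of any actual FAT file system.

```python
CLUSTER_SIZE = 2048  # bytes per allocation unit; assumed for illustration


def simulate_extend(total_bytes, update_on_allocation, power_loss_after=None):
    """Write a new file and return (directory_updates, recorded_length).

    update_on_allocation=False models recording the length only at close.
    update_on_allocation=True models the alternative of recording the
    current length each time a new cluster is allocated.
    power_loss_after, if given, stops the write before the file is closed.
    """
    if power_loss_after is None:
        stop_at = total_bytes
    else:
        stop_at = min(power_loss_after, total_bytes)
    written = 0
    recorded_length = 0
    directory_updates = 0
    while written < stop_at:
        # Each pass starts on a cluster boundary, so a new cluster is allocated.
        if update_on_allocation:
            recorded_length = written
            directory_updates += 1
        written += min(CLUSTER_SIZE, stop_at - written)
    if power_loss_after is None:
        # The file is closed normally: the final length is recorded.
        recorded_length = written
        directory_updates += 1
    return directory_updates, recorded_length


FOUR_MB = 4 * 1024 * 1024
THREE_MB = 3 * 1024 * 1024
fast = simulate_extend(FOUR_MB, update_on_allocation=False)     # (1, 4194304)
rugged = simulate_extend(FOUR_MB, update_on_allocation=True)    # (2049, 4194304)
fast_lost = simulate_extend(FOUR_MB, False, power_loss_after=THREE_MB)   # (0, 0)
rugged_lost = simulate_extend(FOUR_MB, True, power_loss_after=THREE_MB)  # (1536, 3143680)
```

Under these assumptions, updating on every allocation costs 2049 directory writes instead of one, but a power loss after 3 megabytes leaves all but the last cluster recorded, whereas the close-only policy leaves a recorded length of zero.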
As a second example, consider the trade-off between average performance and maximum latency. A software application 16 might require the recording into storage of a long stream of incoming data. This can be the case in audio-streaming or video-streaming applications. In such a case a certain packet of data (say 10 Kbytes) is received in each fixed time slot (say 10 milliseconds), and the cycle of receiving and storing is repeated continuously many times with no break between the packets. It is obvious that file system 14 must be capable of writing data at an average rate of at least 1 Megabyte per second, or otherwise the system will not be able to keep up with the flow of incoming data. However, this is not always enough: there might also be a requirement that the handling of one packet must be completed before the arrival of the next. The time for a call to file system 14 to complete is called “latency”, and so the above requirement can be stated as “the maximum latency for writing any 10 KB packet must be less than 10 milliseconds”. In such a case there might be a trade-off between the requirements of average performance and maximum latency.
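The two requirements stated above can be checked separately, as in the following sketch. The function names and the sample latency values are hypothetical, and “10 Kbytes” is taken here as 10,000 bytes.

```python
PACKET_BYTES = 10_000   # "10 Kbytes", taken as 10,000 bytes
SLOT_MS = 10            # one packet arrives every 10 milliseconds


def required_rate_bytes_per_second(packet_bytes: int, slot_ms: int) -> int:
    """Average write rate needed to keep up with the incoming stream."""
    return packet_bytes * 1000 // slot_ms


def check_requirements(latencies_ms, slot_ms=SLOT_MS):
    """Return (average_ok, max_latency_ok) for a list of per-packet latencies."""
    average_ok = sum(latencies_ms) / len(latencies_ms) < slot_ms
    max_latency_ok = max(latencies_ms) < slot_ms
    return average_ok, max_latency_ok


rate = required_rate_bytes_per_second(PACKET_BYTES, SLOT_MS)  # 1000000 bytes/s

# Hypothetical measurements: four fast writes and one slow one.
average_ok, max_latency_ok = check_requirements([2, 2, 2, 2, 30])
```

With these sample values the average latency is 7.6 milliseconds, so the average-rate requirement is met, yet the single 30-millisecond write violates the maximum-latency requirement.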
To see why this might be the case, consider, for example, flash memory storage systems 10. Such systems 10 require old and superseded data blocks to be recycled by erasing them prior to having them ready for another use. The erasing operation is relatively slow: a few milliseconds for NAND flash and a few hundred milliseconds for NOR flash. It is easy to see that there might be conflicting considerations in deciding when and how to do such block recycling: grouping many old blocks to be recycled together may provide more efficient flash management and therefore better average performance, but on the other hand will create a “pause” in the system's response and might violate the maximum latency requirement.
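The conflict can be illustrated with a toy cost model. Everything in the sketch is an assumption made for illustration: the function `recycling_costs`, the timing values, and in particular the fixed per-pass overhead, which stands in for whatever makes grouped recycling “more efficient”. It does not describe any particular flash management scheme.

```python
def recycling_costs(batch_size, write_ms=0.2, erase_ms=2.0,
                    pass_overhead_ms=1.0, writes_per_block=64):
    """Return (average_ms_per_write, worst_case_write_ms) in a toy model.

    batch_size       -- number of old blocks erased together in one pass
    pass_overhead_ms -- assumed fixed cost paid once per recycling pass
    writes_per_block -- assumed number of writes that consume one block
    """
    # One recycling pass pauses the system for the overhead plus all erases.
    pause_ms = pass_overhead_ms + batch_size * erase_ms
    # A pass is needed once per batch_size blocks' worth of writes.
    writes_between_passes = batch_size * writes_per_block
    average_ms = write_ms + pause_ms / writes_between_passes
    # The unlucky write that triggers a pass waits for the whole pause.
    worst_ms = write_ms + pause_ms
    return average_ms, worst_ms


avg_1, worst_1 = recycling_costs(batch_size=1)
avg_8, worst_8 = recycling_costs(batch_size=8)
```

In this model, erasing eight blocks per pass lowers the average cost per write, because the fixed overhead is shared among more writes, while the worst-case write grows from about 3 milliseconds to about 17 milliseconds.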
Having understood that there are such design trade-offs in file systems 14, one should understand why different types of files are better off with the trade-offs decided in different ways. For example, a cellular smart phone supporting both video streaming over the air and word processing capabilities has different needs for the files created by those two applications. For the video files the average write performance is of no real importance (as long as it is not terribly slow), because the average data rate is in any case limited by the bandwidth of the cellular channel, which is usually much lower than the performance of the file system. However, the maximum latency requirement cannot be compromised for such an application: a packet that cannot be accepted by file system 14 on its arrival (for example because file system 14 is doing a 100 millisecond recycling operation) might be lost forever. For a word processing file, on the other hand, write performance is important. A user having to wait for a few seconds when saving a large document file might get frustrated.
Another example can be given for ruggedness. A device designer may decide to adopt a policy that forces the user to explicitly close a file in order to guarantee it is fully and safely saved. Until the user explicitly indicates that s/he has finished updating the file, the new file data may reside in a RAM buffer and not be safe from a power loss. However, when downloading an upgrade for the operating system of the device, such a policy can be disastrous, rendering the device totally unusable if a power loss unexpectedly occurs. So when deciding on the performance vs. ruggedness trade-off, it would be beneficial to have executable files treated with one policy and data files with another policy.
Designers of prior art file systems 14 were aware of the considerations discussed above. For this reason one can find prior art file systems 14 that were optimized for certain applications and where the trade-offs were decided accordingly. For example, some file systems 14 are designed for hand-held portable devices such as PDAs or smart phones, where running out of battery power is a very plausible risk. In such cases ruggedness was given high importance over performance; for example, RAM buffering of written data is not used, so as not to risk losing the data if the battery runs out of power. In all prior art file systems 14 such policy decisions apply to each and every file, regardless of its type. This is so even though for some file types better decisions could be taken. For example, a video streaming file downloaded from an Internet website could greatly benefit from RAM buffering in terms of being able to accept a higher incoming data rate, while the risk of losing the file on power loss is unimportant because it can always be downloaded again from the same source.
There is thus a widely recognized need for, and it would be highly advantageous to have, a file system 14 that manages files in accordance with the contents of the files.