Block devices are computer components such as disk drives and other mass storage devices, including flash-memory and RAM-based disks. Traditionally, an application using a block storage device accesses it by “block number”. The device driver then translates this block number into a physical address on the device, usually by mapping the block number linearly onto the corresponding location on the device. This linearity exists because block devices derive from an older idea, magnetic tape, which ultimately reaches back to voice recording on wax cylinders such as the early devices made by Thomas Edison. These analog devices were strictly linear. Block devices have historically preserved this linearity, but have also divided it into individual tracks or groups of known blocks. The segmented linear technique thus has the effect of playing drop-the-needle, as on an analog phonograph record, but in a digital manner, providing anything from near to true random access depending on the specific construction of the block device.
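As an illustration of the linear translation described above, the following sketch maps a block number onto a classic cylinder/head/sector (CHS) address: consecutive blocks fill one track, then the next head, then the next cylinder. The geometry constants are hypothetical, and modern drives hide their real layout behind logical block addressing, so this shows the idea of segmented linearity rather than any particular driver.

```python
# Hypothetical geometry for illustration only.
HEADS_PER_CYLINDER = 16
SECTORS_PER_TRACK = 63


def block_to_chs(block: int) -> tuple[int, int, int]:
    """Linearly map a block number onto (cylinder, head, sector).

    Consecutive block numbers fill one track, then move to the next head,
    then to the next cylinder. Sectors are numbered from 1 by convention.
    """
    if block < 0:
        raise ValueError("block number must be non-negative")
    cylinder, rem = divmod(block, HEADS_PER_CYLINDER * SECTORS_PER_TRACK)
    head, sector_index = divmod(rem, SECTORS_PER_TRACK)
    return cylinder, head, sector_index + 1


def chs_to_block(cylinder: int, head: int, sector: int) -> int:
    """Inverse mapping: recover the block number from a CHS address."""
    return (cylinder * HEADS_PER_CYLINDER + head) * SECTORS_PER_TRACK + (sector - 1)


print(block_to_chs(0))     # (0, 0, 1)
print(block_to_chs(63))    # (0, 1, 1): first sector on the next head
print(block_to_chs(1008))  # (1, 0, 1): first sector of the next cylinder
```

Because the mapping is a pure function of the block number, the physical position of any block is known without consulting a table, which is exactly the static, linear property discussed in the following paragraphs.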
This pseudo-linearity, whether in hard disks with their tracks or in flash-memory disks with their erase blocks used to establish neutral charge, makes linear reads and writes very fast, but in many devices makes random writes consistently slow, and in some devices makes random reads slow as well.
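One well-known reason random writes are slow on flash is that a NAND page cannot be overwritten in place; it can be rewritten only after its entire erase block has been erased. The sketch below counts physical page programs under a deliberately naive, hypothetical policy: consecutive writes to the same erase block are merged, and otherwise the whole erase block is read, erased and rewritten. The 4 KiB page and 256 KiB erase-block sizes are assumptions, and real controllers use far more elaborate strategies.

```python
PAGE_KIB = 4                  # assumed program (write) unit: 4 KiB page
ERASE_BLOCK_KIB = 256         # assumed erase-block size; varies by device
PAGES_PER_ERASE_BLOCK = ERASE_BLOCK_KIB // PAGE_KIB  # 64 pages


def pages_programmed(writes: list[int]) -> int:
    """Count physical page programs for a sequence of logical page writes.

    Naive in-place policy (hypothetical): consecutive writes that fall in
    the same erase block are merged in a buffer; whenever the target erase
    block changes, the buffered block is read, erased and fully rewritten.
    """
    total = 0
    current = None  # erase block currently being buffered
    for page_no in writes:
        eb = page_no // PAGES_PER_ERASE_BLOCK
        if eb != current:
            if current is not None:
                total += PAGES_PER_ERASE_BLOCK  # flush the previous erase block
            current = eb
    if current is not None:
        total += PAGES_PER_ERASE_BLOCK  # flush the last buffered erase block
    return total


sequential = list(range(640))                                # fills 10 erase blocks in order
scattered = [i * PAGES_PER_ERASE_BLOCK for i in range(640)]  # one page per erase block
print(pages_programmed(sequential) / len(sequential))  # 1.0
print(pages_programmed(scattered) / len(scattered))    # 64.0
```

Under these assumptions a linear stream costs one page program per page written, while each scattered 4 KiB write costs an erase plus 64 page programs.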
While linearity has been the ideal, it has never been absolute, because storage media are imperfect. For instance, today's disk drives include algorithms for mapping around bad blocks: a separate, redundant area is set aside to receive the contents of specific blocks known to be bad.
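A minimal sketch of such bad-block mapping, assuming a hypothetical layout in which the user-visible blocks are addressed linearly and a fixed spare area follows them. Only blocks recorded as bad are redirected; every other block still maps to itself.

```python
class RemappedDevice:
    """Linear block mapping with a spare area for blocks known to be bad.

    Hypothetical layout: blocks [0, user_blocks) are addressed linearly;
    blocks [user_blocks, user_blocks + spare_blocks) form the spare pool.
    """

    def __init__(self, user_blocks: int, spare_blocks: int) -> None:
        self.user_blocks = user_blocks
        self.next_spare = user_blocks
        self.spare_end = user_blocks + spare_blocks
        self.remap: dict[int, int] = {}  # bad block -> spare block

    def _check(self, block: int) -> None:
        if not 0 <= block < self.user_blocks:
            raise ValueError("block out of range")

    def mark_bad(self, block: int) -> int:
        """Assign the next free spare block to a bad block and return it."""
        self._check(block)
        if block in self.remap:
            return self.remap[block]
        if self.next_spare >= self.spare_end:
            raise RuntimeError("spare area exhausted")
        self.remap[block] = self.next_spare
        self.next_spare += 1
        return self.remap[block]

    def physical(self, block: int) -> int:
        """Translate a block number; only known-bad blocks are redirected."""
        self._check(block)
        return self.remap.get(block, block)


dev = RemappedDevice(user_blocks=1000, spare_blocks=10)
dev.mark_bad(5)
print(dev.physical(4), dev.physical(5), dev.physical(6))  # 4 1000 6
```

The remap table is the only exception to linearity, and it changes only when a block is found to be bad.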
Similarly, the mapping is not always linear at the application level. Some applications introduce a “mapping layer”, which can exist for a number of reasons. For example, logical volume managers can map logical blocks to physical blocks to simplify storage device management and allow dynamic reallocation of space. Managers using Redundant Arrays of Inexpensive Disks (“RAID”) technology can map data into redundant patterns that allow continued operation even when a storage device fails. In all of these mapping layer implementations, the mapping is designed to be simple and as linear as possible. While RAID devices can intermix blocks across multiple storage devices, the overall mapping is still linear from low to high block number. This linear mapping is a basic paradigm of storage device management.
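The point that RAID intermixes blocks yet stays linear can be seen in simple striping. The sketch below uses the RAID 0 striping layout as the simplest case (redundancy is omitted); the disk count and stripe-unit size are illustrative parameters. Consecutive logical blocks rotate across disks, yet on each disk the offsets still rise in step with the logical block number.

```python
def stripe_map(logical_block: int, num_disks: int, stripe_unit: int) -> tuple[int, int]:
    """Map a logical block to (disk index, block offset on that disk).

    stripe_unit is the number of consecutive logical blocks placed on one
    disk before the layout moves on to the next disk (RAID 0 style).
    """
    if logical_block < 0 or num_disks < 1 or stripe_unit < 1:
        raise ValueError("invalid arguments")
    stripe_no, within = divmod(logical_block, stripe_unit)
    disk = stripe_no % num_disks    # stripes rotate round-robin across disks
    row = stripe_no // num_disks    # full rotations completed so far
    return disk, row * stripe_unit + within


for b in (0, 15, 16, 64, 70):
    print(b, stripe_map(b, num_disks=4, stripe_unit=16))
```

Like the single-disk case, the mapping is a fixed formula: no table is consulted and nothing changes while the array is running.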
Another aspect of conventional device mapping solutions is that they are generally static in operation. While some mappings allow dynamic updating, such as when a disk error is detected and a “grown” bad block is remapped, most mappings remain the same for the life of the device. Device re-mapping based on live updates is not part of any existing block device implementation.
The invention at hand arose from an inherent weakness of most block devices: random writes to them are very slow, and random reads are sometimes very slow as well. For instance, a high-speed disk drive can read and write about 170 4-kilobyte blocks per second in a truly random fashion, but can read or write linearly at close to 10,000 4-kilobyte blocks per second. Similarly, a device built from NAND flash memory can read and write linearly at well over 5,000 4-kilobyte blocks per second, and can also read randomly at this high speed, but can write only 50 to 70 such blocks per second at random.
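The quoted figures can be turned into ratios with a few lines of arithmetic. The sketch below uses only the numbers in the paragraph above. Because the flash linear rate is given as “well over 5,000”, ratios derived from it are approximate lower bounds.

```python
BLOCK_KIB = 4  # block size used in the quoted figures (4 KiB)

# Throughputs quoted in the text, in 4 KiB blocks per second.
DISK_RANDOM = 170
DISK_LINEAR = 10_000           # "approaching 10,000"
FLASH_LINEAR = 5_000           # "well over 5,000"; used as a floor
FLASH_RANDOM_READ = 5_000      # random reads "at this high speed"
FLASH_RANDOM_WRITE = (50, 70)  # quoted range


def mib_per_s(blocks_per_s: float) -> float:
    """Convert 4 KiB blocks per second to MiB per second."""
    return blocks_per_s * BLOCK_KIB / 1024


disk_penalty = DISK_LINEAR / DISK_RANDOM                               # linear vs random on disk
flash_write_penalty = [FLASH_LINEAR / w for w in FLASH_RANDOM_WRITE]   # linear vs random write on flash
read_advantage = FLASH_RANDOM_READ / DISK_RANDOM                       # flash vs disk random reads
write_vs_disk = [w / DISK_RANDOM for w in FLASH_RANDOM_WRITE]          # flash vs disk random writes

print(f"disk random {mib_per_s(DISK_RANDOM):.2f} MiB/s, linear {mib_per_s(DISK_LINEAR):.1f} MiB/s")
print(f"disk linear/random ratio: {disk_penalty:.1f}")
print(f"flash linear/random-write ratio: {flash_write_penalty[1]:.1f} to {flash_write_penalty[0]:.1f}")
print(f"flash/disk random read ratio: {read_advantage:.1f}")
print(f"flash/disk random write ratio: {write_vs_disk[0]:.2f} to {write_vs_disk[1]:.2f}")
```

On these figures a high-speed disk loses roughly a factor of 59 between linear and random access, while flash loses roughly 71 to 100 times on random writes even though its random reads are about 29 times faster than the disk's.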
While slow random access is not a problem for data stored in large units, such as a word-processing document or a picture, it is a problem when many small files or records are accessed randomly. This commonly occurs in database environments, and also in services such as Internet Message Access Protocol (IMAP) email, where individual small files, such as individual email messages, are stored in a set of directories.
In the particular case in point, the goal was to use a NAND flash memory device for random access in a database environment. However, while such devices were superb at reading random records, being roughly thirty times faster than high-speed disk drives, their random-write performance was less than half that of high-speed disks. In addition, the limited write life of NAND flash memory, discussed later, raised concerns about product durability.
However, there may be other ways to organize data where doing so is convenient and useful. Journaling is a method of recording changes to directories and to the sizes and positions of files, without recording the changed contents of the files themselves. In Journaling, these metadata changes are recorded in the sequential order in which they occur. Transaction Logging is similar to Journaling, except that it is implemented at the application level and records the actual data contents of the files or records in question as they are written. As with Journaling, in the event of system failure, Transaction Logs can be played forward from a known good time and data set, such as a completed file backup, to bring the data set up to the instant before the failure occurred.
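The play-forward step can be sketched as follows, assuming a hypothetical record format of a sequence number, a key and a value, with None standing for a deletion. Starting from a backup taken at a known sequence number, each newer record is applied in log order, so a later copy of a datum supersedes an earlier one. This illustrates the general idea, not the format of any particular journaling file system or database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    seq: int                # position in the log, i.e. time order
    key: str                # hypothetical record identifier
    value: Optional[bytes]  # None marks a deletion (illustrative convention)


def replay(backup: dict[str, bytes], log: list[LogRecord], backup_seq: int) -> dict[str, bytes]:
    """Roll a backup forward by applying every log record newer than it.

    The log is assumed to be stored in sequence order, as it is written.
    """
    state = dict(backup)
    for rec in log:
        if rec.seq <= backup_seq:
            continue  # already reflected in the backup
        if rec.value is None:
            state.pop(rec.key, None)
        else:
            state[rec.key] = rec.value  # the later copy supersedes the earlier one
    return state


log = [
    LogRecord(1, "a", b"1"),
    LogRecord(2, "b", b"x"),
    LogRecord(3, "a", b"2"),
    LogRecord(4, "b", None),
]
print(replay({"a": b"1"}, log, backup_seq=1))  # {'a': b'2'}
```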
As understood by those skilled in the art, Journaling and especially Transaction Logging are very space-intensive. Both were originally implemented on non-block devices, specifically magnetic tape or other low-cost linear media, to record transactions as they occurred. Over time, both have moved to low-cost block devices such as disk drives, which are now cheaper than magnetic tape and which, in their native linear order of blocks, can be treated as the logical equivalent of a very long tape.
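Treating a block device as “a very long tape” amounts to appending records in ascending block order and never seeking backward to update them. The sketch below simulates the device as a byte array; the 4 KiB block size and the length-prefixed record format are assumptions for illustration. append returns the block in which each record begins, and read_all plays the log back from the start.

```python
from typing import Iterator

BLOCK_SIZE = 4096  # assumed block size


class BlockTape:
    """Append-only log laid out over a block device's blocks in ascending order."""

    def __init__(self, num_blocks: int) -> None:
        self.dev = bytearray(num_blocks * BLOCK_SIZE)  # simulated device
        self.tail = 0  # byte offset of the next append

    def append(self, payload: bytes) -> int:
        """Write a 4-byte length prefix plus payload at the tail."""
        rec = len(payload).to_bytes(4, "little") + payload
        if self.tail + len(rec) > len(self.dev):
            raise RuntimeError("log full")
        start = self.tail
        self.dev[start:start + len(rec)] = rec
        self.tail += len(rec)
        return start // BLOCK_SIZE  # block where this record begins

    def read_all(self) -> Iterator[bytes]:
        """Play the log forward from the first record to the last."""
        pos = 0
        while pos < self.tail:
            n = int.from_bytes(self.dev[pos:pos + 4], "little")
            yield bytes(self.dev[pos + 4:pos + 4 + n])
            pos += 4 + n


tape = BlockTape(num_blocks=4)
print(tape.append(b"a" * 10), tape.append(b"b" * 5000), tape.append(b"c"))  # 0 0 1
```

Every write lands at the tail, so the device only ever sees linear writes, which is the access pattern block devices handle fastest.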
Journaling, and especially Transaction Logging, are mentioned here as one alternative way of viewing data that is both new and linear, in that the new copy of a datum supersedes the old one when the media is played forward through time, and as an example of the advantages of writing data in an alternative order rather than to a fixed location. It should be remembered, however, that both Journaling and Transaction Logging operate only linearly, from first to last, because there is no mechanism for independently remembering where the current version of every datum is located.
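To make the last point concrete, the sketch below looks up the current value of one datum in a log of (key, value) records, using a hypothetical record format. Because nothing records where each datum's newest copy lives, the lookup cannot stop at the first match: any later record might supersede it, so it reads from first to last and reports how many records it examined.

```python
from typing import Iterable, Optional, Tuple


def current_version(log: Iterable[Tuple[str, bytes]], key: str) -> Tuple[Optional[bytes], int]:
    """Return (newest value for key or None, number of records read).

    Without a separate record of where each current version is located,
    the whole log must be scanned, since a newer copy may appear later.
    """
    latest: Optional[bytes] = None
    records_read = 0
    for rec_key, value in log:
        records_read += 1
        if rec_key == key:
            latest = value  # provisional: a newer copy may still follow
    return latest, records_read


log = [("a", b"1"), ("b", b"2"), ("a", b"3")]
print(current_version(log, "b"))  # (b'2', 3): the entire log was read
```

The cost of every lookup grows with the length of the log, not with the number of distinct data items, which is why such logs are used for play-forward recovery rather than for random access.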