Computers use storage devices such as disk drives for permanently recording data. The computers are typically called “hosts” and the storage devices are called “drives.” A host can be connected to multiple drives, but a drive can also be connected to multiple hosts. Commands and data are transmitted to the drive to initiate operations. The drive responds with formatted status, error codes and data as appropriate. Various standard command architectures have been adopted including, for example, Integrated Drive Electronics (IDE), Small Computer System Interface (SCSI) and Serial ATA (SATA).
The host computer can range in size from a supercomputer cluster to a small handheld device. The host can also be a special-purpose device such as a digital camera. Similar data storage devices might be used in a variety of applications, including personal computers with less stringent demands as well as large systems used by banks, insurance companies and government agencies with critical storage requirements. Viewed at a high level, a computer is typically described as having an operating system that provides basic services to application programs running on the computer. More detailed views can break the processing into multiple processing layers.
A queue of commands for the disk drive may be kept in the drive's memory. A disk drive can use the command queue to optimize the net execution time of commands by changing the order in which they are executed. Among other criteria, prior art algorithms use seek time and rotational latency to optimize execution time.
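By way of illustration only, queue reordering of this kind can be sketched as a greedy scheduler that repeatedly picks the pending command with the lowest estimated access time, where access time is the seek time plus the rotational latency remaining after the seek. The names `Command`, `SEEK_PER_CYL`, `access_cost` and `reorder`, the linear seek model and the greedy policy are all assumptions made for this sketch. They are not taken from any particular prior art drive.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Command:
    name: str
    cylinder: int   # target radial position of the arm
    angle: float    # target angular position, as a fraction of a revolution in [0, 1)

# Assumed seek model: time (in revolutions) spent per cylinder crossed.
SEEK_PER_CYL = 0.01

def access_cost(cyl, angle, cmd):
    """Estimate the time, in revolutions, to reach cmd from the current head position."""
    seek = abs(cmd.cylinder - cyl) * SEEK_PER_CYL
    # The disk keeps rotating while the arm is seeking.
    angle_after_seek = (angle + seek) % 1.0
    # Rotational latency: wait for the target sector to come under the head.
    latency = (cmd.angle - angle_after_seek) % 1.0
    return seek + latency

def reorder(queue, cyl=0, angle=0.0):
    """Greedily order the queued commands by lowest estimated access time."""
    pending = list(queue)
    order = []
    while pending:
        best = min(pending, key=lambda c: access_cost(cyl, angle, c))
        cost = access_cost(cyl, angle, best)
        pending.remove(best)
        order.append(best)
        # Advance the modeled head position to where the chosen command ends up.
        angle = (angle + cost) % 1.0
        cyl = best.cylinder
    return order

queue = [Command("A", 90, 0.5), Command("B", 10, 0.25), Command("C", 12, 0.5)]
print([c.name for c in reorder(queue)])
```

The sketch ignores data transfer time and considers only one command ahead, so it shows the use of the two criteria rather than an optimal schedule.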
U.S. patent application 2006/0106980 by Kobayashi, et al. (published May 18, 2006) describes a hard disk drive (storage device) that includes a queue capable of storing a plurality of commands, and a queue manager for optimizing the execution order of the plurality of commands on the basis of whether or not the execution of each command requires access to the storage medium.
A disk drive typically includes a high speed cache memory where selected sectors of data can be stored for fast access. Operations performed using only the drive's cache are much faster than those that require moving the arm to a certain radial position above the rotating disk and then waiting for the disk to rotate into the proper position for a sector to be accessed. When used as a read cache, the cache contains copies of a subset of the data stored on the disk. The cache contains recently read data and may also contain pre-fetched sectors that occur immediately after the last one requested. A read command can be satisfied by retrieving the data from the cache when the needed data happens to be in the cache.
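As a non-limiting sketch of the read cache behavior described above, the following model keeps recently read sectors and pre-fetches the sectors that immediately follow a requested one. The class name `ReadCache`, the least-recently-used eviction policy, the `capacity` and `read_ahead` parameters, and the use of a dictionary to stand in for the disk media are all assumptions made for this example.

```python
from collections import OrderedDict

class ReadCache:
    def __init__(self, media, capacity=8, read_ahead=2):
        self.media = media            # dict of sector -> data, standing in for the disk
        self.capacity = capacity      # maximum number of cached sectors
        self.read_ahead = read_ahead  # sectors pre-fetched after the requested one
        self.cache = OrderedDict()    # ordered from least to most recently used
        self.media_accesses = 0       # counts slow mechanical accesses

    def _insert(self, sector, data):
        self.cache[sector] = data
        self.cache.move_to_end(sector)
        # Evict the least recently used sectors when over capacity.
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def read(self, sector):
        if sector in self.cache:
            # Cache hit: no arm movement or rotational wait is needed.
            self.cache.move_to_end(sector)
            return self.cache[sector]
        # Cache miss: one media access fetches the sector and the ones after it.
        self.media_accesses += 1
        for s in range(sector, sector + 1 + self.read_ahead):
            if s in self.media:
                self._insert(s, self.media[s])
        return self.media[sector]

media = {i: f"data{i}" for i in range(10)}
cache = ReadCache(media)
for sector in (3, 4, 5, 6):
    cache.read(sector)
print(cache.media_accesses)
```

In this model, sequential reads of sectors 3 through 6 touch the media only for sectors 3 and 6, because sectors 4 and 5 were pre-fetched on the first miss.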
The cache can also be used for data that is in the process of being written to the disk. There is a critical window of time in a write operation between placing the data in the cache and actually writing the data to the disk when a power failure, for example, can cause the data to be lost. However, having the host wait until the relatively slow write process has completed can be an unnecessary inefficiency in many cases. The waiting time is justified for some data but not for all data. A so-called fast write operation simply places the data in the write cache, signals the host that the operation is complete and then writes the data to disk at a subsequent time, which can be chosen using optimization algorithms that take into account all of the pending write commands.
Prior art command architectures have provided ways for a host to send a particular command or parameter to the drive to ensure that the data is written to the disk media before the drive signals that the write operation is complete. Writing data on the media is also called committing the data or writing the data to permanent storage. One type of prior art command (cache-flush) directs the drive to immediately write all of the pending data in the cache to the media, i.e., to flush the cache. Flushing the entire cache on the drive may take a significant amount of time and, if done too often, reduces the benefit of the cache. Also known in the prior art is a write command with a forced unit access (FUA) flag or bit set. A write with the FUA flag set will cause the drive to completely commit the write to non-volatile storage before indicating back to the host that the write is complete.
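The difference between a fast write, a cache-flush and a write with the FUA flag set can be illustrated with a toy model. This is a sketch under stated assumptions, not an implementation of any standard command set: the class `Drive`, its methods, and the `power_failure` helper are hypothetical names, and two dictionaries stand in for the volatile cache and the non-volatile media.

```python
class Drive:
    def __init__(self):
        self.cache = {}   # volatile write cache: contents are lost on power failure
        self.media = {}   # non-volatile storage media

    def write(self, sector, data, fua=False):
        if fua:
            # FUA: commit to the media before reporting completion.
            self.media[sector] = data
            self.cache.pop(sector, None)  # discard any older pending copy
        else:
            # Fast write: report completion once the data is in the cache.
            self.cache[sector] = data
        return "complete"

    def flush(self):
        # Cache-flush: commit every pending write, then report completion.
        self.media.update(self.cache)
        self.cache.clear()
        return "complete"

    def power_failure(self):
        # Anything still held only in the volatile cache is lost.
        self.cache.clear()

drive = Drive()
drive.write(1, "a")            # fast write, reported complete but not yet on media
drive.write(2, "b", fua=True)  # committed before completion is reported
drive.power_failure()
print(drive.media)             # only sector 2 survived
```

The model shows the critical window: the fast write to sector 1 was reported complete but lost, while the FUA write survived. A `flush()` issued before the power failure would have committed sector 1 along with everything else pending in the cache.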
Efficiencies can also be obtained by rearranging the order in which the commands are executed, but re-ordering of commands inside the drive can also create problems. There is the potential for such write re-ordering to introduce inconsistency in the data structures on disk. File system and database consistency is guaranteed by the order in which specific writes are written to non-volatile storage. While it is permissible to reorder some writes, a partial ordering of writes must be guaranteed.
Write barrier commands are used to aid application programs in ensuring that certain data is physically on the storage media before other data is written to the device. Data consistency is guaranteed by the order in which certain writes occur to the non-volatile media. The write barrier does not explicitly indicate a time at which the write will occur as in cache-flush and FUA commands. A write barrier imposes a partial ordering on the pending writes to the drive. A write barrier can be defined as a special write command or a selectable option in a write command that ensures that the previous write commands are actually written to the media and not simply sitting in the cache. All write commands sent before a write barrier (WB) command must be committed to the media before the WB-command is committed to the media. Additionally, all writes sent after the WB-command must only be committed to the media after the WB-command is committed to the media.
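The partial ordering imposed by a write barrier can be sketched as follows. The command stream is split into groups at each WB-command; writes within a group may be committed in any order, but every write in a group is committed before the WB-command that closes it, and that WB-command is committed before any later write. The function name `commit_order`, the tuple layout of a command, and the use of sorting by sector number as a stand-in for the drive's own optimization are assumptions made for this example.

```python
def commit_order(commands):
    """Return one legal commit order for a stream of write commands.

    commands is a list of (name, sector, is_barrier) tuples in the order sent.
    """
    order = []
    group = []  # writes received since the last barrier
    for name, sector, is_barrier in commands:
        if is_barrier:
            # Commit everything sent before the barrier, in any convenient order.
            order.extend(n for n, _ in sorted(group, key=lambda c: c[1]))
            group = []
            # Then commit the barrier write itself.
            order.append(name)
        else:
            group.append((name, sector))
    # Writes after the last barrier are free to be reordered among themselves.
    order.extend(n for n, _ in sorted(group, key=lambda c: c[1]))
    return order

stream = [
    ("w1", 90, False),
    ("w2", 10, False),
    ("wb", 50, True),   # write barrier
    ("w3", 70, False),
    ("w4", 20, False),
]
print(commit_order(stream))
```

Here `w1` and `w2` are swapped relative to the order sent, as are `w3` and `w4`, yet neither pair crosses the barrier. The sketch fixes only the order of commits and does not dictate when they occur.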
In United States Patent Application 20060190510 by Gabryjelski, et al. (Aug. 24, 2006) a system is described that facilitates the storage of data using a “write barrier component.” The system interfaces to a hardware component that stores data, and includes a write barrier component that dynamically employs instructions compatible with the hardware component to ensure data integrity during storage of the data. The write barrier component is independent of the operating system and application programs and can operate in a user mode and/or a kernel mode. A coalescing component combines cache synchronization requests into a single set of instructions to flush the disk cache in one process.
Experiments by the applicants have confirmed that the commonly used Microsoft operating system Windows XP makes frequent use of cache flushing commands to ensure that the file system remains in a consistent state. The experiments also show that the frequent cache flushing results in very low utilization of the cache. For example, with a 16 MB write cache, more than 70% of the cache flushes during an observation period occurred when the cache was less than 1% full. What is needed is a means that allows the cache to be used effectively while still allowing critical data to be committed to the media.