1. Field of the Invention
The present invention relates to disk drives, and more particularly to the use of write-back caching while maintaining data integrity.
2. Description of the Related Art
In most computer systems today, the host computer's interaction with the one or more disk drives of a memory subsystem is often the greatest factor limiting the performance of the system. While processors in the host computer continue to operate at ever-increasing speeds, improvements in the time required for the host to access a disk drive during a read or write operation (i.e., latency) have not kept pace. Thus, as the speed of available microprocessors in the host computer has increased, the latency of I/O operations between the host and the memory subsystem has become the predominant performance issue.
A number of techniques have been employed in disk drives to decrease latency and increase throughput to disk storage. One such technique uses cache memory to store memory operations requested of the disk drive by the host computer. Cache memory is typically volatile random-access memory (RAM) located proximately with the disk drive(s) of the memory subsystem. In a write-through caching scheme, execution of a write command received from the host results in the data and disk target address specified by the command being stored into the write cache in addition to the data being written onto a disk at the target disk address. Read operations are also typically cached in a similar manner, with the data and disk addresses cached as data is read from the disk and supplied to the host. The read cache can be separate from or integrated with the write cache.
A disk drive controller located in the disk drive keeps track of the disk addresses on the disk storage media for which accurate data is held in the cache. If the host subsequently issues a read operation that requires data to be read from one of the cached disk addresses, the disk drive controller verifies whether accurate data for that location is stored in the cache (a cache hit). For every cache hit, the disk drive can forgo an access to the disk media in favor of a much faster access to the cache. Of course, the latency for the read operation is not improved when the data for the corresponding read address is not in the cache (a cache miss). Nor is latency improved for the write operation in a write-through caching scheme, as the write is not acknowledged to the host as completed until the data has been physically committed to the disk media (i.e., stored magnetically on the surface of the disk media).
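The write-through behavior described above can be illustrated with a minimal sketch. The class name `WriteThroughCache`, the use of a dictionary as a stand-in for the disk media, and the hit and miss counters are all assumptions made for illustration; they do not describe any particular drive's controller.

```python
class WriteThroughCache:
    """Toy write-through cache (hypothetical names throughout)."""

    def __init__(self, disk):
        self.disk = disk      # dict standing in for the disk media (assumption)
        self.cache = {}       # disk address -> data held in volatile RAM
        self.hits = 0
        self.misses = 0

    def write(self, address, data):
        # Write-through: the data is committed to the media and also cached.
        # The acknowledgment is returned only after the media write is done.
        self.disk[address] = data
        self.cache[address] = data
        return "ack"

    def read(self, address):
        if address in self.cache:
            # Cache hit: the slow media access is avoided.
            self.hits += 1
            return self.cache[address]
        # Cache miss: read from the media and cache the result.
        self.misses += 1
        data = self.disk[address]
        self.cache[address] = data
        return data
```

In this sketch a read of a recently written address is served from the cache, while the write itself gains nothing in latency because it always waits for the media.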
A more aggressive caching technique is called write-back caching. In this approach, the data and target address for a write operation are received and written to the cache by the disk controller, but the data is not immediately written to the disk media. The cached data is typically written to the disk media at a later time under two circumstances. In one instance, the disk controller detects that the cache is full, in which case data from one or more of the cache memory locations must be written to disk to free up the cache locations for the pending write operations. In the second instance, the memory subsystem receives a CACHE_FLUSH command (typically from the host), in response to which all cached write data not yet written to the disk is flushed from the cache and written to the disk media. The most significant feature of write-back caching is that the disk controller acknowledges the completion of each write operation to the host immediately upon the write data and the target address being stored in the cache. This means that the application program running on the host computer that requested the write operation can continue execution without waiting for the data to actually be committed to the disk media. Because an access to RAM is so much faster than an access to the disk (e.g., less than 1 ms versus 6-7 ms, respectively), forestalling the write to disk and acknowledging completion of the write to the host immediately upon caching the data significantly reduces the application's perceived latency of each write request.
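By way of illustration only, the two circumstances under which cached writes reach the disk media can be sketched as follows. The names `WriteBackCache`, `capacity`, and `flush` are hypothetical, the dictionary `disk` is a stand-in for the media, and the oldest-entry-first eviction policy is an assumption; the description above does not specify which cached entry is written out when the cache is full.

```python
class WriteBackCache:
    """Toy write-back cache (hypothetical names, assumed eviction policy)."""

    def __init__(self, disk, capacity):
        self.disk = disk          # dict standing in for the disk media
        self.capacity = capacity  # maximum number of cached writes
        self.dirty = {}           # address -> data not yet on the media

    def write(self, address, data):
        if address not in self.dirty and len(self.dirty) >= self.capacity:
            # Circumstance 1: cache full. Write one entry to the media to
            # free a location (oldest first is an assumption).
            oldest = next(iter(self.dirty))
            self.disk[oldest] = self.dirty.pop(oldest)
        self.dirty[address] = data
        # Acknowledge immediately, before the data reaches the media.
        return "ack"

    def flush(self):
        # Circumstance 2: a flush command. Write every cached entry out.
        for address, data in self.dirty.items():
            self.disk[address] = data
        self.dirty.clear()
```

Note that `write` returns its acknowledgment while the data exists only in `dirty`, which is the property that lowers perceived latency.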
One technique employed in disk drives to increase the throughput of physical writes to (and reads from) the disk involves queuing and reordering the execution of pending disk operations so that those operations accessing addresses on the disk that are more proximate to one another can be executed together. This optimization process serves to minimize the mechanical latency of the accesses. Mechanical latency includes, among other factors, the time required for the read/write head to be positioned over the disk location corresponding to the disk address at which the I/O operation is to occur. This includes the seek time necessary to actuate the arm carrying the read/write head to the appropriate track on the disk, as well as the time it takes for the disk to rotate until the right location on the track is under the read/write head. Write operations involving large blocks of data are typically broken down into smaller sub-operations. It is advantageous if these sub-writes can be executed sequentially and continuously with respect to one another, because data will then be stored in adjacent regions, called tracks, on the surface of the disk. The seek time between adjacent tracks is minimal. Even when related reads and writes are not strictly sequential, processing these I/O operations according to some proximity algorithm can still reduce latency by minimizing the seek time between them.
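One simple proximity algorithm, offered here only as a sketch, is a greedy nearest-track-first ordering. The function names are hypothetical, track numbers are used as a rough proxy for seek time, and rotational delay is ignored; actual drives may use quite different scheduling.

```python
def total_seek_distance(start_track, tracks):
    """Sum of head movements (in tracks) when visiting tracks in order."""
    distance, position = 0, start_track
    for track in tracks:
        distance += abs(track - position)
        position = track
    return distance


def nearest_first(start_track, pending):
    """Greedy reordering: always service the closest pending track next."""
    remaining = list(pending)
    order, position = [], start_track
    while remaining:
        nxt = min(remaining, key=lambda t: abs(t - position))
        remaining.remove(nxt)
        order.append(nxt)
        position = nxt
    return order


pending = [95, 10, 60, 12, 90]          # arrival order of pending operations
reordered = nearest_first(50, pending)  # head assumed to start at track 50
print(total_seek_distance(50, pending))    # 306 tracks in arrival order
print(total_seek_distance(50, reordered))  # 130 tracks after reordering
```

With these assumed track numbers, reordering cuts the total head travel from 306 tracks to 130.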
The process of optimizing the order of pending disk accesses is generally orthogonal to the caching technique described above. The effectiveness of such optimization techniques may vary depending on the nature of the incoming I/O request stream and the manner by which the optimization scheme works with the particular caching scheme employed by the disk drive. For example, the choice of which write operation to flush from the write cache as a result of a cache miss when the cache is full can be optimized depending upon the nature of the disk operations already pending.
Write-back caching provides greatly reduced I/O latency because the disk drive acknowledges a write operation back to the host (and ultimately the application that requested it) prior to the data being physically committed to the disk media. However, significant problems will arise if something goes wrong prior to the data being stored on the disk media. For example, if power is lost, the data for all of the write operations not yet written to disk may be lost from the volatile cache memory. Similarly, if the disk drive hangs, a hard reset is usually required to resume operation. A hard reset involves cycling the power and may also result in loss of data from the cache. Finally, if the write operation is interrupted in mid-write, there is generally no way to easily recover because the disk drive will not be able to tell what has been written to the disk media and what has not. Even if a means for recovering from write errors is provided, by the time such a problem is detected, the application that requested the write operation may have continued and even completed execution and as a result may no longer be active. Additionally, typical systems in use today have no mechanism for informing the application that a write error has occurred, even if the application is still running and has some way to recover from the write error.
For desktop systems, the significantly lower perceived latency achieved through write-back caching far outweighs any downside caused by the foregoing problems. Long accesses to the disk drive are quite noticeable to desktop users running typical desktop application programs. Moreover, these problems occur quite infrequently, and when they do, users usually can recover with only some minor inconvenience. An example of write-back caching in typical desktop PC applications is the Intelligent/Integrated Drive Electronics (IDE) interface between the memory subsystem and the host system. This standard has a simple interface and specifies the use of the write-back caching scheme described above, to achieve low latency.
For enterprise applications such as those storing large and valuable databases accessed over the Internet and other networks, the corruption of data that can occur using the write-back caching scheme of IDE drives (more properly referred to as AT Attachment (ATA) standard based drives) can be disastrous. The loss or corruption of data in enterprise applications, however infrequent, is extremely costly and, as discussed above, exceedingly difficult to track down and correct. The most common interfaces employed between system hosts and memory subsystems for enterprise applications include the Small Computer System Interface (SCSI) standard and the Fibre Channel standard. Drives built to these standards are intended to provide a high degree of data integrity, albeit at a higher cost; ATA disk drives can cost as little as one-half to one-third as much as SCSI drives. SCSI and Fibre Channel disk drives typically provide write-back caching.
Some SCSI drives permit the write-back caching scheme to be disabled in favor of write-through caching. This solves the problem of possible data loss, but the resulting increase in perceived latency may not be acceptable. It would be much more desirable to avoid the loss or corruption of data from the cache and still have the benefit of the reduced latency provided by employing write-back caching. One technique that has been used in an attempt to avoid the problem of data loss from the write cache during loss of power is to employ a write cache that is backed up by battery power. While the data is preserved in the cache until power is restored, or at least as long as the batteries provide sufficient power, this technique adds cost and complexity to the system. Another known technique employs an uninterruptible power supply (UPS) in an attempt to maintain system power long enough for the cache to be flushed and all of the write requests still in the write cache to be physically completed to disk. The primary problem with this solution is the lack of certainty that the time provided by the UPS will always be sufficient for all of the cached write requests to be completed to the disk media before the power is ultimately lost. The total time required to complete any group of write requests will vary widely as a function of the number of such operations to be flushed, the physical proximity of the disk addresses being written, and whether write errors occur that slow down the completion of the cache flush.
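The uncertainty in the UPS approach can be seen in a rough back-of-the-envelope sketch. The function names, the 7 ms default per-write cost (taken from the upper end of the disk access time cited earlier), the 10-second hold-up time, and the modeling of each write error as one additional write are all assumptions chosen for illustration, not measured values.

```python
def estimated_flush_ms(pending_writes, ms_per_write=7.0, retries=0):
    """Rough flush time: every write and every retry costs ms_per_write.

    ms_per_write is an assumed average; in practice it depends on how
    physically close the target disk addresses are to one another.
    """
    return (pending_writes + retries) * ms_per_write


def flush_fits(holdup_ms, pending_writes, ms_per_write=7.0, retries=0):
    """True if the estimated flush finishes within the UPS hold-up time."""
    return estimated_flush_ms(pending_writes, ms_per_write, retries) <= holdup_ms


# Assumed 10-second hold-up time and 1000 cached writes.
print(flush_fits(10_000, 1000))               # True: 7000 ms needed
print(flush_fits(10_000, 1000, retries=500))  # False: 10500 ms needed
```

Under these assumed figures, the same cache contents flush in time when no errors occur but overrun the hold-up time once retries are added, which is the lack of certainty noted above.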
Moreover, given the significant cost advantage of ATA drives over SCSI and Fibre Channel disk drives, it may be advantageous to employ ATA drives in enterprise storage applications. Unlike the SCSI specification, however, the ATA specification until recently did not permit even the option of disabling write-back caching. Prior to this recent change in the ATA specification, manufacturers that offered such an option ran the risk of non-compliance with the specification. Even today, adding such a feature as an option risks losing backward compatibility with earlier drives designed to older versions of the specification. Moreover, although the most recent ATA specification now grants manufacturers an option to provide a software command by which to disable write-back caching, it is not certain that any ATA drive manufacturer will provide such an option. ATA drives have not been designed to operate in that fashion in the past and are therefore not well characterized in a cache-disabled mode of operation. Even if such a disable command is provided so that ATA drives can be more safely used in enterprise applications, the performance advantage of using write-back caching would be sacrificed.
Therefore, there is still room in the art for a method and apparatus by which ATA drives can be adapted to enterprise storage applications in a manner that does not require the drive itself to be physically altered to operate outside of its intended modes of operation, that still makes use of the write cache to improve disk drive performance, while substantially reducing the likelihood that data will be lost or corrupted.