1. Field of the Invention
The present invention relates generally to techniques for storing data, and more particularly to systems and methods for accelerating writes to a storage controller by performing log-based sequential write caching of data to be written on a storage device.
2. Description of the Related Art
There has been an increasing demand for access to high performance and fault-tolerant data storage to keep pace with advances in computing infrastructures. While the cost of storage devices such as hard disk drives (HDDs) has been plummeting due to manufacturing improvements, the cost of managing storage has risen steadily. Storage management has become critical to many enterprises that rely on online access to operational and historic data in their day-to-day business operations.
However, HDDs are prone to failure of their electromechanical components. Hence, storage systems that include many HDDs need to have redundancy built into them, to avoid data loss when an HDD fails. One popular technique for avoiding the loss of data when an HDD fails is known as Redundant Array of Independent Disks (RAID), which is a class of algorithms that store data redundantly on an array of HDDs. Since RAID algorithms add redundancy to user data and decide data layout on the HDDs, they are executed on dedicated hardware in order to free the host processor-memory complex from the task of executing these algorithms.
The hardware executing RAID algorithms typically includes a dedicated processor and memory, as well as Application Specific Integrated Circuits (ASICs), which perform Exclusive OR (XOR) parity calculations, protocol processing, etc. A host machine communicates with this hardware either through the system bus (in which case the hardware is called a RAID adapter) or via a storage interconnect like Small Computer System Interface (SCSI) (in which case the hardware is called a RAID controller). HDDs connected to the controller are mapped to logical drives that are created via configuration commands sent to the controller by an application. A logical drive is a storage extent that is externalized by the controller to its host and resembles an extent on an HDD. The RAID controller, depending on the RAID level chosen for a logical drive, decides the location and the need to update redundant data.
There are a number of different RAID algorithms, the more popular including RAID-0, RAID-1 and RAID-5. Many RAID algorithms employ data striping, which interleaves data across multiple drives so that more than one disk can read and write simultaneously. RAID-0 logical drives have data striped across a set of HDDs, called an array. A RAID-0 drive has very good read and write performance, since it attempts to parallelize accesses across all HDDs. However, since there is no data redundancy, a failure of any HDD can lead to data loss. In RAID-1 logical drives, every chunk of data is mirrored on two separate HDDs. The presence of redundant data allows the controller to recover user data even when a single HDD fails. While the read performance of a RAID-1 drive is very good, the write performance suffers since every update needs to be propagated to its mirror location too. Further, the high level of data redundancy leads to low capacity utilization.
In an effort to balance capacity utilization and performance, RAID-5 logical drives protect a set of chunks of data to be stored on independent HDDs by computing and storing parity information for that set on a separate HDD. In a three-drive array, for example, parity information is derived by computing the XOR of the data on two different drives and storing the result on a third drive. The location of the parity information is distributed across the array to balance the load.
One example of a RAID-5 configuration is shown in FIG. 1. A set of chunks of data, comprising ABCDEF, is striped across three different hard drives 10, 12 and 14. When one HDD fails, the RAID-5 logical drive can reconstruct the chunk lost using the remaining chunks. While a RAID-5 drive makes efficient use of the array capacity, it suffers from the performance overhead of having to read, compute, and update parity on every write. Some optimizations are possible on large writes, but when the workload is dominated by small random writes, the performance of a RAID-5 drive suffers.
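The reconstruction property described above can be illustrated with a minimal sketch, assuming XOR parity over equal-length chunks. The helper name xor_chunks and the chunk contents are hypothetical and chosen only for illustration; they do not appear in FIG. 1.

```python
from functools import reduce


def xor_chunks(*chunks: bytes) -> bytes:
    """Return the byte-wise XOR of one or more equal-length chunks."""
    if len({len(c) for c in chunks}) != 1:
        raise ValueError("chunks must be non-empty and of equal length")
    # zip(*chunks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*chunks))


# One stripe of a three-drive array: two data chunks plus their parity.
chunk_a = b"AAAA"                        # stored on the first drive
chunk_b = b"BBBB"                        # stored on the second drive
parity = xor_chunks(chunk_a, chunk_b)    # stored on the third drive

# Assume the second drive fails: its chunk is rebuilt from the survivors.
rebuilt_b = xor_chunks(chunk_a, parity)
assert rebuilt_b == chunk_b
```

Because XOR is its own inverse, any single missing chunk of a stripe equals the XOR of all the remaining chunks, which is why one failed HDD can be tolerated.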
Two advances have made the RAID-5 organization popular: (1) the presence of write caches in the controller (deferring the delays due to parity updates to a later time), and (2) hardware assist for parity computation via ASICs. While these two innovations have significantly boosted RAID-5 logical drive performance, they do not eliminate the additional work that must be done to keep the parity synchronized on every update. Workloads that tend to be small-sized, write-dominated, and bursty expose the limitations of such improvements for RAID-5 arrays. As the size of caches upstream of the controller has increased (servers with 2-8 GB are not uncommon), the I/O traffic generated to the controller increasingly resembles such workloads. Since the caches upstream are so large, most of the uncommitted working data is kept in them for as long as necessary. When dirty data is flushed to the controller, it is seldom re-used within a short period of time; hence, there is seldom a need to move the same data in and out of the controller. Cache flushes generated by the OS/database kernel tend to be bursty (when pages must be evicted quickly to make room for new data) and random. Thus, being able to handle bursty traffic efficiently becomes highly desirable to end-users. Another consequence of large caches upstream is that there is high variance in the workloads to the controller, resulting in periods of intense load followed by light or idle load.
The weaker performance of RAID-5 drives under small-to-medium sized, write dominated and bursty workloads is a consequence of the additional work that needs to be done to update parity information. Under such a workload, each RAID-5 write generates three additional I/Os and at least two additional cache pages. This is because both the data and the associated parity must be computed and updated on HDDs. Once the cache fills up, the controller becomes limited by the flush rate, which suffers from a high overhead due to parity update. When the workload has long bursts of random write traffic, the write cache can provide good response times initially but over time the response time deteriorates as the cache fills up.
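The I/O count cited above can be made concrete with a minimal sketch of a read-modify-write parity update, assuming XOR parity. The CountingDisk class and the function raid5_small_write are hypothetical stand-ins introduced only for this illustration; they do not describe any particular controller.

```python
class CountingDisk:
    """Hypothetical in-memory stand-in for an HDD that counts its I/Os."""

    def __init__(self, nblocks: int, block_size: int = 4):
        self.blocks = [bytes(block_size) for _ in range(nblocks)]
        self.io_count = 0

    def read(self, lba: int) -> bytes:
        self.io_count += 1
        return self.blocks[lba]

    def write(self, lba: int, data: bytes) -> None:
        self.io_count += 1
        self.blocks[lba] = data


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def raid5_small_write(data_disk, parity_disk, lba, new_data):
    """Update one data block and its parity using read-modify-write."""
    old_data = data_disk.read(lba)        # I/O 1: read old data
    old_parity = parity_disk.read(lba)    # I/O 2: read old parity
    # Remove the old data's contribution to parity, then add the new one.
    new_parity = xor(xor(old_parity, old_data), new_data)
    data_disk.write(lba, new_data)        # I/O 3: write new data
    parity_disk.write(lba, new_parity)    # I/O 4: write new parity


d0, d1, p = CountingDisk(8), CountingDisk(8), CountingDisk(8)
raid5_small_write(d0, p, 5, b"\x12\x34\x56\x78")
total_ios = d0.io_count + d1.io_count + p.io_count
assert total_ios == 4                            # 1 requested write + 3 extra
assert p.blocks[5] == xor(d0.blocks[5], d1.blocks[5])
```

In this sketch a single host write costs four disk I/Os, three more than the write itself, and the old data and old parity each occupy a buffer while the new parity is computed.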
In view of these considerations, a system and method are needed to improve the performance of storage controllers, such as HDD controllers using RAID-5 logical drives, under a workload that is dominated by small-to-medium sized random writes.
The present invention has carefully considered the above problems and has provided the solution set forth herein.
A system and computer-implemented method are provided for accelerating writes to a storage controller by performing log-based sequential write caching of data to be written on a storage device. The data in the log is moved to the storage array at a later time when the system is less active. As a result, random writes are converted to sequential writes. Overall, performance improves since the performance of sequential writes far exceeds that of random writes.
In one aspect of the invention, a method for storing information on a data storage medium includes the following steps. A write command containing data is received in a data storage controller, wherein the data storage controller includes a write cache having a sequential log. The data storage controller also includes an index structure indicating the location of data in the sequential log. If the data does not already exist on the log, the data is written to the sequential log at a location recorded in the index structure. If the data already exists on the log as indicated by the index structure, the data on the index structure is invalidated and the new data is written on the log at an available location determined by the index structure. When an idle period exists, data in the log from a plurality of write commands is transferred to the data storage medium.
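The steps of this aspect can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class LogWriteCache, its fields, the use of a dictionary as the index structure, and the use of a dictionary to stand in for the data storage medium are all hypothetical. Idle-period detection is assumed to be handled by the caller, which invokes migrate when it judges the system to be idle.

```python
class LogWriteCache:
    """Hypothetical append-only write cache with an index over the log."""

    def __init__(self, capacity: int):
        self.log = [None] * capacity      # sequential log of (lba, data)
        self.valid = [False] * capacity   # marks slots holding live data
        self.head = 0                     # next free slot; only moves forward
        self.index = {}                   # lba -> slot of the current copy

    def write(self, lba: int, data: bytes) -> int:
        """Append data to the log; return the slot it was written to."""
        if self.head >= len(self.log):
            raise RuntimeError("log is full; migrate before writing")
        if lba in self.index:
            # The block is already on the log: invalidate the older copy.
            self.valid[self.index[lba]] = False
        slot = self.head
        self.log[slot] = (lba, data)      # always a sequential append
        self.valid[slot] = True
        self.index[lba] = slot            # record the new location
        self.head += 1
        return slot

    def migrate(self, storage: dict) -> int:
        """Move live data to the storage medium; return blocks moved."""
        moved = 0
        for lba, slot in sorted(self.index.items()):
            storage[lba] = self.log[slot][1]
            moved += 1
        # The log is now empty and can be reused from the start.
        self.log = [None] * len(self.log)
        self.valid = [False] * len(self.valid)
        self.index.clear()
        self.head = 0
        return moved


cache = LogWriteCache(capacity=8)
storage = {}
# Three random writes, two to the same block, land in consecutive log slots.
slots = [cache.write(70, b"x1"), cache.write(3, b"y1"), cache.write(70, b"x2")]
assert slots == [0, 1, 2]
assert cache.valid[:3] == [False, True, True]   # first copy of block 70 is stale
moved = cache.migrate(storage)                  # called during an idle period
assert moved == 2 and storage == {3: b"y1", 70: b"x2"}
```

In this sketch the writes arrive in random block order but occupy consecutive log slots, and the overwritten copy of block 70 is never transferred to the storage medium, so only the latest data for each block is migrated.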
In another aspect of the invention, a system for storing information on a data storage medium includes a data controller that includes a write cache having a sequential log wherein the data controller also includes an index structure indicating the location of data in the sequential log. In addition, the system includes a means for receiving a write command containing data and a means for determining if the data already exists on the log. If the data does not already exist, the data in the write command is written on the log at a location recorded in the index structure. Furthermore, there is a means for invalidating the data on the index structure and writing the new data on the log at an available location determined by the index structure, if the data already exists on the log as indicated by the index structure. A means for determining if a migration condition exists is also included, wherein, if the migration condition exists, the data in the log from a plurality of write commands is transferred to the data storage medium.
The details of the present invention, both as to its structure and operation, can best be understood in reference to the accompanying drawings, in which like reference numerals refer to like parts, and in which: