1. Field of the Invention
The present invention pertains to storage systems, and more particularly, to optimizing data write operations.
2. Background Information
A storage server is a processing system adapted to store and retrieve data on behalf of one or more client processing systems (“clients”) in response to client input/output (I/O) requests. A storage server can be used for many different purposes, such as to provide multiple users with access to shared data or to back up data.
One example of a storage server is a file server. A file server operates on behalf of one or more clients to store and manage shared files in a set of mass storage devices, such as magnetic or optical storage disks or tapes. The mass storage devices may be organized into one or more volumes of Redundant Array of Independent (or Inexpensive) Disks (RAID). Another example of a storage server is a device that provides clients with block-level access to stored data rather than file-level access, or a device that provides clients with both file-level and block-level access.
In a storage server, data gets corrupted or lost from time to time, for example, upon the failure of one of the mass storage devices. Consequently, virtually all modern storage servers implement techniques for protecting the stored data. Currently, these techniques involve calculating a data protection value (e.g., parity) and storing the parity in various locations. Parity may be computed as an exclusive-OR (XOR) of data blocks in a stripe spread across multiple disks in a disk array. In a single parity scheme, e.g. RAID-4 or RAID-5, an error in any one block of the stripe can be corrected using a single parity block (also called “row parity”). In a dual parity scheme, e.g. RAID Double Parity (RAID-DP), a technique invented by Network Appliance Inc. of Sunnyvale, Calif., errors resulting from a two-disk failure can be corrected using two parity blocks. The first is a row parity block, which is computed as the XOR of the data blocks in a stripe. The second is a diagonal parity block, which may be computed as the XOR of the data blocks in a diagonal set.
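As an illustration of the row parity computation described above, the following is a minimal sketch in Python. The helper name `xor_blocks`, the four-byte block size, and the sample stripe contents are assumptions chosen for the example and do not come from any particular storage server.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe: one 4-byte data block on each of three data disks.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]

# Row parity is the XOR of all data blocks in the stripe.
row_parity = xor_blocks(stripe)

# Simulate the loss of the block on the second disk, then rebuild it
# by XORing the surviving data blocks with the row parity block.
survivors = [stripe[0], stripe[2]]
rebuilt = xor_blocks(survivors + [row_parity])
```

Because XOR is its own inverse, the same function serves both to compute the parity and to reconstruct a single missing block.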
Although the parity protection schemes described above provide data protection, multiple read and write operations need to be performed to modify a data block on a disk and compute the new parity. For example, to modify one or more data blocks under one RAID-5 scheme, a parity block is read. Data from the data blocks that will be modified are also read. Then an exclusive-OR (XOR) operation is performed on the parity block and the data blocks. To compute new parity, the result of the XOR of the previous step is XORed with the new data. The new data and the new parity are written to the disk. Thus, two read operations (one of the parity block and one of the data blocks) and two write operations (one of the new data and one of the new parity) are required. This process is sometimes referred to as “Read Modify Write” or “Parity by Subtraction Write.” In some systems, performing a preliminary read operation requires the system to wait for the storage devices (e.g. disks) to rotate back to a previous position before performing the write operation. Thus, performing multiple read operations to modify data blocks results in rotational latency, which impacts overall system performance.
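The “Read Modify Write” sequence described above can be sketched as follows. This is a simplified illustration only: the `CountingDisk` class is a hypothetical in-memory stand-in for a disk that merely counts I/O operations, and the function and variable names are assumptions made for the example.

```python
def xor(a, b):
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))

class CountingDisk:
    """Hypothetical in-memory disk that counts its reads and writes."""
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.reads = 0
        self.writes = 0

    def read(self, block_no):
        self.reads += 1
        return self.blocks[block_no]

    def write(self, block_no, data):
        self.writes += 1
        self.blocks[block_no] = data

def read_modify_write(data_disk, parity_disk, block_no, new_data):
    old_parity = parity_disk.read(block_no)   # read 1: parity block
    old_data = data_disk.read(block_no)       # read 2: data block to be modified
    partial = xor(old_parity, old_data)       # removes the old data's contribution
    new_parity = xor(partial, new_data)       # adds the new data's contribution
    data_disk.write(block_no, new_data)       # write 1: new data
    parity_disk.write(block_no, new_parity)   # write 2: new parity

# Two data disks and one parity disk, each holding a single 2-byte block.
disk0 = CountingDisk([b"\x01\x02"])
disk1 = CountingDisk([b"\x10\x20"])
parity = CountingDisk([xor(b"\x01\x02", b"\x10\x20")])

read_modify_write(disk0, parity, 0, b"\xff\x00")
total_reads = disk0.reads + parity.reads
total_writes = disk0.writes + parity.writes
```

In this sketch the unmodified disk (`disk1`) is never accessed, yet the update still costs two reads and two writes, which is the overhead the paragraph above describes.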
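There are known solutions that attempt to eliminate selected read operations. According to one known solution, an entire stripe has to be written, including the new parity. Because every data block in the stripe is replaced, the new parity can be computed from the new data alone, without first reading the old data or the old parity. This technique is referred to as a “Full Stripe Write.” However, as a file system ages, its ability to do full stripe writes decreases.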
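A “Full Stripe Write” can be sketched as follows. The sketch assumes, purely for illustration, that each disk is modeled as a Python dictionary mapping a stripe number to a block; the function name `full_stripe_write` and the sample data are likewise hypothetical.

```python
from functools import reduce

def full_stripe_write(data_disks, parity_disk, stripe_no, new_blocks):
    """Write one new block to every data disk plus the parity, with no reads."""
    if len(new_blocks) != len(data_disks):
        raise ValueError("a full stripe write needs one new block per data disk")
    # Parity is computed from the new data only; nothing is read from disk.
    parity = bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*new_blocks))
    for disk, block in zip(data_disks, new_blocks):
        disk[stripe_no] = block
    parity_disk[stripe_no] = parity
    return parity

# Hypothetical array: three data disks and one parity disk, all initially empty.
data_disks = [{}, {}, {}]
parity_disk = {}
new_parity = full_stripe_write(
    data_disks, parity_disk, 0, [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
)
```

The check on the number of new blocks reflects the limitation of the technique: it applies only when new data is available for every data block in the stripe.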
Another solution, which also eliminates selected read operations, stores in a memory cache the data from the data blocks to which the new data are to be written. Since the old data are already in cache, they do not need to be read from disk prior to writing the new data. This solution, however, requires a significant amount of memory cache and still may not be very effective.
Accordingly, what is needed is a method and system that optimize I/O operations so as to eliminate the additional latency associated with performing multiple read operations.