Redundant Array of Inexpensive Disks (RAID) systems have become the predominant form of mass storage in computer systems used in applications that require high performance, large amounts of storage, and/or high data availability, such as transaction processing, banking, medical applications, database servers, internet servers, mail servers, scientific computing, and a host of other applications. A RAID controller controls a group of multiple physical disks in such a manner as to present a single logical disk (or multiple logical disks) to a computer operating system. RAID controllers employ the techniques of data striping and data redundancy to increase performance and data availability.
Although the techniques of RAID can provide marked performance improvements, the underlying performance of transfers with the physical disks themselves is still crucial to the overall performance of an array of disks. A disk drive includes one or more platters that store the data. The platters are laid out in tracks, which are each subdivided into sectors, or blocks. The platters are spun at high speed by a motor. The disk drive also includes a head for each platter that is capable of reading and writing data. The head is at the end of an arm that swings across the platter until it reaches the desired track. Disk performance is determined predominantly by four components: command overhead, seek time, rotational latency, and data transfer time. The command overhead is the time required for the RAID controller to transmit the command to the disk and for the disk to transmit a completion to the RAID controller. The command overhead also includes the time required by the disk to process the command. The seek time is the time required for the arm to swing to the desired track. The rotational latency is the time required for the disk platters to spin around such that the disk block, or sector, to be read or written on the desired track is under the head, which is predominantly a function of the rotational speed of the platters. The data transfer time is the time required to transfer the data to or from the disk, which is predominantly a function of the amount of data to be transferred and the data transfer rate between the disk and RAID controller and may also be a function of the data transfer rate to/from the disk recording media.
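The four components above can be combined into a rough per-command service time estimate. The following is a minimal sketch only: the class name DiskModel, its fields, and the numeric values are hypothetical, and the model assumes that the average rotational latency is half a revolution and that the four components simply add.

```python
from dataclasses import dataclass


@dataclass
class DiskModel:
    """Hypothetical first-order model of a single disk (illustrative values only)."""
    command_overhead_ms: float   # time to send the command, process it, and return completion
    avg_seek_ms: float           # average time for the arm to reach the desired track
    rpm: float                   # rotational speed of the platters
    transfer_rate_mb_s: float    # sustained data transfer rate, in 10^6 bytes per second

    def avg_rotational_latency_ms(self) -> float:
        # Assumption: on average the target sector is half a revolution away.
        return 0.5 * 60_000.0 / self.rpm

    def transfer_ms(self, nbytes: int) -> float:
        # Time to move nbytes at the sustained transfer rate.
        return nbytes / (self.transfer_rate_mb_s * 1_000_000) * 1000.0

    def service_time_ms(self, nbytes: int) -> float:
        # Assumption: the four components do not overlap, so they add.
        return (self.command_overhead_ms
                + self.avg_seek_ms
                + self.avg_rotational_latency_ms()
                + self.transfer_ms(nbytes))


# Illustrative (assumed) parameters, not taken from any particular drive.
disk = DiskModel(command_overhead_ms=0.5, avg_seek_ms=8.0, rpm=7200, transfer_rate_mb_s=100.0)
small = disk.service_time_ms(4_096)
large = disk.service_time_ms(1_000_000)
```

Under these assumed numbers, the small transfer is dominated by seek time and rotational latency, while the large transfer spends a substantial share of its time moving data.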
Streams of disk commands from a RAID controller to a disk, at least for relatively short periods, may typically be characterized as sequential in nature or random in nature. Random command streams are characterized by commands that specify disk blocks that are not sequential in nature. That is, the disk blocks do not lie on the same track thereby requiring some seek time; or, if on the same track, they are not adjacent to one another on the track thereby requiring some rotation time. Random commands also tend to specify relatively small amounts of data. Examples of systems that might receive highly random command streams are file servers, mail servers, transactional database servers, and the like. Thus, the performance of random command streams is dominated by seek time and/or rotational latency. The RAID notion of striping across multiple physical disks can significantly reduce the seek time penalty by enabling multiple heads on multiple physical disks to seek in parallel.
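To illustrate how striping spreads nearby logical blocks over several physical disks so that their heads can seek in parallel, the following sketch maps a logical block number to a disk and a block on that disk. It assumes a simple striped layout with no redundancy; the function name map_block and its parameters are hypothetical and do not describe any particular controller.

```python
from typing import Tuple


def map_block(logical_block: int, num_disks: int, stripe_blocks: int) -> Tuple[int, int]:
    """Map a logical block to (disk index, block on that disk).

    Assumes a plain striped layout without parity: consecutive groups of
    stripe_blocks logical blocks are placed on consecutive disks, wrapping
    around after the last disk.
    """
    stripe_unit = logical_block // stripe_blocks   # which group of stripe_blocks blocks
    offset = logical_block % stripe_blocks         # position inside that group
    disk = stripe_unit % num_disks                 # groups rotate across the disks
    row = stripe_unit // num_disks                 # how many full rows precede this group
    return disk, row * stripe_blocks + offset


# Four disks, eight blocks per stripe unit (assumed values).
placements = [map_block(b, num_disks=4, stripe_blocks=8) for b in (0, 8, 31, 32)]
```

With these assumed values, logical blocks 0, 8, and 31 land on disks 0, 1, and 3 respectively, so small random requests to different regions of the logical disk tend to be serviced by different heads.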
Sequential command streams are characterized by commands that specify disk blocks that are relatively adjacent and require little seeking. An example of an application program that might generate a highly sequential read command stream is a streaming video or audio application, in which enormous amounts of data are stored sequentially on the disks and are read from the disks in sequential order. The performance of highly sequential read command streams is largely dominated by data transfer time, as long as the disk includes a reasonable read cache. Conversely, a data acquisition system that captures large amounts of real time data, for example, might generate a highly sequential write command stream.
The performance of highly sequential write command streams was historically dominated by rotational latency, as explained by the following example. First, the controller writes a command's worth of data to the disk, which the disk writes to the media and then returns completion status to the controller. Next, the controller writes another command's worth of data to the disk, which is sequential with, or adjacent to, the data of the previous command. However, because the disk platter rotated somewhat during the command overhead time, the target sector on the platter, which is adjacent to the last sector of the previous write command, has passed the location of the write head. Hence, a rotation time, or at least a portion of a rotation time, must be incurred to wait for the media to rotate such that the specified sector is under the head. The incurred rotation time may greatly reduce write performance. In particular, the severity of the impact on performance is largely a function of the length of the data transfer. That is, if only a relatively small amount of data is written each rotation time, then the aggregate data transfer rate will be substantially worse than if a relatively large amount of data is written each rotation time.
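The dependence of the aggregate rate on transfer length can be made concrete with a simplified model. The sketch below is an illustration under stated assumptions, not a measurement: the function name and all parameter values are hypothetical, data is assumed to be written at a constant media rate, and, because the next target sector directly follows the last sector written, it is assumed to return under the head only at whole multiples of a revolution after the previous write ends.

```python
import math


def sequential_write_rate_mb_s(cmd_bytes: int, rpm: float,
                               media_rate_mb_s: float, overhead_ms: float) -> float:
    """Estimate the aggregate rate (10^6 bytes/s) of back-to-back sequential writes.

    Simplified model: each command spends cmd_bytes / media_rate writing, then
    loses enough whole revolutions to cover the command overhead before the
    next adjacent sector is under the head again.
    """
    rotation_ms = 60_000.0 / rpm
    transfer_ms = cmd_bytes / (media_rate_mb_s * 1e6) * 1e3
    # Any nonzero overhead shorter than one revolution costs one full revolution.
    missed_ms = math.ceil(overhead_ms / rotation_ms) * rotation_ms
    total_ms = transfer_ms + missed_ms
    return (cmd_bytes / 1e6) / (total_ms / 1e3)


# Assumed disk: 6000 RPM (10 ms per revolution), 50 MB/s media rate, 0.5 ms overhead.
small_rate = sequential_write_rate_mb_s(50_000, 6000, 50.0, 0.5)
large_rate = sequential_write_rate_mb_s(500_000, 6000, 50.0, 0.5)
```

For this assumed disk, 50,000-byte commands yield roughly 4.5 MB/s, whereas 500,000-byte commands yield 25 MB/s, which is half of the media rate. The lost revolution is the same in both cases; only the amount of data written per revolution lost differs.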
One innovation that has improved sequential write performance is what is commonly referred to as command queuing. A command queuing disk is capable of receiving multiple outstanding commands. Thus, the command overhead may be largely, if not completely, hidden. Using the example above, the controller may issue the second write command and transfer the data for the second write command to the disk while the disk is writing the data for the first write command to the media. Thus, once the disk has written the first data to the media, it may immediately begin writing the second data, which is sequential to the first data, while the write head is still in the desired location, since the second data has already been transferred to the disk by the controller. Hence, typically in a command queuing situation, as long as the controller can provide data to the drive at the disk media rate, no rotational latencies will be incurred, and the stream of data may be transferred at effectively the disk media rate.
However, some disks do not support command queuing. In particular, it has been noted by the present inventor that a significant number of SATA and SAS disks do not currently support command queuing, resulting in poor sequential write performance. These disks are particularly desirable in RAID controller applications because they are relatively inexpensive and because they have a serial interface, which is an advantage in environments in which disks are densely packed, such as in many RAID environments.
One way to reduce the negative impact on performance caused by this problem is to increase the RAID stripe size so that the length of a typical write command to a drive is relatively large. However, increasing the stripe size also has the detrimental effect of potentially drastically reducing performance during predominantly random command streams, since the large stripe size typically offsets the beneficial effect striping has on hiding seek time.
One technique that has been employed by device drivers in host computers to improve sequential write performance is command coalescing. The device drivers take multiple write requests that are sequential, or adjacent, to one another, and combine them into a single write request. By doing this, the device drivers effectively increase the average write request size, which, as described above, ameliorates the negative effect of incurring a rotational latency in highly sequential write command streams. An example of a device driver in an operating system that coalesces sequential operations is described in U.S. Pat. No. 5,522,054.
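The general idea of coalescing adjacent write requests can be sketched as follows. This is a generic illustration and is not the method of the cited patent or of any particular driver: the names Write and coalesce, the max_blocks limit, and the choice to sort pending requests by starting block are all assumptions, the requests are assumed not to overlap, and the data buffers that a real driver would also have to combine are omitted.

```python
from typing import List, NamedTuple


class Write(NamedTuple):
    """Hypothetical pending write request (data buffer omitted for brevity)."""
    lba: int      # first block address written by the request
    blocks: int   # number of consecutive blocks written


def coalesce(writes: List[Write], max_blocks: int = 2048) -> List[Write]:
    """Combine requests whose block ranges are exactly adjacent.

    Assumes the pending requests do not overlap, so that sorting them by
    starting block does not change the final contents of the disk.
    """
    merged: List[Write] = []
    for w in sorted(writes):
        if merged:
            last = merged[-1]
            adjacent = last.lba + last.blocks == w.lba
            fits = last.blocks + w.blocks <= max_blocks
            if adjacent and fits:
                # Extend the previous request instead of issuing a new one.
                merged[-1] = Write(last.lba, last.blocks + w.blocks)
                continue
        merged.append(w)
    return merged


pending = [Write(0, 8), Write(8, 8), Write(32, 8), Write(16, 8)]
result = coalesce(pending)
```

Here the three requests covering blocks 0 through 23 become a single 24-block request, while the request at block 32 remains separate because a gap precedes it. The max_blocks limit stands in for whatever maximum transfer size the underlying device accepts.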
However, a device driver in an operating system accessing logical disks controlled by a RAID controller has visibility only to user data and logical disks. It has no visibility to the redundancy data (i.e., parity data or mirror data) and therefore cannot coalesce writes of redundancy data to redundant drives (i.e., parity drives or mirror drives). Thus, command coalescing at the device driver level cannot address the performance problems described above. Therefore, what is needed is a RAID controller that performs command coalescing.