Modern data storage systems frequently employ hundreds or even thousands of HDDs (Hard-Disk Drives) and SSDs (Solid-State Drives) interconnected by high-speed buses such as Serial Attached SCSI (SAS). To improve both the reliability and performance of these components, they are often grouped together into RAID (Redundant Array of Inexpensive Disks) configurations. RAID improves both reliability and performance by spreading data across multiple disks using a method known as “striping.” Disk striping divides a set of data (e.g., a file, folder, or partition) into blocks and spreads those blocks across multiple storage devices so that each stripe consists of data divided across a set of hard disks or SSDs. A “stripe unit” refers to that portion of a stripe that resides on an individual drive; for example, a stripe spanning 14 drives consists of 14 stripe units, one per drive. The number of drives depends on the configuration of the storage system and the requirements of the applications. For example, in a Data Domain OS storage system (DDOS), such as that provided by EMC Corporation, the backup server can write to upwards of 14 RAID disks at a time.
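The relationship between a stripe and its stripe units can be illustrated with a minimal sketch. It assumes a plain round-robin layout with no parity rotation and a hypothetical stripe unit size of 64 KiB; neither detail is specified in this description, and the names `StripeLocation` and `locate` are introduced only for illustration.

```python
from typing import NamedTuple


class StripeLocation(NamedTuple):
    stripe: int  # index of the stripe containing the byte
    drive: int   # index of the drive holding the stripe unit
    offset: int  # byte offset inside that stripe unit


def locate(byte_offset: int,
           num_drives: int = 14,
           stripe_unit_bytes: int = 64 * 1024) -> StripeLocation:
    """Map a logical byte offset to a stripe, drive, and in-unit offset.

    Assumes simple round-robin striping (hypothetical layout): stripe
    units are assigned to drives 0..num_drives-1 in order, then wrap
    around to start the next stripe.
    """
    if byte_offset < 0:
        raise ValueError("byte_offset must be non-negative")
    unit_index = byte_offset // stripe_unit_bytes
    return StripeLocation(
        stripe=unit_index // num_drives,
        drive=unit_index % num_drives,
        offset=byte_offset % stripe_unit_bytes,
    )


if __name__ == "__main__":
    # The first 14 stripe units land on 14 different drives of stripe 0.
    for unit in range(14):
        print(locate(unit * 64 * 1024))
```

Under these assumptions, a write covering one full stripe touches every drive exactly once, which is why such a write can proceed on all drives in parallel.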
Disk striping confers significant performance benefits on data storage systems. For example, an individual HDD might be able to sustain a sequential data transfer rate on the order of 100 MB/sec, but this can be multiplied to 1400 MB/sec if transfers are conducted in parallel across a 14-disk array. However, the performance gains are highly dependent on optimized ordering and batching of the disk I/O (input/output) operations. For example, though a DDOS system may write to 14 disks at a time, these write requests often get shuffled with other write requests before they are transmitted over the SCSI/SAS fabric, and this increases the amount of time a single full-stripe write takes. HDDs are even more performance limited when it comes to random operations because a recording head must be physically moved across a rotating disk, often limiting such operations to fewer than 20 per second. To optimize random operations, it is important to allow those transfers to proceed free from the restrictions of striping and therefore independently on all the drives in the array. In this way, a 14-drive RAID could support 280 random operations per second (14 drives at 20 operations per second each). Examples have been described with respect to a specific implementation of a 14-drive DDOS system, but it should be noted here and throughout the description that examples and embodiments apply to any other data storage system having a plurality of storage devices that stores data across a device array.
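The two figures quoted above follow from simple multiplication, as the following sketch shows. It computes idealized upper bounds only, assuming perfectly parallel transfers with no fabric contention; the function names are hypothetical.

```python
def aggregate_sequential_mb_per_sec(num_drives: int,
                                    per_drive_mb_per_sec: int) -> int:
    """Ideal sequential bandwidth when every drive streams in parallel."""
    return num_drives * per_drive_mb_per_sec


def aggregate_random_ops_per_sec(num_drives: int,
                                 per_drive_ops_per_sec: int) -> int:
    """Ideal random operation rate when every drive seeks independently."""
    return num_drives * per_drive_ops_per_sec


if __name__ == "__main__":
    # Values taken from the 14-drive example in the text.
    print(aggregate_sequential_mb_per_sec(14, 100))  # 1400
    print(aggregate_random_ops_per_sec(14, 20))      # 280
```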
Modern file and database systems take advantage of RAID's dual strengths by batching their writes and performing them sequentially (to take advantage of disk striping) while allowing random read operations to proceed concurrently and independently on each drive. As advanced as these methods are, the lower-level OS software that communicates with the disks themselves is ill-suited to handle both of these patterns simultaneously. This shortfall is further compounded by the fact that current OS software does not adequately take into account the unique performance characteristics that SAS bus topologies present when scaled up to address hundreds or thousands of disks.
In present RAID backup systems using disk striping, full-stripe writes typically do not yield their promised bandwidth increases, as performance gains typically drop off long before the available bandwidth is saturated. One issue with RAID is that full-stripe transfers can go no faster than the slowest disk. However, it is significant that much of the observed difference in drive performance is actually due to the way paths to these devices are shared, and not due to the disks themselves. Furthermore, it is apparent that current software scheduling algorithms that dispatch these requests to the drives do not handle these disparities properly. For these reasons, physical differences as small as 4% between disk drives have been observed to produce performance variations of over 2000% when measured at the application layer.
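The observation that a full-stripe transfer can go no faster than its slowest disk can be sketched as follows. The model is deliberately simple and is an assumption for illustration: the stripe completes when its last stripe unit completes, and the timing values are hypothetical inputs, not measurements from any system.

```python
from typing import Sequence


def full_stripe_write_time_us(unit_times_us: Sequence[int]) -> int:
    """A full-stripe write finishes only when its slowest unit finishes."""
    if not unit_times_us:
        raise ValueError("at least one stripe unit time is required")
    return max(unit_times_us)


def effective_mb_per_sec(stripe_unit_bytes: int,
                         unit_times_us: Sequence[int]) -> float:
    """Effective bandwidth of one full-stripe write.

    Bytes per microsecond is numerically equal to MB/sec
    (10**6 bytes per 10**6 microseconds).
    """
    total_bytes = stripe_unit_bytes * len(unit_times_us)
    return total_bytes / full_stripe_write_time_us(unit_times_us)


if __name__ == "__main__":
    unit_bytes = 1_000_000  # hypothetical stripe unit size
    # All 14 units complete in 10 ms each (100 MB/sec per drive).
    even = [10_000] * 14
    # One unit is delayed to 20 ms, e.g. by queuing on a shared path.
    delayed = [10_000] * 13 + [20_000]
    print(effective_mb_per_sec(unit_bytes, even))     # 1400.0
    print(effective_mb_per_sec(unit_bytes, delayed))  # 700.0
```

In this sketch, delaying a single stripe unit halves the bandwidth of the entire stripe even though the other 13 drives are unaffected, which is consistent with the point that delays on shared paths, and not only the disks themselves, can dominate stripe write time.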
What is needed, therefore, is a system and method that keeps groups of writes to separate disks together from their initiation in the RAID layer through their transmission as a group over the SAS fabric, thereby decreasing the amount of time individual stripe writes take. Such a solution would improve data storage performance by taking both RAID and SAS bus topology considerations into account.
The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions. EMC, Data Domain, Data Domain Restorer, and Data Domain Boost are trademarks of EMC Corporation of Hopkinton, Mass.