Computer clusters, or groups of linked computers, have been widely used to improve performance over that provided by a single computer, especially in extended computations, for example, those involving simulations of complex physical phenomena. Conventionally, in a computer cluster, computer nodes (also referred to herein as client nodes) are linked by a high speed network which permits the sharing of the computers' resources and memory. Data transfers to or from the computer nodes are performed through the high speed network and are managed by additional computer devices, also referred to as File Servers. The File Servers file data from multiple computer nodes and assign each computer node a unique location in the overall file system. Typically, the data migrates from the File Servers to be stored on rotating media such as, for example, common disk drives arranged in storage disk arrays.
The computer cluster may assume either the compute state (or compute cycle) or the input/output (I/O) state (or I/O cycle), which are typically mutually exclusive. The process of moving data is carried out during the I/O cycle of the computer nodes, i.e., data transfers are executed during time intervals when the compute activity has ceased. Since no actual computations occur during the I/O cycle, it is important to keep the I/O cycle as short as possible to maximize the overall compute duty cycle of the computer cluster.
Since the File Servers satisfy the requests of the computer devices in the order that the requests are received, the disk drives appear to be accessed randomly, as multiple computer devices may require access to the servers at random times. In this scheme, disk drives operate as "push" devices, and store the data on demand. Disk drives are poorly suited to satisfying random requests, since the recording heads must be moved to various sectors of the drive (known as "seeking"), and this "seeking" movement takes a far greater amount of time than the actual "write" or "read" operations. To "work around" this problem, a large number of disk drives may be utilized that are accessed by a control system (storage controller) which schedules disk operations in an attempt to spread the random activity over a large number of disk drives, thereby diminishing the effects of the disk head movement.
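As an illustrative sketch of the "work-around" described above (the class and dispatch policy here are assumptions for illustration, not taken from this document), a storage controller might spread incoming random requests round-robin across the drives of the array so that seek activity is distributed evenly:

```python
from collections import deque

class StorageController:
    """Hypothetical controller spreading random requests across many drives."""

    def __init__(self, num_drives):
        # One pending-request queue per disk drive in the array.
        self.queues = [deque() for _ in range(num_drives)]
        self.next_drive = 0

    def submit(self, request):
        # Round-robin dispatch: each new request goes to the next drive in
        # turn, spreading seek-heavy random activity over the whole array.
        self.queues[self.next_drive].append(request)
        self.next_drive = (self.next_drive + 1) % len(self.queues)

controller = StorageController(num_drives=4)
for i in range(10):
    controller.submit(f"write block {i}")
# Ten random requests end up spread across the four drives.
```

Real storage controllers use far richer scheduling (queue depth, locality, elevator ordering); round-robin is shown only as the simplest instance of spreading random activity over many drives.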
The size of computer clusters and the aggregate I/O bandwidth that is to be supported may require thousands of disk drives for servicing the computing architecture in order to minimize the duration of the I/O cycle. The I/O activity itself occupies only a short period of the overall "active" time of the disk system. Even though the duty cycle of write activity may occupy only a portion of the cluster's total operational time, all the disk drives nevertheless are powered in expectation of the I/O activity.
Therefore, it is beneficial to provide a data migrating technique between the computing cluster architectures and the disk drives which provides RAID protection for the data randomly received from the computing cluster architectures, complemented with a mechanism for adjusting the RAID striping to the number of available "healthy" components. Such a technique would attain a shortened I/O cycle for the high performance computer clusters, as well as an effective aggregate I/O bandwidth of the disk drive operation with a reduced number of disk drives activated for data storage, without excessive power consumption, while simultaneously maintaining data reliability as well as data integrity.
In order to increase data reliability and performance of storage systems, Redundant Array of Independent Disks (RAID) technology has customarily been used in the industry. RAID provides increased storage reliability through redundancy by combining multiple disk drive components into a logical unit in which all drives in the storage disk array are independent. RAID protection contemplates a striping technique to provide greater protection from data loss. In some modifications of RAID systems, data D is interleaved in stripe units distributed, with parity information P, across all of the disk drives.
For example, in the RAID 5 scheme, which uses block-level striping with distributed parity, the system distributes parity P along with the data D and requires all drives but one to be present for operation. A failed drive requires replacement, but a single drive failure does not destroy the array. Upon drive failure, any subsequent "reads" may be calculated from the distributed parity such that the drive failure is masked from the end user. The array may lose data in the event of a second drive failure and is vulnerable until the data that was on the failed drive is rebuilt onto a replacement drive. A single drive failure in the set will result in reduced performance of the entire set until the failed drive has been replaced and rebuilt.
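The masking of a single drive failure can be sketched with the XOR parity underlying RAID 5 (a minimal illustration; the function names are assumptions, not from this document): the parity block is the XOR of the data blocks in a stripe, so any one missing block, data or parity, can be recomputed from the survivors.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a sequence of equal-length byte blocks together."""
    return bytes(reduce(lambda a, b: [x ^ y for x, y in zip(a, b)], blocks))

def make_parity(data_blocks):
    # RAID 5-style parity P for one stripe: XOR of all data blocks.
    return xor_blocks(data_blocks)

def rebuild(surviving_blocks):
    # XOR is its own inverse, so XOR-ing the survivors (including P)
    # yields exactly the one missing block.
    return xor_blocks(surviving_blocks)

# One stripe across four drives: three data blocks plus one parity block.
d0, d1, d2 = b"AAAA", b"BBBB", b"CCCC"
p = make_parity([d0, d1, d2])

# Simulate losing the drive holding d1: a "read" of d1 is served by
# recomputing it from the surviving blocks, masking the failure.
recovered = rebuild([d0, d2, p])
assert recovered == d1
```

This is also why the array is vulnerable until the rebuild completes: with one block of each stripe already missing, a second failure leaves too few survivors for the XOR to recover anything.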
The RAID 6 scheme uses block-level striping with double distributed parity (P1+P2), and thus provides fault tolerance against up to two drive failures. The array continues to operate with up to two failed drives. This makes larger RAID groups more practical, especially for high-availability systems.
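The double parity can be sketched as follows (a hedged illustration of the common P+Q construction over GF(2^8), not necessarily the exact scheme this document contemplates; all names are assumptions): P is the plain XOR of the data blocks, while Q weights each block by a distinct power of a generator in the Galois field, giving two independent equations from which two missing blocks can be solved. Only parity generation is shown here; two-drive recovery solves the resulting linear system.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) with reduction polynomial 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11d  # reduce modulo x^8 + x^4 + x^3 + x^2 + 1
        b >>= 1
    return r

def make_pq(data_blocks):
    """Return (P, Q) double parity for one stripe of equal-length blocks."""
    n = len(data_blocks[0])
    p, q = bytearray(n), bytearray(n)
    coef = 1  # generator power g^i for data block i, with g = 2
    for block in data_blocks:
        for j, byte in enumerate(block):
            p[j] ^= byte                  # P: plain XOR parity
            q[j] ^= gf_mul(coef, byte)    # Q: weighted GF(2^8) parity
        coef = gf_mul(coef, 2)
    return bytes(p), bytes(q)

# A single failed data drive is rebuilt from P alone (as in RAID 5);
# two failed data drives are solved from P and Q together.
p_par, q_par = make_pq([b"AAAA", b"BBBB", b"CCCC"])
```

Because the weights g^i are distinct and nonzero, the two parity equations remain independent for any pair of failed drives, which is what permits operation with up to two failures.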
The data reliability provided by the RAID striping depends heavily on the availability of “healthy” storage components capable of supporting the stripe size, and, in the case of an insufficient number of reliable storage components for the striping, system performance may suffer.
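As a purely hypothetical sketch of the dependence just described (the policy, names, and thresholds here are assumptions for illustration, not the adjustment mechanism this document proposes), a controller might narrow the stripe width to whatever the pool of "healthy" components can currently support, trading capacity for continued protection:

```python
def choose_stripe_width(healthy_drives, target_width, min_width):
    """Pick the largest supportable stripe width, or signal degraded mode.

    Hypothetical policy: use the full target width when enough healthy
    drives exist, shrink the stripe to match the healthy pool otherwise,
    and refuse to stripe below a minimum width.
    """
    if healthy_drives >= target_width:
        return target_width      # full-size stripes are supportable
    if healthy_drives >= min_width:
        return healthy_drives    # narrower stripes, protection preserved
    return None                  # too few healthy drives: striping fails

assert choose_stripe_width(12, target_width=10, min_width=3) == 10
assert choose_stripe_width(7, target_width=10, min_width=3) == 7
assert choose_stripe_width(2, target_width=10, min_width=3) is None
```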