Data represents a significant asset for many entities. Consequently, data loss, whether accidental or caused by malicious activity, can be costly in terms of wasted manpower, loss of goodwill from customers, loss of time and potential legal liability. To ensure proper protection of data for business, legal or other purposes, many entities may desire to protect their data using a variety of techniques, including data storage, redundancy, security, etc. These techniques may, however, conflict with other competing constraints or demands imposed by the state or configuration of computing devices used to process or store this data.
One method for dealing with these tensions is to implement a Redundant Array of Independent Disks (RAID). Generally, RAID systems divide and replicate data across multiple hard disk drives (or other types of storage media), collectively referred to as an array, to increase reliability and, in some cases, improve the throughput of computing devices (each known as a host) using these RAID systems for storage. To a host, then, a RAID array may appear as one or more monolithic storage areas. When a host desires to communicate (read, write, etc.) with the RAID system, the host communicates as if the RAID array were a single disk. The RAID system, in turn, processes these communications to implement a certain RAID level in conjunction with such communications. These RAID levels may be designed to achieve some desired balance between a variety of tradeoffs such as reliability, capacity, speed, etc. For example, RAID (level) 0 distributes data across several disks in a way which gives improved speed and utilizes substantially the full capacity of the disks, but provides no redundancy, so the data on a disk will be lost if that disk fails; RAID (level) 1 uses two (or more) disks which each store the same data, so that data is not lost so long as one disk survives, and the total capacity of the array is substantially the capacity of a single disk; and RAID (level) 5 combines three or more disks in a way that protects data against the loss of any one disk, while the storage capacity of the array is reduced by one disk.
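By way of illustration only, the capacity and redundancy tradeoffs of the RAID levels described above may be sketched as follows. This is a simplified model that assumes every disk in the array has the same size and ignores metadata and other overhead; the names `RaidProfile` and `raid_profile` are hypothetical and are not drawn from any particular RAID implementation.

```python
from typing import NamedTuple


class RaidProfile(NamedTuple):
    usable_capacity: int          # same unit as disk_size
    disk_failures_tolerated: int  # disks that may fail without data loss


# Minimum number of disks assumed for each RAID level in this sketch.
MIN_DISKS = {0: 2, 1: 2, 5: 3}


def raid_profile(level: int, num_disks: int, disk_size: int) -> RaidProfile:
    """Approximate usable capacity and fault tolerance of identical disks."""
    if level not in MIN_DISKS:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_disks < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == 0:
        # Striping only: substantially all capacity is usable, with no redundancy.
        return RaidProfile(num_disks * disk_size, 0)
    if level == 1:
        # Mirroring: every disk holds a full copy, so capacity is one disk.
        return RaidProfile(disk_size, num_disks - 1)
    # RAID 5: one disk's worth of capacity is consumed by distributed parity.
    return RaidProfile((num_disks - 1) * disk_size, 1)


print(raid_profile(5, 4, 1000))
# prints RaidProfile(usable_capacity=3000, disk_failures_tolerated=1)
```

Under these assumptions, four 1000-unit disks yield 4000 usable units with no fault tolerance at RAID 0, but 3000 usable units surviving any single disk failure at RAID 5.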
Current implementations of RAID may have a variety of problems. These problems may stem from limitations imposed by the architecture of these RAID systems, such as the fact that in many instances all communications with a RAID system must be addressed to a single server which controls and manages the RAID system. Other problems may arise from the configuration or layout of the data on the disks comprising a RAID system. For example, in certain cases a RAID level must be chosen and storage allocated within the RAID system before the RAID system can be utilized. Thus, the initially chosen RAID level must be implemented in conjunction with the data stored on the RAID system, irrespective of whether that level of RAID is desired or needed. In many cases, these problems may be exacerbated by the need to use custom hardware or software to implement these solutions, raising the associated costs.
Additionally, in RAID systems or other storage systems which present storage to a host or other device, multiple issues may delay the utilization of the storage system or of particular storage devices within it. More specifically, setting up such a storage system may require mirroring data on storage devices or calculating parity or other redundancy data before the storage system may be used by one or more hosts. This is because individual sectors of a disk which has not yet been initialized, or which is being reused or overwritten, may contain random bits or other data that affect the calculation of redundancy data. Thus, it may be necessary to run operations based on the data in the individual sectors comprising the disks or other storage devices before they may be utilized to store data. This is especially true for storage systems, such as RAID systems or other systems, which mirror data on storage devices or calculate redundancy data based on the data in the sectors of the storage device(s).
In RAID systems, for example, until redundancy data corresponding to the data stored on the storage devices comprising the RAID system is calculated, it may not be possible to rebuild or recreate stored data in the event of a failure. Thus, even before a storage device is utilized for the storage of data, the redundancy data corresponding to the current values in the storage devices may need to be calculated and stored in order to be able to recreate any data subsequently stored, even though those values may be garbage.
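The reasoning above may be illustrated with a minimal sketch, assuming RAID 5-style XOR parity over a single stripe and a read-modify-write parity update for later writes (a common technique, though not necessarily the one used by any particular system). Sectors are modeled as byte strings; `SECTOR_SIZE` and the helper functions are hypothetical. The sketch shows that parity computed over the existing garbage values remains consistent as real data is written, whereas parity that was never initialized cannot recreate a failed disk.

```python
import os
from functools import reduce

SECTOR_SIZE = 512  # bytes; assumed sector size for illustration


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length buffers."""
    return bytes(x ^ y for x, y in zip(a, b))


def compute_parity(strips: list[bytes]) -> bytes:
    """RAID 5-style parity: the XOR of every data strip in a stripe."""
    return reduce(xor_bytes, strips)


def rebuild(surviving: list[bytes], parity: bytes) -> bytes:
    """Recreate the one missing strip by XOR-ing the parity with the survivors."""
    return reduce(xor_bytes, surviving, parity)


def update_parity(parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: remove the old strip's contribution, add the new one."""
    return xor_bytes(xor_bytes(parity, old_data), new_data)


# Three never-initialized disks: each sector holds arbitrary leftover bits.
disks = [os.urandom(SECTOR_SIZE) for _ in range(3)]

# Initialization pass: parity is computed over the current (garbage) values.
parity = compute_parity(disks)
stale_parity = bytes(SECTOR_SIZE)  # stands in for parity if that pass is skipped

# A host later writes real data to disk 0; both parities are updated incrementally.
new_data = b"host data".ljust(SECTOR_SIZE, b"\x00")
parity = update_parity(parity, disks[0], new_data)
stale_parity = update_parity(stale_parity, disks[0], new_data)
disks[0] = new_data

# Disk 1 fails (disks[1] is kept here only to check the result).
recovered = rebuild([disks[0], disks[2]], parity)
recovered_from_stale = rebuild([disks[0], disks[2]], stale_parity)
print(recovered == disks[1])             # True
print(recovered_from_stale == disks[1])  # False, barring a negligible-probability coincidence
```

Because the incremental update only adjusts whatever parity already exists, it preserves recoverability only if that parity was first made consistent with the garbage values on the disks.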
This is problematic, as calculating redundancy data when setting up a storage system may be a lengthy process. In general, the total number of sectors on the storage devices comprising a storage system may be large, and a great deal of time may be required to mirror the data or to calculate parity or other redundancy data. In some storage systems, such calculations based on the data at sectors in the storage devices may take many hours, increasing the time required to set up an operational storage system and inconveniencing users of such systems.
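A rough, back-of-the-envelope estimate indicates why such a pass may take hours. The sketch below is illustrative only: the 4 TB disk size and 150 MB/s sustained throughput are hypothetical figures rather than measurements, all disks are assumed to be swept in parallel at full sequential speed, and controller overhead is ignored, so a pass that both reads data and writes parity may be slower in practice. The function name `init_pass_estimate` is likewise hypothetical.

```python
SECTOR_SIZE = 512  # bytes; assumed sector size


def init_pass_estimate(disk_bytes: int, throughput_bytes_per_s: float) -> tuple[int, float]:
    """Return (sectors per disk, hours) for one end-to-end pass over a disk.

    Assumes all disks in the array are processed in parallel, so the array
    is ready once a single disk has been swept completely.
    """
    sectors = disk_bytes // SECTOR_SIZE
    hours = disk_bytes / throughput_bytes_per_s / 3600
    return sectors, hours


sectors, hours = init_pass_estimate(4 * 10**12, 150 * 10**6)
print(f"{sectors:,} sectors, about {hours:.1f} hours")
# prints "7,812,500,000 sectors, about 7.4 hours"
```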
Consequently, it is desired to substantially ameliorate these problems.