A significant job of a file system, operating system or other storage manager is to place data on a storage medium, such as a disk storage device. Where the data is written (placed on the disk), and when and how it is accessed, can have a significant effect on read/write performance.
Another significant job is protecting the data from loss in the event of physical damage to the storage medium (fault tolerance). RAID, an acronym for Redundant Array of Independent Disks, is an umbrella term for various data storage schemes that divide and replicate data among multiple physical drives, so that if one (or possibly more) of the drives is damaged, the data on the lost drive(s) can be recovered. Each scheme provides a different balance between the two primary goals: increased data reliability and increased input/output (I/O) performance.
Erasure coding is a collection of error correction algorithms that enable recovery of data lost on a failed drive in a storage system based on multiple disk drives (e.g., of a RAID array). The general process for generating and writing erasure coded data to storage comprises:
1. data arrives in a series of blocks;
2. each block is broken into sub-blocks;
3. the erasure coding algorithm is applied to the group of sub-blocks;
4. the result is a larger number of sub-blocks as determined by the specific algorithm used (e.g., to include parity data);
5. the resulting sub-blocks are written out in groups of one or more sub-blocks as determined by the specific algorithm used, to the storage media, one group per device (e.g., disk drive).
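The five steps above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses a single XOR parity sub-block as the erasure code (the text does not name a specific algorithm), and the names `split_block`, `encode_block`, `write_stripe` and the in-memory `devices` lists are hypothetical stand-ins for real drives.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def split_block(block, k):
    """Step 2: break one block into k equal-size sub-blocks (zero-padded)."""
    size = -(-len(block) // k)              # ceiling division
    padded = block.ljust(size * k, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(k)]

def encode_block(block, k):
    """Steps 3-4: apply the code; k sub-blocks go in, k + 1 come out."""
    data = split_block(block, k)
    parity = reduce(xor_bytes, data)        # assumed code: single XOR parity
    return data + [parity]

def write_stripe(sub_blocks, devices):
    """Step 5: write one group (here, one sub-block) per device."""
    for device, sub_block in zip(devices, sub_blocks):
        device.append(sub_block)

# Hypothetical in-memory "drives": 3 data devices plus 1 parity device.
devices = [[] for _ in range(4)]
for block in (b"ABCDEFGHI", b"0123456"):    # step 1: blocks arrive in series
    write_stripe(encode_block(block, k=3), devices)
```

With other algorithms the number of output sub-blocks and the grouping per device would differ, but the overall flow would be expected to stay the same.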
The recovery process (i.e., recovery of the data that has been lost on a failed disk drive) then proceeds as follows:
1. read the remaining groups of sub-blocks from the other (non-failed) devices;
2. apply the recovery algorithm to the remaining sub-blocks to generate the lost data;
3. return the original complete data block.
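The recovery steps can be illustrated the same way. The sketch below again assumes a single XOR parity sub-block stored last in the stripe, which tolerates exactly one failed device; `recover_block` and the use of `None` to mark a failed device are hypothetical conventions chosen for this example.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def recover_block(groups, original_length):
    """Rebuild a block from a stripe in which at most one device failed.

    groups: one sub-block per device, data first and parity last;
            the entry for a failed device is None.
    """
    missing = [i for i, g in enumerate(groups) if g is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity tolerates only one failed device")

    # Step 1: read the remaining groups from the non-failed devices.
    survivors = [g for g in groups if g is not None]

    # Step 2: the XOR of all survivors regenerates the lost sub-block.
    repaired = list(groups)
    if missing:
        repaired[missing[0]] = reduce(xor_bytes, survivors)

    # Step 3: drop the parity, rejoin the data, strip any padding.
    return b"".join(repaired[:-1])[:original_length]

stripe = [b"ABC", b"DEF", b"GHI"]
stripe.append(reduce(xor_bytes, stripe))    # parity sub-block
stripe[1] = None                            # simulate a failed drive
print(recover_block(stripe, original_length=9))
```

Codes that tolerate more than one failure replace the XOR in step 2 with a more expensive computation, which is one source of the CPU and recovery-time trade-offs between algorithms.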
The above process descriptions are generic and apply to many different erasure coding algorithms. Each coding algorithm has its own set of trade-offs regarding:
1. I/O performance;
2. CPU utilization;
3. storage efficiency;
4. number of drive failures tolerated.
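Two of these trade-offs are easy to quantify. As a rough illustration, assume a code that produces k data sub-blocks plus m parity sub-blocks and can tolerate the loss of any m of them (true of maximum distance separable codes such as Reed-Solomon, which the text does not specifically name). The function name and the example configurations are hypothetical.

```python
def storage_efficiency(k, m):
    """Fraction of raw capacity that holds user data for a k + m layout."""
    return k / (k + m)

# Hypothetical configurations: (data sub-blocks k, parity sub-blocks m).
for k, m in [(4, 1), (8, 2), (10, 6)]:
    print(f"{k}+{m}: efficiency {storage_efficiency(k, m):.1%}, "
          f"tolerates up to {m} drive failure(s)")
```

Raising m improves fault tolerance at the cost of storage efficiency, and typically of CPU time as well, since more parity must be computed per write.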
According to current industry standards, the data size, the erasure coding algorithm, and the array of disk drives are tied together as one integral whole, such that once a drive grouping configuration is established for the data and algorithm, the erasure coding algorithm cannot be changed. In designing such a system, a choice is made based on the redundancy required, the amount of data being stored, and the granularity of the data blocks. Based on these parameters, and balancing performance characteristics such as access time and recovery time, a configuration array (fixed group of physical disk drives) is selected. Once this drive grouping is established, only the designated erasure coding algorithm can be used to store data on those drives. Still further, writing data in a size smaller than the minimum specified by the selected erasure coding algorithm causes a performance penalty, because it requires a more time-consuming read-modify-write rather than simply a write.
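The read-modify-write penalty can be made concrete with a small model. The sketch below assumes a stripe of k data sub-blocks protected by one XOR parity sub-block; the `Stripe` class and its I/O counters are hypothetical and exist only to count device operations. A full-stripe write computes parity from the new data alone, whereas a write smaller than the stripe must first read the old sub-block and the old parity.

```python
from functools import reduce

def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

class Stripe:
    """Hypothetical in-memory stripe: k data sub-blocks plus XOR parity."""

    def __init__(self, k, size):
        self.data = [bytes(size) for _ in range(k)]
        self.parity = bytes(size)
        self.reads = 0      # device reads performed so far
        self.writes = 0     # device writes performed so far

    def full_write(self, sub_blocks):
        """Write a whole stripe: parity comes from the new data, no reads."""
        self.data = list(sub_blocks)
        self.parity = reduce(xor_bytes, sub_blocks)
        self.writes += len(sub_blocks) + 1

    def small_write(self, index, new_sub_block):
        """Update one sub-block: read old data and old parity first."""
        old_data, old_parity = self.data[index], self.parity
        self.reads += 2
        self.data[index] = new_sub_block
        # New parity = old parity XOR old data XOR new data.
        self.parity = xor_bytes(xor_bytes(old_parity, old_data), new_sub_block)
        self.writes += 2

stripe = Stripe(k=3, size=3)
stripe.full_write([b"ABC", b"DEF", b"GHI"])
print("after full write: ", stripe.reads, "reads,", stripe.writes, "writes")
stripe.small_write(1, b"xyz")
print("after small write:", stripe.reads, "reads,", stripe.writes, "writes")
```

In this model the small write costs two reads and two writes to change a single sub-block, and on real drives the reads must complete before the writes can be issued, which is where the extra latency comes from.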
Thus, there is a need for a more flexible system for allocating erasure coded data to disk storage. Increased flexibility would be desirable to enhance one or more of I/O performance, CPU utilization, storage capacity, fault tolerance, and/or recovery time.