Users are increasingly using virtualized storage systems that provide scalable local storage and/or storage “in the cloud” for client data (e.g., documents, application files, photos, mobile uploads, audio/video files, etc.). For example, a virtual drive or disk may be configured across multiple physical storage devices. This may enable a user to set up a selectable amount of storage for a virtual drive, increase or decrease the storage over time, change configuration parameters for the virtual drive, and so forth.
One challenge associated with storage systems in general and with virtualized storage systems in particular is ensuring resiliency of stored data against hardware failures. A storage system is made resilient by storing multiple copies of the data and/or redundancy data (e.g., checksums, parity data, or other compressed forms of the data) that may be used to recover portions of the data located on devices that fail. Mirroring is a traditional approach in which data is replicated completely on multiple devices, each of which stores a full copy of the same data. Recovery in a mirrored system is trivial. However, mirroring is relatively expensive and inefficient, since it consumes enough storage space to accommodate multiple full copies of the data.
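The use of parity data for recovery can be illustrated with a minimal sketch. It assumes a simple single-parity XOR scheme over equal-length blocks, which is only one possible form of redundancy data; the function names `xor_blocks`, `make_parity`, and `recover_block` are hypothetical and chosen for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one parity block for the given data blocks (assumed scheme)."""
    return xor_blocks(data_blocks)


def recover_block(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])


# Three data blocks, each imagined to live on a different device.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(data)

# Simulate the loss of the device holding block 1.
lost_index = 1
survivors = [blk for i, blk in enumerate(data) if i != lost_index]
recovered = recover_block(survivors, parity)
```

In this sketch one extra block protects three data blocks, whereas mirroring would need a full extra copy of each block. The trade-off is that recovery must read every surviving block rather than a single mirror copy.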
Another traditional approach involves storing a determined amount of redundancy data that minimizes the amount of storage consumed while still enabling recovery of the data with a number of storage device failures at or below a specified tolerance. This approach maximizes storage efficiency by using a minimal amount of redundancy data, but may increase recovery time to unacceptable levels, since using less redundancy data generally increases the time it takes to reconstruct data that is lost when a device fails. Further, different consumers may want to set up storage systems differently and may have different configuration constraints (e.g., budget, up-time goals, available physical space, etc.) that may be difficult to adhere to using either of the approaches described above. Accordingly, traditional data resiliency techniques provide limited, fixed options that may not satisfy the demands of some consumers for flexible and scalable virtualized storage that is affordable and/or able to recover from failures reasonably quickly.
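The trade-off between storage efficiency and recovery effort can be made concrete with a rough, simplified model. The sketch below assumes that the number of surviving blocks read to rebuild one lost block is a reasonable proxy for recovery time, and it compares only n-way mirroring with a single-parity layout. Both the model and the function names `mirroring` and `single_parity` are illustrative assumptions, not measurements of any particular system.

```python
def mirroring(copies):
    """Model n-way mirroring: every device holds a full copy of the data."""
    return {
        "efficiency": 1 / copies,          # fraction of raw space holding unique data
        "reads_to_rebuild": 1,             # copy from any one surviving mirror
        "tolerated_failures": copies - 1,
    }


def single_parity(data_devices):
    """Model a layout with one parity block per stripe of data blocks."""
    return {
        "efficiency": data_devices / (data_devices + 1),
        "reads_to_rebuild": data_devices,  # remaining data blocks plus the parity block
        "tolerated_failures": 1,
    }


two_way_mirror = mirroring(2)
parity_over_four = single_parity(4)

for name, layout in [("2-way mirror", two_way_mirror), ("4+1 parity", parity_over_four)]:
    print(name, layout)
```

Under these assumptions the two-way mirror uses half of the raw capacity for unique data and rebuilds from a single read, while the 4+1 parity layout uses four fifths of the raw capacity but must read four blocks to rebuild one.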
In the event of a failure, unrecoverable data loss may still occur even for a resilient system if additional failures occur before the storage system is restored to a resilient state. Accordingly, the amount of time it takes to recover data and restore resiliency of a storage system is also a general consideration to account for in the design and configuration of storage systems.