Solid-state devices or drives (SSDs) are data storage devices that use solid-state memory to store data persistently, in a manner that emulates a hard disk drive. As the cost of solid-state components has dropped, SSDs have become increasingly popular, and are replacing rotational hard disk drives in many computing environments and systems.
However, SSDs possess some disadvantages, such as a limited number of write cycles. In particular, memory cells within an SSD wear out after some number of write operations, which may range from one thousand to hundreds of thousands. In a server farm or other computing environment in which storage devices are constantly being written to, an SSD's memory cells may encounter this number of writes in as little as a few days, or as much as a few years.
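The range from days to years follows from simple division of the endurance limit by the rate at which cells are rewritten. The sketch below illustrates that arithmetic; the function name and the workload figures are hypothetical values chosen for illustration, not measurements of any particular device.

```python
def days_until_worn_out(endurance_cycles: int, rewrites_per_day: float) -> float:
    """Days until a memory cell reaches its write-cycle limit.

    endurance_cycles: number of write operations the cell tolerates (assumed).
    rewrites_per_day: how often the cell is rewritten per day (assumed).
    """
    if rewrites_per_day <= 0:
        raise ValueError("rewrites_per_day must be positive")
    return endurance_cycles / rewrites_per_day


# Low-endurance cells under a heavy, constant write load (hypothetical numbers).
print(days_until_worn_out(1_000, 500))      # 2.0 days

# High-endurance cells under a lighter load (hypothetical numbers).
print(days_until_worn_out(100_000, 100))    # 1000.0 days, roughly 2.7 years
```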
An SSD's erase block identifies the minimum amount of storage space that can be erased at once on the device, and may be as large as multiple megabytes. Even if only a small percentage of the data encompassed by a particular erase block is changed during a given write operation, all cells in that erase block are erased and therefore become one operation closer to wearing out. Frequent updates to stored data are common in some computing environments and applications (e.g., caching, data reduction, online databases, electronic mail queues).
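The disproportion between the data changed and the cells erased can be quantified. The following is a minimal sketch that assumes writes are addressed by byte offset and that every erase block overlapped by a write is erased in full; the function names and the 2 MiB erase block size are illustrative assumptions.

```python
def blocks_touched(offset: int, length: int, erase_block_size: int) -> int:
    """Number of erase blocks overlapped by a write of `length` bytes at `offset`."""
    if length <= 0:
        return 0
    first = offset // erase_block_size
    last = (offset + length - 1) // erase_block_size
    return last - first + 1


def erase_amplification(offset: int, length: int, erase_block_size: int) -> float:
    """Ratio of bytes erased to bytes actually changed by the write."""
    erased = blocks_touched(offset, length, erase_block_size) * erase_block_size
    return erased / length


ERASE_BLOCK = 2 * 1024 * 1024  # assumed 2 MiB erase block

# A 4 KiB update inside one erase block erases 512 times as much data as it changes.
print(erase_amplification(0, 4096, ERASE_BLOCK))                    # 512.0

# The same update straddling a block boundary wears two erase blocks.
print(erase_amplification(ERASE_BLOCK - 2048, 4096, ERASE_BLOCK))   # 1024.0
```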
To rewrite a set of data stored on an SSD (e.g., when the data is to be updated), the data must be read and modified, and the storage location (i.e., the data's erase block(s)) must be erased to prepare for the rewrite. Because an entire erase block must be cleared and rewritten, regardless of how little data is being updated, random writes can be relatively slow on SSDs. In fact, some SSDs perform worse than rotational hard disk drives when it comes to random writes.
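The read, modify, erase and rewrite sequence described above can be modeled with a toy erase block. This is a simplified sketch, not a model of any real SSD controller: the class `ToyEraseBlock` and its methods are hypothetical, and it assumes an in-place update with no remapping by the device.

```python
class ToyEraseBlock:
    """Toy model of a single erase block (hypothetical, for illustration only)."""

    def __init__(self, size: int) -> None:
        self.data = bytearray(b"\xff" * size)  # erased flash is modeled as all 0xFF
        self.erase_count = 0

    def rewrite(self, offset: int, new_bytes: bytes) -> int:
        """Update part of the block; return the number of bytes physically rewritten."""
        if offset < 0 or offset + len(new_bytes) > len(self.data):
            raise ValueError("update does not fit inside the erase block")
        buffer = bytearray(self.data)                        # 1. read the whole block
        buffer[offset:offset + len(new_bytes)] = new_bytes   # 2. modify the copy
        self.data = bytearray(b"\xff" * len(self.data))      # 3. erase the block
        self.erase_count += 1
        self.data[:] = buffer                                # 4. write everything back
        return len(buffer)


block = ToyEraseBlock(1024)
rewritten = block.rewrite(10, b"abc")
print(rewritten, block.erase_count)   # 1024 1
```

Changing three bytes costs a full-block erase and a full-block write in this model, which is why small random writes are expensive.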
Random writes may be so slow on some SSDs that, even if only a relatively small portion of the input/output operations on the device are writes, the device may yield poorer performance than a rotational disk drive. Although SSDs may provide excellent performance for random read operations, organizations contemplating adopting solid-state devices must consider very carefully the nature of their computing environments (e.g., the types of input/output operations that are most prevalent).
The cost of SSDs would naturally lead one to want to use them as efficiently as possible but, unfortunately, some storage system architectures and schemes that operate well with rotational hard disk drives are inefficient when implemented with SSDs. For example, many RAID (Redundant Array of Independent Disks) systems use mirroring, wherein data stored on one device is replicated on a mirror of that device. This can provide efficient and inexpensive redundancy when implemented with hard disk drives, although the usable storage capacity of the mirror set is only one-half of the total disk capacity. However, when implemented with SSDs, using only one-half of the storage capacity of expensive solid-state devices may be very inefficient from a cost perspective.
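The cost effect of mirroring can be expressed as simple arithmetic. The sketch below assumes a mirror set of identical devices in which every device holds a full copy of the data; the function names and capacities are hypothetical, chosen only to illustrate the one-half figure.

```python
def usable_capacity(device_capacity_gb: int, copies: int = 2) -> float:
    """Usable capacity of a mirror set, given the capacity of each identical device."""
    if copies < 1:
        raise ValueError("a mirror set needs at least one device")
    total = device_capacity_gb * copies
    return total / copies  # every device stores the same data


def cost_multiplier(copies: int = 2) -> int:
    """Factor by which mirroring raises the price paid per usable gigabyte."""
    if copies < 1:
        raise ValueError("a mirror set needs at least one device")
    return copies


# Two assumed 500 GB devices: 1000 GB purchased, 500 GB usable.
print(usable_capacity(500))   # 500.0
print(cost_multiplier())      # 2
```

The multiplier is the same for hard disk drives and SSDs; it matters more for SSDs only because the price per raw gigabyte being doubled is higher.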
Another problem with mirror sets is that when one of the devices fails, replacement of that device will slow the input/output system because of the need to image the replacement device from a functioning member of the mirror set. And, of course, when a mirror set contains only two devices, if both of them fail, then the mirror set fails and all data stored in the mirror set is lost.
Some RAID architectures stripe data across multiple disks or other storage devices, instead of or in addition to mirroring. In some of these architectures, failure of one device may cause the loss not only of data stored on that device, but of data stored on another device as well. In particular, because the properties of the RAID scheme (e.g., stripe size, number of devices, device capacity, erase block size) are independent of the properties of the application or applications that store data on the RAID, data in different stripes may be interdependent even though the stripes are stored on different devices.
For example, an application may store sets of data that are larger than one stripe in size. Each set of data would thus comprise more than one stripe, on more than one device. If, within each set of data, the application stores index information, metadata or other special information for accessing, locating or otherwise managing the contents of the set of data, and if the device on which that information is stored fails, the corresponding data in the other stripes (on other devices) may become inaccessible.
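The dependency described above can be illustrated with a small model. This sketch assumes simple round-robin striping without parity or mirroring, and assumes the application keeps its index information in the first stripe of each set of data; both assumptions, and all function names, are hypothetical and chosen only to illustrate the failure mode.

```python
def device_for_stripe(stripe_index: int, num_devices: int) -> int:
    """Round-robin placement: stripe i is stored on device i mod num_devices."""
    return stripe_index % num_devices


def usable_stripes(first_stripe: int, num_stripes: int,
                   num_devices: int, failed_device: int) -> list[int]:
    """Stripes of one data set that the application can still use after a failure.

    Assumes the set's index information lives in its first stripe, so losing
    that stripe makes the rest of the set inaccessible.
    """
    stripes = range(first_stripe, first_stripe + num_stripes)
    if device_for_stripe(first_stripe, num_devices) == failed_device:
        return []  # index lost: surviving stripes cannot be located or interpreted
    return [s for s in stripes
            if device_for_stripe(s, num_devices) != failed_device]


# A data set occupying stripes 4, 5 and 6 on a four-device array
# (stored on devices 0, 1 and 2 respectively).
print(usable_stripes(4, 3, 4, failed_device=0))   # []
print(usable_stripes(4, 3, 4, failed_device=1))   # [4, 6]
print(usable_stripes(4, 3, 4, failed_device=3))   # [4, 5, 6]
```

When device 0 fails, stripes 5 and 6 are physically intact on devices 1 and 2, yet the application cannot use them because the index stored in stripe 4 is gone.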