Commercial enterprises (e.g., companies) and other organizations gather, store, and analyze an increasing amount of data. The current trend is to store and archive almost all data before deciding whether to analyze it. Although the per-unit cost of storing data has declined over time, the total cost of storage has increased for many companies because of the volume of data stored. Hence, it is important for companies to find cost-effective ways to manage storage environments that hold large quantities of data. There are several problems with traditional approaches to capacity storage. Most traditional storage systems have difficulty scaling to support billions of values, far fewer than the trillions of objects that customers are storing today.
Traditional data protection mechanisms, e.g., RAID, are increasingly ineffective in petabyte-scale systems as a result of larger drive capacities (without commensurate increases in throughput), larger deployment sizes (which reduce the mean time between faults), and lower-quality drives. Trends among hard drive vendors are making traditional RAID increasingly difficult to implement and require complex techniques such as triple parity and declustering. Storage device trends that push away from traditional data protection mechanisms include: increasing drive sizes; lower I/O limits on drives; varying latency (which can slow I/O); varying capacity within a given model or drive line (which can increase the inefficiency of traditional RAID); and lower drive reliability (increased failure rates and more intense workload-triggered failures). Thus, traditional data protection mechanisms are ill-suited to the needs of the emerging capacity storage market.
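The two scaling effects named above, drive capacity outpacing throughput and larger deployments failing more often, can be illustrated with rough arithmetic. The following is a minimal sketch and not part of any described system: the function names and every drive figure in it (capacities, throughputs, the per-drive mean time between failures) are hypothetical values chosen only for illustration, and the fleet estimate assumes independent drive failures at a constant rate.

```python
def rebuild_hours(capacity_tb: float, throughput_mb_s: float) -> float:
    """Lower bound on hours needed to read or write one whole drive.

    Assumes a sustained sequential rate and decimal units (1 TB = 1,000,000 MB).
    """
    capacity_mb = capacity_tb * 1_000_000
    return capacity_mb / throughput_mb_s / 3600


def fleet_hours_between_failures(drive_mtbf_hours: float, drive_count: int) -> float:
    """Expected hours between drive failures somewhere in a fleet.

    Assumes independent failures at a constant rate, so the fleet-wide
    failure rate is the per-drive rate multiplied by the drive count.
    """
    return drive_mtbf_hours / drive_count


# Hypothetical drives: capacity grows 16x while throughput grows only 2.5x.
small_drive = rebuild_hours(capacity_tb=1, throughput_mb_s=100)
large_drive = rebuild_hours(capacity_tb=16, throughput_mb_s=250)
print(f"rebuild floor: {small_drive:.1f} h vs {large_drive:.1f} h")

# Hypothetical per-drive MTBF of 1,000,000 hours at two deployment sizes.
for drives in (100, 10_000):
    gap = fleet_hours_between_failures(1_000_000, drives)
    print(f"{drives} drives: one failure expected every {gap:.0f} h")
```

Under these assumed numbers the rebuild window grows by a factor of 6.4 while the expected interval between failures in the larger deployment shrinks to a few days, so the chance of a further failure occurring during a rebuild rises on both counts.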
Further, current data storage systems have complex data protection mechanisms, which typically perform a significant amount of I/O on the storage devices in order to provide a specified storage resiliency. This intensive protection I/O, together with the I/O performed to provide data access to customers, wears the storage devices much faster and therefore shortens their lifespan. To maintain the same storage resiliency, the storage devices may have to be replaced regularly, which can drive up storage costs.