1. Technical Field
The present invention relates generally to storage of data and, more particularly, to storage of data in a secondary storage system.
2. Description of the Related Art
The development of secondary storage technology, an important aspect of the enterprise environment, has had to keep pace with increasingly strenuous demands imposed by enterprises. For example, such demands include the simultaneous provision of varying degrees of reliability, availability and retention periods for data with different levels of importance. Further, to meet regulatory requirements, such as the Sarbanes-Oxley Act (SOX), the Health Insurance Portability and Accountability Act (HIPAA), the Patriot Act, and SEC Rule 17a-4(f), enterprise environments have demanded improved security, traceability and data auditing from secondary storage systems. As a result, desirable secondary storage architectures define and rigorously enforce strict data retention and deletion procedures. Furthermore, they should retain, recover and present data on demand, as failing to do so may result not only in a serious loss of business efficiency, but also in fines and even criminal prosecution. Moreover, because business enterprises oftentimes operate with relatively limited information technology (IT) budgets, efficiency is also of primary importance, both in terms of improving storage utilization and in terms of reducing mounting data management costs. In addition, with ever-increasing amounts of data being produced and fixed backup windows in which to store it, there is a clear need to scale performance and backup capacity appropriately.
Substantial progress has been made to address these enterprise needs, as demonstrated by advancements in disk-targeted de-duplicating virtual tape libraries (VTLs), disk-based backend servers and content-addressable archiving solutions. However, existing solutions do not adequately address the problems associated with the exponential increase in the amount of data stored in secondary storage.
For example, unlike primary storage, such as a storage area network (SAN), which is usually networked and under common management, secondary storage comprises a large number of highly specialized, dedicated components, each of which is a storage island requiring customized, elaborate, and often manual, administration and management. Thus, a large fraction of the total cost of ownership (TCO) can be attributed to the management of a large number of secondary storage components.
Moreover, existing systems assign a fixed capacity to each storage device and limit duplicate elimination to a single device, which results in poor capacity utilization and in space wasted on duplicates stored on multiple components. For example, known systems include large Redundant Array of Inexpensive Disks (RAID) systems, which provide a single control box containing potentially multiple, but a limited number of, controllers. The data organization of these systems is based on a fixed-size block interface. Furthermore, the systems are limited in that they employ a fixed data redundancy scheme, have a fixed maximal capacity, and apply reconstruction schemes that rebuild entire partitions even if the partitions are empty. Moreover, they provide no means for duplicate elimination, so that duplicate elimination with such systems must be implemented in higher layers.
Other known systems deliver advanced storage in a single box, such as DataDomain, or clustered storage, such as EMC Centera. The disadvantages of these types of systems are that they provide limited capacity and performance, and that their duplicate elimination is either performed per box rather than globally (DataDomain) or is based on entire files (EMC Centera). Although these systems deliver some advanced services, such as deduplication, they are often centralized, and the metadata and data stored by these systems have no redundancy beyond standard RAID schemes.
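By way of illustration only, the capacity cost of per-box duplicate elimination relative to global duplicate elimination may be sketched as follows. The sketch assumes that blocks are identified by a SHA-256 hash of their content; the box names, the block contents, and both function names are hypothetical and do not describe the operation of any of the systems named above.

```python
import hashlib


def block_id(block: bytes) -> str:
    """Identify a block by the hash of its content (an assumption of this sketch)."""
    return hashlib.sha256(block).hexdigest()


def stored_blocks_per_box(boxes: dict) -> int:
    """Blocks kept when each box eliminates duplicates only among its own data."""
    return sum(len({block_id(b) for b in blocks}) for blocks in boxes.values())


def stored_blocks_global(boxes: dict) -> int:
    """Blocks kept when duplicates are eliminated across all boxes."""
    unique = set()
    for blocks in boxes.values():
        unique.update(block_id(b) for b in blocks)
    return len(unique)


# Hypothetical data: the same blocks are written to several boxes.
boxes = {
    "box_a": [b"alpha", b"beta", b"alpha"],
    "box_b": [b"alpha", b"gamma"],
    "box_c": [b"beta", b"gamma", b"delta"],
}

per_box_count = stored_blocks_per_box(boxes)
global_count = stored_blocks_global(boxes)
print(per_box_count, global_count)
```

In this hypothetical data set, per-box elimination removes only the repeated block within the first box, whereas global elimination additionally removes the copies that are repeated across boxes.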
Finally, because each of these known secondary storage devices offers fixed, limited performance, reliability and availability, the high overall demands of enterprise secondary storage in these dimensions are very difficult to meet.