Businesses worldwide recognize the commercial value of their data and seek reliable, cost-effective ways to protect the information stored on their computer networks while minimizing impact on productivity. Protecting information is often a routine process within an organization.
A company might back up critical computing systems such as databases, file servers, web servers, and so on as part of a daily, weekly, or monthly maintenance schedule. The company may similarly protect computing systems used by each of its employees, such as those used by an accounting department, marketing department, engineering department, and so forth.
Given the rapidly expanding volume of data under management, companies also continue to seek innovative techniques for managing data growth, in addition to protecting data. For instance, companies often implement migration techniques for moving data to lower-cost storage over time and data reduction techniques for reducing redundant data, pruning lower-priority data, etc.
Enterprises also increasingly view their stored data as a valuable asset. Along these lines, customers are looking for solutions that not only protect and manage, but also leverage their data. For instance, solutions providing data analysis capabilities, improved data presentation and access features, and the like, are in increasing demand.
In response to these challenges, one technique developed by storage system providers is data deduplication. Deduplication typically involves eliminating or reducing the amount of redundant data stored and communicated within a storage system, improving storage utilization. For example, data can be divided into units of a chosen granularity (e.g., files or sub-file data blocks). The data blocks can be of fixed or variable length. As new data enters the system, the data units can be checked to see if they already exist in the storage system. If a data unit already exists, instead of storing and/or communicating a duplicate copy, the storage system stores and/or communicates a reference to the existing data unit. Thus, deduplication can improve storage utilization, reduce system traffic (e.g., over a networked storage system), or both.
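The check-then-reference flow described above can be illustrated with a minimal sketch of block-level deduplication. It assumes fixed-size blocks, SHA-256 fingerprints as block identifiers (with hash collisions treated as negligible), and an in-memory dictionary as the block store; the `DedupStore` class and its methods are hypothetical and are not drawn from any particular storage product. Variable-length (content-defined) chunking is not shown.

```python
import hashlib


class DedupStore:
    """Hypothetical in-memory store illustrating fixed-size block deduplication."""

    def __init__(self, block_size: int = 4096) -> None:
        self.block_size = block_size
        # Maps a block's SHA-256 fingerprint to the single stored copy of that block.
        self.blocks: dict[str, bytes] = {}

    def write(self, data: bytes) -> list[str]:
        """Split data into fixed-size blocks and return one reference per block."""
        refs = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Store the block only if an identical block is not already present;
            # otherwise only the reference (fingerprint) is recorded.
            if fingerprint not in self.blocks:
                self.blocks[fingerprint] = block
            refs.append(fingerprint)
        return refs

    def read(self, refs: list[str]) -> bytes:
        """Reassemble the original data by following the stored references."""
        return b"".join(self.blocks[ref] for ref in refs)

    def stored_bytes(self) -> int:
        """Total bytes physically held in the block store."""
        return sum(len(block) for block in self.blocks.values())


if __name__ == "__main__":
    store = DedupStore(block_size=4)
    refs_a = store.write(b"AAAABBBBAAAA")
    refs_b = store.write(b"BBBBCCCC")
    print(len(refs_a) + len(refs_b), "block references")
    print(len(store.blocks), "unique blocks stored")
    print(store.stored_bytes(), "bytes stored")
```

With 4-byte blocks, the two writes produce five block references, but only three unique blocks (12 bytes) are stored, compared with 20 bytes of logical data; the repeated `AAAA` and `BBBB` blocks are kept once and referenced thereafter.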
Even in systems employing deduplication, data management operations, including backup and restore operations, can place heavy demands on available network bandwidth and system resources. Such operations can also introduce significant delay, e.g., due to communication latency between secondary storage (e.g., non-production, backup storage) and primary storage (e.g., production storage). In addition, if a device or script involved in the deduplication process fails or becomes unavailable, restoring the deduplication system to its pre-failure state can be quite time-consuming.