In a data storage environment, more than one copy of the same data may be stored in different data repositories. Such redundancy is typically needed to support data mining and recovery (e.g., backup or archive services) or to promote system efficiency (e.g., the duplicate copies may be needed for the purpose of concurrent processing). Accordingly, the stability and the quality of service (QoS) in a system may be directly dependent on a structured data redundancy level.
On the other hand, maintaining the resources to support the storage and accessibility of redundant data is expensive. To control costs, optimization techniques such as deduplication are generally used to reduce the amount of redundant data. Duplicate data items may be reduced to single instances to save storage resources. Depending on QoS requirements and system implementation, however, several copies of the same data may need to be retained, despite the desire to minimize duplication.
Thus, the challenge is to maintain the proper balance in storage optimization without violating the QoS requirements. For example, where the QoS requirements for the same data across multiple computing platforms are not identical, the global application of a common deduplication technique will be counterproductive and will adversely affect the QoS. As such, an optimization technique is needed that can support different data storage services across different platforms and help limit data storage costs, without compromising QoS for the provided services.
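
By way of illustration only, the tension between deduplication and per-platform QoS requirements may be sketched as follows. The sketch assumes that duplicates are detected by a content hash and that each platform declares a minimum number of copies to retain. The function name `dedupe_with_qos`, the `min_copies` mapping, and the platform labels are hypothetical and are not taken from any particular system.

```python
import hashlib
from collections import defaultdict


def dedupe_with_qos(items, min_copies):
    """Deduplicate per platform while honoring a per-platform copy quota.

    items: iterable of (platform, item_id, payload_bytes) tuples.
    min_copies: dict mapping platform -> minimum copies to retain
                (hypothetical QoS setting; platforms not listed keep 1).
    Returns a list of (platform, item_id) pairs that are retained.
    """
    # Group item ids by platform and content digest, so identical
    # payloads on the same platform fall into the same group.
    groups = defaultdict(list)
    for platform, item_id, payload in items:
        digest = hashlib.sha256(payload).hexdigest()
        groups[(platform, digest)].append(item_id)

    kept = []
    for (platform, _digest), ids in groups.items():
        # Always keep at least one instance; keep more if QoS asks for it.
        quota = max(1, min_copies.get(platform, 1))
        kept.extend((platform, item_id) for item_id in ids[:quota])
    return kept


if __name__ == "__main__":
    items = [
        ("backup", "a", b"x"),
        ("backup", "b", b"x"),
        ("backup", "c", b"x"),
        ("archive", "d", b"x"),
        ("archive", "e", b"x"),
    ]
    # The "backup" platform is assumed to require two copies.
    print(dedupe_with_qos(items, {"backup": 2}))
```

In this sketch, a platform without a declared quota is reduced to a single instance of each data item, while a platform with a higher quota retains the additional copies it requires. Applying one common quota to every platform would either discard copies that one platform needs or retain copies that another does not.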