In a deduplication storage system, presented data is the data presented to the storage system for storage, before deduplication. Raw data is the data actually stored after deduplication. The deduplication ratio is the ratio of presented data to raw data. A deduplication storage vendor adds value to a storage system through the deduplication ratio that the deduplication storage provides. For data with a high deduplication ratio, such as 20:1, a customer can buy a storage system that holds, e.g., 1 TB of raw data yet stores 20 TB of presented data. Deduplication ratio varies with data types and data sets. For example, backing up an email system to which some messages have been added since the last backup will have a high deduplication ratio, because only the new messages need to be stored, whereas older messages can be referenced by pointers to the previously stored data. On the other hand, encrypted data, even if it is the same data as previously backed up, will have a low deduplication ratio.
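The relationship between presented data, raw data, and deduplication ratio can be illustrated with a minimal sketch. It assumes fixed-size chunking and SHA-256 fingerprints, neither of which is specified above; the class name `ToyDedupStore` and the constant `CHUNK_SIZE` are hypothetical names introduced only for illustration, and the sketch is not a description of any vendor's deduplication logic.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real systems may chunk differently


class ToyDedupStore:
    """Illustrative store that keeps one copy of each distinct chunk."""

    def __init__(self):
        self.chunks = {}          # fingerprint -> chunk bytes (raw data)
        self.presented_bytes = 0  # total bytes presented for storage

    def write(self, data: bytes) -> list:
        """Store data and return the list of fingerprints (pointers) describing it."""
        self.presented_bytes += len(data)
        recipe = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            fingerprint = hashlib.sha256(chunk).hexdigest()
            # A chunk seen before is not stored again; only a pointer is kept.
            self.chunks.setdefault(fingerprint, chunk)
            recipe.append(fingerprint)
        return recipe

    @property
    def raw_bytes(self) -> int:
        """Bytes actually stored after deduplication."""
        return sum(len(chunk) for chunk in self.chunks.values())

    def dedup_ratio(self) -> float:
        """Presented bytes divided by raw bytes."""
        if self.raw_bytes == 0:
            raise ValueError("no data stored yet")
        return self.presented_bytes / self.raw_bytes


# Simulated mailbox: ten distinct chunks in the first backup,
# then the same ten chunks plus one new chunk in the second backup.
first_backup = b"".join(bytes([i]) * CHUNK_SIZE for i in range(10))
second_backup = first_backup + bytes([10]) * CHUNK_SIZE

store = ToyDedupStore()
store.write(first_backup)
store.write(second_backup)

print(store.presented_bytes // CHUNK_SIZE)  # 21 chunks presented
print(store.raw_bytes // CHUNK_SIZE)        # 11 chunks stored
print(store.dedup_ratio())                  # 21 / 11
```

In this sketch the second backup adds only one chunk of raw data, because the ten older chunks are already stored and are referenced by fingerprint.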
Another factor that affects deduplication ratio is a change in the configuration of how one or more storage devices are used by a customer or client device. For example, migration of a data set, such as a plurality of virtual desktops or a database, from a first storage device to a second storage device can change the deduplication ratio of both the first and second storage devices. The first storage device may be heavily used while the second storage device is lightly used. To balance network traffic across the storage devices and reduce storage latency, an administrator may wish to move some data sets to the second storage device. However, doing so may reduce the deduplication ratio of both the first and second storage devices and use more raw storage overall. This is especially true of, e.g., virtual desktops, because they are numerous and have a high deduplication ratio due to their similarity to one another.
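The effect of such a migration can be illustrated with a small worked sketch. It assumes each data set is represented as a list of chunk fingerprints and that a storage device stores one copy of each distinct fingerprint; the helper name `usage`, the data set sizes, and the degree of sharing among the virtual desktops are all hypothetical values chosen for illustration.

```python
def usage(datasets):
    """Return (presented_chunks, raw_chunks) for one storage device.

    Each data set is a list of chunk fingerprints. Presented chunks count
    every fingerprint; raw chunks count each distinct fingerprint once.
    """
    presented = sum(len(dataset) for dataset in datasets)
    raw = len(set().union(*datasets))
    return presented, raw


def ratio(datasets):
    """Deduplication ratio: presented chunks divided by raw chunks."""
    presented, raw = usage(datasets)
    return presented / raw


# Four virtual desktops: 90 shared operating-system chunks plus 10 unique chunks each.
shared = [f"os-{i}" for i in range(90)]
desktops = [shared + [f"vd{k}-{i}" for i in range(10)] for k in range(4)]

# A database backed up three times: 100 distinct chunks presented three times.
database = [f"db-{i}" for i in range(100)] * 3

# Before migration: all desktops on the first device, the database on the second.
first_before, second_before = desktops, [database]

# After migration: two desktops are moved to the second device.
first_after, second_after = desktops[:2], [database] + desktops[2:]

for label, datasets in [
    ("first, before", first_before),
    ("second, before", second_before),
    ("first, after", first_after),
    ("second, after", second_after),
]:
    presented, raw = usage(datasets)
    print(f"{label}: presented={presented} raw={raw} ratio={presented / raw:.2f}")
```

Under these assumed numbers, total raw usage rises from 230 chunks to 320 chunks, and the deduplication ratio falls on both devices, because the shared desktop chunks must now be stored once on each device.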
A customer wants to buy only as much hardware (raw storage) as is needed, in view of the deduplication ratio of the customer's data. Deduplication adds value for a customer by allowing the customer to buy less raw storage capacity than the amount of data presented for storage. Deduplication storage vendors can consequently lose revenue because customers need less raw storage. Deduplication storage vendors could instead charge for the amount of value added by the deduplication logic in the storage device. However, different customers have different data with different deduplication ratios. Currently, no system determines the value of deduplication service to a customer based upon the actual deduplication value received by that customer, as reflected in customer-specific deduplication information. Nor does any system determine the projected effect of configuration changes to usage of one or more storage devices upon deduplication efficiency and storage usage.