Backing up and moving data are complex processes for several reasons. In general, while some data storage and backup systems are technically efficient and effective, those systems can make it difficult to implement an associated economic or commercial model that fits well with the technology.
In one particular example, a data de-duplication storage system can enable a service provider to more efficiently use available storage space by reducing or eliminating the storage of duplicate data. This result can be achieved even where, for example, two instances of the same piece of data belong to different respective customers. Thus, in the data de-duplication storage system, there may be no distinction drawn between customers. Rather, the de-duplication functionality may transcend customer boundaries and take a holistic view of all the data stored, or to be stored.
While data de-duplication can provide various benefits in terms of data storage, it may present problems if a decision is made to move customer data to a different de-duplication domain. Because that customer data may include data that is common to one or more other customers, whose data is not being moved to a different de-duplication domain, it can be difficult for the data storage service provider to determine how much back end storage space will be required for the customer data that is to be moved. Without information as to how much back end storage space is needed, the service provider may encounter problems if an attempt is made to move the data to a different de-duplication domain that has inadequate storage space. Alternatively, the service provider may not be able to move the data at all.
A related problem concerns the ability of the service provider to bill the customer for data storage services. In general, a service provider may charge customers at a set cost per unit of stored data for a certain time period. To illustrate, a service provider could charge a customer $5/Gbyte/month. However, if the service provider is unable to ascertain how much data belongs to that customer, it will be difficult for the service provider to accurately bill the customer.
In view of problems such as these, there is a need to be able to determine the amount of data that belongs to a particular customer. One possible way to make this determination in a data de-duplication storage system, for example, might be to simply track all of the hashes associated with a particular user, without counting the same hash twice, and then add up the sizes of all of the data pointed to by those hashes. However, such an approach would be inefficient, both in terms of the calculation algorithm and, correspondingly, in terms of the processing resources that would be required.
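The exhaustive approach described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `exact_customer_bytes`, the use of SHA-256 as the hash, and the representation of a customer's data as an in-memory list of chunks are all hypothetical and are not taken from any particular storage system.

```python
import hashlib


def exact_customer_bytes(chunks):
    """Return the total size of the distinct chunks referenced by one customer.

    `chunks` is a hypothetical iterable of bytes objects. SHA-256 stands in
    for whatever hash a real de-duplication system would use.
    """
    size_by_hash = {}
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        # Record each hash only once so duplicate chunks are not counted twice.
        size_by_hash.setdefault(digest, len(chunk))
    # Add up the sizes of the data pointed to by the distinct hashes.
    return sum(size_by_hash.values())


# Two of the three chunks are identical, so only 4 + 2 bytes are counted.
total = exact_customer_bytes([b"aaaa", b"bb", b"aaaa"])
```

Even in this small form, the sketch touches every chunk and keeps one table entry per distinct hash, which suggests why the work and memory involved grow with the full volume of customer data.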
To illustrate, using the aforementioned $5/Gbyte/month example, and taking 1 Gbyte as 10^9 bytes, the storage cost would be $0.000000005/byte/month. However, the smallest amount that a customer can be charged is 1 cent, or $0.01, and given the exceedingly small cost per byte, there is little practical reason to calculate the exact number of bytes associated with a particular customer. Correspondingly, there is no practical reason to calculate the customer cost to any fractional amount smaller than 1 cent, or $0.01, such as millionths of a cent in the example noted above.
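The arithmetic behind this example can be checked with a short calculation. This is a rough sketch that assumes a decimal gigabyte (10^9 bytes); the function names are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

BYTES_PER_GBYTE = Decimal(10) ** 9  # assumption: 1 Gbyte = 10^9 bytes
SMALLEST_CHARGE = Decimal("0.01")   # 1 cent, the smallest billable amount


def cost_per_byte(price_per_gbyte):
    """Monthly cost in dollars of storing a single byte."""
    return price_per_gbyte / BYTES_PER_GBYTE


def bytes_per_cent(price_per_gbyte):
    """Number of bytes whose monthly storage cost adds up to 1 cent."""
    return SMALLEST_CHARGE / cost_per_byte(price_per_gbyte)


per_byte = cost_per_byte(Decimal("5"))   # $5/Gbyte/month from the example
per_cent = bytes_per_cent(Decimal("5"))
```

Under these assumptions, roughly 2,000,000 bytes correspond to a single cent per month, so an error of a few thousand bytes in the measured amount of customer data would not change the billed amount.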
Thus, while the example approach noted above may provide information that can be used to determine the amount of data associated with a customer, as well as the incremental cost to store that data, the calculation would be highly inefficient and require the use of an inordinate amount of processing resources that could be utilized for other tasks. Moreover, the cost to obtain the results provided by the calculation would likely outweigh the benefit of having those results.
In light of the foregoing, it would be useful to be able to calculate the amount of data associated with a customer, without performing more calculations, or using more resources, than necessary, while also producing results that have an acceptable level of accuracy.