Entities often generate and use data that is important in some way to their operations. This data can include, for example, business data, financial data, and personnel data. If this data were lost or compromised, the entity could suffer significant financial and other adverse consequences. Accordingly, many entities have chosen to back up some or all of their data so that, in the event of a natural disaster, unauthorized access, or other events, the entity can recover any data that was compromised or lost, and then restore that data to one or more locations, machines, and/or environments.
While data backup is a valuable and important function, the ever-increasing volume of data that is generated presents significant problems. In particular, many companies today find their backup and recovery processes strained as data growth in enterprise IT environments continues to accelerate at exponential rates, while data-protection solutions have struggled to keep pace. Backup performance is further constrained by the demands of business applications that must remain online and up to date.
In challenging environments such as these, attention has turned to deduplication solutions that use storage space more efficiently by avoiding the storage of duplicate data. Many deduplication systems are global in scope, in that they identify and eliminate duplicate data across multiple users in a domain. That is, global deduplication systems are shared among multiple users and are configured to treat all data sets equally. In this way, an enterprise is able to take advantage of commonality among the various user data sets.
However, with the increasing popularity of Information Technology as a Service (ITaaS) models, enterprises are increasingly being asked to attribute the cost of IT services to end users, that is, consumers. To achieve this for global deduplication systems, IT services must offer show back capabilities that attribute consumption of deduplication services to the data of an individual user. The show back capabilities, in turn, enable charge back, that is, the ability to bill the user based on their indicated use of deduplication services. Effective and accurate show back and charge back functions, however, have not been achieved in typical deduplication systems. This is due at least in part to the fact that global deduplication systems and services are not designed to distinguish between different users in this way. Thus, while attempts have been made to adapt global deduplication to provide show back and charge back capabilities, such attempts have been largely unsuccessful.
Nonetheless, the need to attribute deduplication costs to users, and the need to recover those costs, persist. Thus, many enterprises and service providers have resorted to show back models that rely on measuring the pre-compressed capacity consumed by the user objects. These metrics are recorded in the file system namespace, which can be processed quickly and efficiently. While this approach is relatively easy to implement and administer, it is unable to quantify the effectiveness of the deduplication system relative to the objects of an individual user. Rather, all users are treated equally with respect to the effectiveness, or lack thereof, of the data reduction techniques. That is, the value provided by the deduplication system to any given user cannot be readily determined. Thus, although the value provided by the deduplication system can vary from one user to another, all users are treated as having received the same value. One result is that users who receive relatively less value effectively subsidize the users who receive relatively more value.
As well, at least some deduplication systems are vulnerable to exploitation by users. For example, many users recognize that by pre-compressing data before sending it to the deduplication system, they can reduce the amount of logical capacity consumed, which substantially reduces the user's show back measure and, accordingly, the cost charged to the user. For the IT service provider, this is detrimental to the economics of the deduplication system and to the show back cost model, which assumes an average level of deduplication for each of a plurality of users.
In light of problems and shortcomings such as those noted above, it would be useful to be able to measure, on a per-user basis, backend storage consumption for objects quickly and efficiently, for a large number of users, and for any low-end or high-end deduplication system. Correspondingly, it would be useful to be able to accurately attribute and bill, on a per-user basis, consumption of deduplication services. As well, it would be useful to be able to implement a deduplication system that is not easily exploited by users.