1. Technical field
This invention relates generally to data anonymization systems and, in particular, but not exclusively, to systems for managing anonymized subscriber data.
2. Description of related art
This section introduces aspects that may be helpful in facilitating a better understanding of the invention. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is in the prior art or what is not in the prior art.
Service providers are increasingly concerned about complying with the legal requirements for ensuring the privacy of their subscribers' data. One of the key legal requirements is to anonymize the subscriber data before using it for a purpose (e.g., marketing or analytics) other than providing the service requested by the subscriber.
There are several research prototypes for data anonymization, but very few commercial systems, as data anonymization is still a very active research area. Commercial systems primarily use data masking (e.g., replacing all or part of a sensitive value with a random string), while the research prototypes typically use either some variant of k-anonymity (e.g., hiding the actual secret in a set of k possible secrets) or differential privacy (e.g., ensuring that the output of a function does not change significantly due to the presence or absence of an individual's data).
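By way of illustration only, the three techniques named above may be sketched as follows. The function names (mask_value, is_k_anonymous, noisy_count), the field names, and the parameter choices are hypothetical and are not taken from any particular commercial system or research prototype. The k-anonymity check assumes the common formulation in which every combination of quasi-identifier values is shared by at least k records, and the differential privacy example assumes a counting query with sensitivity 1 answered with Laplace noise.

```python
import random
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"


def mask_value(value, keep_last=4, rng=None):
    """Data masking: replace all but the last `keep_last` characters
    of a sensitive string with random characters."""
    rng = rng or random.Random()
    n_masked = max(len(value) - keep_last, 0)
    masked = "".join(rng.choice(ALPHABET) for _ in range(n_masked))
    return masked + value[n_masked:]


def is_k_anonymous(records, quasi_identifiers, k):
    """k-anonymity check: every combination of quasi-identifier values
    must appear in at least k records, so that each individual is
    hidden among at least k candidates."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())


def noisy_count(true_count, epsilon, rng=None):
    """Differential privacy (Laplace mechanism, assumed sensitivity 1):
    add Laplace noise of scale 1/epsilon to a count. The difference of
    two independent exponential variates with rate epsilon follows a
    Laplace distribution with scale 1/epsilon."""
    rng = rng or random.Random()
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    # Hypothetical subscriber records with already generalized fields.
    records = [
        {"zip": "079**", "age": "30-39", "plan": "A"},
        {"zip": "079**", "age": "30-39", "plan": "B"},
        {"zip": "080**", "age": "40-49", "plan": "A"},
        {"zip": "080**", "age": "40-49", "plan": "C"},
    ]
    print(mask_value("4111111111111111"))
    print(is_k_anonymous(records, ["zip", "age"], k=2))
    print(noisy_count(len(records), epsilon=0.5))
```

In this sketch a smaller epsilon yields larger noise, and a larger k yields coarser groups; both illustrate how stronger privacy may reduce the quality of the resulting data.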
There is a fundamental tradeoff between privacy and the utility of subscriber data, because anonymization reduces the quality of the data. Therefore, service providers may need to take into account the data quality requirements when anonymizing the data. This becomes even more challenging because different applications or usages of the data have different data quality requirements. Thus, service providers have a need for an anonymizing framework that provides multiple options to meet the privacy requirements as well as the data utility requirements of different usages.
However, neither commercial nor research prototype data anonymization systems provide more than one way of anonymizing data. These systems are designed and implemented primarily to address one specific point in the spectrum of the privacy-utility tradeoff. In addition, these systems do not take into account the identity or role of the accessing party to adjust the degree of anonymity and/or utility of the data. Furthermore, some systems use irreversible one-way data transformation functions, such as generalization (e.g., replacing a precise value with a semantically consistent but less precise value). With such functions, it is not possible to obtain the original data or to improve the quality of the data, if required by an application (e.g., for usage by a government authority or for system diagnosis).
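The irreversibility of generalization noted above may be illustrated by the following minimal sketch. The function names (generalize_age, generalize_zip) and the bucket widths are hypothetical and chosen only for illustration. Because several distinct precise values map to the same generalized value, the original value cannot be recovered from the output alone.

```python
def generalize_age(age, width=10):
    """Replace a precise age with the range of the given width that contains it."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=3):
    """Keep the first `keep` characters of a postal code and hide the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


if __name__ == "__main__":
    # Two different subscribers yield the same generalized values,
    # so the mapping is many-to-one and has no inverse.
    print(generalize_age(34), generalize_age(37))
    print(generalize_zip("07974"), generalize_zip("07901"))
```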