The present invention relates to an anonymization device.
With the recent growth in concern over the protection of personal information, various privacy-preserving technologies have been studied.
Non-Patent Document 1, for example, discloses an anonymization method for satisfying k-anonymity. K-anonymity means a state in which, for every tuple in a data table, a total of k or more tuples (including that tuple) have the same data value information (the same combination of attribute values).
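The definition above can be illustrated with a minimal sketch. The function name `is_k_anonymous`, the row layout (one dict per tuple), and the sample attribute names are assumptions made for illustration only; they do not come from Non-Patent Document 1.

```python
from collections import Counter


def is_k_anonymous(rows, attributes, k):
    """Return True if every combination of attribute values occurs in at least k rows.

    rows       -- list of dicts, one per tuple (hypothetical layout)
    attributes -- names of the attributes whose combination is checked
    k          -- the k of k-anonymity
    """
    # Count how many tuples share each combination of attribute values.
    counts = Counter(tuple(row[a] for a in attributes) for row in rows)
    # The table is k-anonymous only if no combination is rarer than k.
    return all(count >= k for count in counts.values())


rows = [
    {"age": "20-24", "region": "Tokyo"},
    {"age": "20-24", "region": "Tokyo"},
    {"age": "25-29", "region": "Tokyo"},
]

# The combination ("25-29", "Tokyo") occurs only once, so k = 2 is not satisfied.
print(is_k_anonymous(rows, ["age", "region"], 2))
```

In this sample the single tuple with the age category "25-29" is what prevents the table from satisfying k = 2.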
Non-Patent Document 2 discloses an anonymization method using local recoding. In local recoding, a category such as age is displayed, for example, in increments of five years, while part of the category is made coarser; i.e., the ages corresponding only to specific data items are displayed in increments of ten years.
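The age example above can be sketched as follows. The helper names `age_bucket` and `recode_ages`, and the idea of passing the positions of the data items to be coarsened, are assumptions for illustration; this is not the procedure of Non-Patent Document 2, only the display rule described in the preceding paragraph.

```python
def age_bucket(age, width):
    """Return the age category of the given width that contains age, e.g. '20-24'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def recode_ages(ages, coarse_positions):
    """Display ages in 5-year increments, except the listed positions, which use 10 years."""
    return [
        age_bucket(age, 10 if i in coarse_positions else 5)
        for i, age in enumerate(ages)
    ]


# Only the data item at position 2 is displayed in the coarser 10-year category.
print(recode_ages([21, 23, 27, 34], {2}))
```

Only the selected data item loses precision; the other items keep the five-year categories.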
Non-Patent Document 3 also discloses an anonymization method using local recoding. In the method disclosed in Non-Patent Document 3, for a group G that does not satisfy the k of k-anonymity, a search is made for another group G′ such that G∪G′ satisfies the k and has the lowest number of data items. The set G and the set G′ are then merged. If the number of data items in the set obtained by merging G and G′ is 2k or more, the merged set is divided into two.
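The selection rule just described can be sketched on group sizes alone. The function names, the dict of group sizes, and the sample group names are hypothetical; how the merged set is divided into two is not stated here, so the sketch only reports whether a division would be required.

```python
def find_merge_partner(sizes, g, k):
    """Return the group G' for which |G ∪ G'| is at least k and as small as possible."""
    candidates = [
        name for name in sizes
        if name != g and sizes[g] + sizes[name] >= k
    ]
    if not candidates:
        return None  # no single group can bring G up to k
    return min(candidates, key=lambda name: sizes[g] + sizes[name])


def merge_groups(sizes, g, k):
    """Return (partner, merged size, whether the merged set must be divided into two)."""
    partner = find_merge_partner(sizes, g, k)
    if partner is None:
        return None
    merged_size = sizes[g] + sizes[partner]
    # A merged set with 2k or more data items is divided into two.
    return partner, merged_size, merged_size >= 2 * k


# Group "A" has 3 data items and k = 10: "B" is chosen because 3 + 9 = 12 is the
# smallest merged size that reaches k, and 12 < 20 so no division is needed.
print(merge_groups({"A": 3, "B": 9, "C": 30}, "A", 10))
```

The sketch assumes the groups are disjoint, so the size of the union is the sum of the two sizes.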
Patent Document 1 discloses a method for determining, in cases where changes in personal information occur, the safety of pieces of personal information with respect to a set that includes a predetermined number or more of pieces of personal information.
Non-Patent Document 1: L. Sweeney, Achieving K-Anonymity Privacy Protection Using Generalization and Suppression, International Journal on Uncertainty, Fuzziness and Knowledge based Systems, 2002, P. 571-588
Non-Patent Document 2: K. LeFevre, et al., Mondrian Multidimensional K-Anonymity, Proceedings of the 22nd International Conference on Data Engineering, 2006, P. 25
Non-Patent Document 3: Jian Xu, et al., Utility-Based Anonymization Using Local Recoding, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2006, P. 785-790
Patent Document 1: Patent Publication JP-A-2009-181207
Incidentally, according to the method disclosed in Non-Patent Document 1, a singularity set is suppressed in order to maintain the k-anonymity. Here, a singularity set means a set that has so few data items that it cannot satisfy the k-anonymity even when the data items are anonymized. It is, of course, possible to maintain the k-anonymity by suppressing the singularity set, but the suppressed set is not reflected in statistical information; thus, the statistical information cannot be accurate. Moreover, suppressing the singularity set makes it impossible to distribute information such as advertisements to users belonging to the singularity set.
In the method disclosed in Non-Patent Document 2, the abstraction level of a group to which a singularity set belongs is high. However, when the k-anonymity is not satisfied in spite of the high abstraction level, the entire dataset becomes “unknown,” resulting in an increase in the level of data distortion.
Furthermore, according to the method disclosed in Non-Patent Document 3, the group to be merged with a singularity set is selected such that the number of data items in the merged group satisfies the k-anonymity and is the lowest possible. Consequently, the level of data distortion can be minimized, but the difference in the number of data items between the groups in the dataset, or the ratios of these data items, cannot be understood. Also, it is difficult to follow temporal changes in the number of data items in each group.
For instance, of all subscribers of a certain service, suppose that 100 subscribers live in Tokyo and five subscribers live abroad at time t0, and that 200 subscribers live in Tokyo and eight live abroad at time t1. The anonymization method according to Non-Patent Document 3 is described below for this case, with the value of k of k-anonymity set to 10.
First, at the time t0, the group of subscribers living abroad does not satisfy the k-anonymity. Therefore, data items corresponding to five subscribers of the group living in Tokyo are merged into the group of subscribers living abroad and then generalized. At the time t1, on the other hand, data items corresponding to two subscribers of the group living in Tokyo are merged into the group of subscribers living abroad and then generalized. This consequently can minimize the number of data items corresponding to the subscribers living in Tokyo who are generalized together with the subscribers living abroad, and minimize the level of data distortion.
Nonetheless, the increase in the number of subscribers living in Tokyo or abroad between the times t0 and t1 cannot be understood. For the purpose of generalizing the group of subscribers living abroad, the data items corresponding to five subscribers of the group living in Tokyo are used at the time t0 and the data items corresponding to two subscribers are used at the time t1. As a result, the group shown as living in Tokyo contains 95 data items at the time t0 and 198 at the time t1, while the generalized group contains ten data items at both times. In actuality, the number of subscribers living in Tokyo has increased by 100 and the number of subscribers living abroad by three; nevertheless, this method shows that the number of subscribers in Tokyo has increased by 103 while the number of subscribers living abroad did not increase at all.
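The arithmetic behind this example can be checked with a short sketch. The function name `published_counts` and the label "generalized" for the merged group are assumptions for illustration; the subscriber numbers and k = 10 are those of the example above.

```python
def published_counts(tokyo, abroad, k):
    """Return the group sizes visible after the abroad group is topped up to k.

    Assumes the Tokyo group has enough data items to lend to the abroad group.
    """
    borrowed = max(0, k - abroad)  # Tokyo data items merged into the abroad group
    return {"Tokyo": tokyo - borrowed, "generalized": abroad + borrowed}


K = 10
t0 = published_counts(tokyo=100, abroad=5, k=K)  # 5 Tokyo data items are borrowed
t1 = published_counts(tokyo=200, abroad=8, k=K)  # 2 Tokyo data items are borrowed

# Change visible in the anonymized data versus the real change.
apparent_change = {name: t1[name] - t0[name] for name in t0}
actual_change = {"Tokyo": 200 - 100, "abroad": 8 - 5}

print(t0, t1)
print(apparent_change, actual_change)
```

The visible Tokyo group grows from 95 to 198 while the generalized group stays at ten, so the anonymized data overstate the Tokyo increase by three and hide the increase abroad entirely.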
Moreover, the method disclosed in Patent Document 1 determines that personal information is “safe” when there exist a certain number or more of data items having the same record value, but does not take into consideration how to respond to a situation where the number of such data items is equal to or lower than the certain number.