With the spread of cloud-computing services, it has become possible to collect and store large amounts of data at low cost. As a result, the collected data can now be put to use, and efforts to acquire new knowledge and to provide new services are being made.
In particular, vigorous attempts have been made to utilize personal data, which includes personal information, in order to acquire knowledge about the future trends of individuals and to provide services tailored to individual persons. Personal data can therefore be said to be a particularly important target of utilization.
On the other hand, when personal data is utilized, it must be handled with careful attention so that privacy is not invaded. An invasion of privacy occurs, for example, when an individual is identified from data and private information that the identified individual does not want others to know is thereby leaked.
Thus, when personal data is utilized, anonymization techniques are used to make it difficult to determine whose personal information the data represents, thereby avoiding an invasion of privacy.
Among the anonymization techniques, attention has been paid to the k-anonymity method, which anonymizes data such that the personal data of at least k persons become identical information.
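The k-anonymity condition described above can be illustrated with a minimal sketch. It assumes that each record is a dictionary and that the attributes which must become identical are given as a list of column names. The function name `is_k_anonymous`, the column names, and the sample records are hypothetical and are not taken from the invention.

```python
from collections import Counter


def is_k_anonymous(records, identifying_columns, k):
    """Return True if every combination of values in identifying_columns
    is shared by at least k records."""
    groups = Counter(
        tuple(record[column] for column in identifying_columns)
        for record in records
    )
    return all(count >= k for count in groups.values())


# Hypothetical, already generalized sample data.
records = [
    {"age": "30-39", "zip": "123**", "disease": "flu"},
    {"age": "30-39", "zip": "123**", "disease": "asthma"},
    {"age": "40-49", "zip": "456**", "disease": "flu"},
    {"age": "40-49", "zip": "456**", "disease": "diabetes"},
]

satisfies_k2 = is_k_anonymous(records, ["age", "zip"], 2)
satisfies_k3 = is_k_anonymous(records, ["age", "zip"], 3)
```

In this sample, each combination of age range and zip prefix is shared by two records, so the data satisfies the condition for k = 2 but not for k = 3.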
However, according to the inventor's study, the above-described k-anonymity method has the following problem.
In general, in the k-anonymity method, anonymization is realized by repeatedly applying data conversions, such as deletion and generalization, to given data. Thus, executing k-anonymization causes part of the information contained in the original data to be lost.
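The relationship between these data conversions and the loss of information can be sketched as follows. The example assumes a single numeric attribute (age) and an illustrative loss measure, namely the width of the generalized interval divided by the width of the attribute's whole domain. This measure, the function names, and the sample values are assumptions made for illustration only; the text above does not specify how the information loss amount is computed.

```python
def generalize_age(age, width):
    """Generalize an exact age to the interval of the given width that
    contains it, e.g. 34 with width 10 becomes (30, 39)."""
    low = (age // width) * width
    return (low, low + width - 1)


def interval_loss(interval, domain):
    """Illustrative loss: interval width relative to the domain width.
    An exact value gives 0.0; the whole domain (deletion) gives 1.0."""
    low, high = interval
    domain_low, domain_high = domain
    return (high - low) / (domain_high - domain_low)


def average_loss(intervals, domain):
    """Average the illustrative loss over all generalized values."""
    return sum(interval_loss(i, domain) for i in intervals) / len(intervals)


ages = [31, 34, 38, 42, 47, 49]   # hypothetical original values
domain = (0, 99)                  # assumed range of the age attribute

original = [(a, a) for a in ages]                  # no conversion
by_decade = [generalize_age(a, 10) for a in ages]  # mild generalization
by_half = [generalize_age(a, 50) for a in ages]    # coarse generalization
deleted = [domain for _ in ages]                   # deletion of the value

loss_original = average_loss(original, domain)
loss_decade = average_loss(by_decade, domain)
loss_half = average_loss(by_half, domain)
loss_deleted = average_loss(deleted, domain)
```

Under this measure the loss grows as the conversion becomes coarser: it is zero for the original values, larger for ten-year ranges, larger again for fifty-year ranges, and maximal when the value is deleted.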
When an excessive amount of information is lost, a problem arises in that the information obtained by analyzing the k-anonymized data may not correctly reflect the information that would be obtained by analyzing the original data. Specifically, as the information loss amount increases, analysis of the k-anonymized data may yield erroneous information that differs from what the original data would show, and this may lead to erroneous determinations when the data is utilized.
According to the inventor's study, it is considered that the above-described problem can be avoided if the information loss amount can be reduced while the data is still k-anonymized.
An object of the invention is to provide an anonymization apparatus and a program which can reduce the information loss amount while k-anonymizing data.