In recent years, privacy information, such as purchase records and medical healthcare records, has been accumulated by providers of sales services and medical healthcare services (service providers).
The privacy information is composed of a plurality of attributes, including attributes that are referred to as quasi-identifiers. Quasi-identifiers are attributes, such as a year of birth and a gender, that characterize individuals and that, in combination, may identify an individual.
Such privacy information has not been actively put to secondary use because of concern over an invasion of privacy. Secondary use means, for example, that a service provider who has generated and accumulated privacy information provides a third party with the privacy information, and the third party uses the privacy information in order to strengthen a service the third party itself provides. Secondary use also means, for example, that a service provider who has generated and accumulated privacy information outsources analysis or the like of the privacy information to a third party.
Making secondary use of privacy information without concern over an invasion of privacy makes it possible to promote research that uses privacy information and to strengthen services that use results of such analysis and research. Third parties other than the service provider who owns the privacy information are also able to benefit from the value the privacy information has.
Examples of such a third party include pharmaceutical companies. For a pharmaceutical company, it is difficult to obtain medical healthcare records. Obtaining medical healthcare records enables the pharmaceutical company to have knowledge of how drugs are used. Furthermore, the pharmaceutical company is also able to analyze co-occurrence relations and correlations of the drugs from the medical healthcare records.
Each record of a data set of privacy information includes, for example, a user identifier by which a service user (individual) is uniquely identified and one or more quasi-identifiers. A service provider accumulates such a record every time a service user enjoys the service.
Providing a third party with privacy information that still includes user identifiers without any change enables the third party to identify service users by use of the user identifiers. That may cause a problem in terms of the invasion of privacy.
In some cases, an individual can be identified from within a data set (for example, history information) that is composed of a plurality of records, on the basis of combinations of quasi-identifiers that are included in the respective records. That is, even from history information from which user identifiers have been removed, an individual can in some cases be identified on the basis of combinations of quasi-identifiers therein, causing an invasion of privacy.
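The risk described above can be illustrated with a minimal sketch. The records, the attribute names, and the helper `unique_combinations` are hypothetical and introduced only for illustration; the sketch assumes that a combination of quasi-identifier values appearing in exactly one record is enough to single out an individual.

```python
from collections import Counter

# Hypothetical records from which user identifiers have already been removed.
records = [
    {"birth_year": 1980, "gender": "F", "diagnosis": "flu"},
    {"birth_year": 1980, "gender": "F", "diagnosis": "asthma"},
    {"birth_year": 1975, "gender": "M", "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ("birth_year", "gender")


def unique_combinations(rows, quasi_identifiers):
    """Return quasi-identifier combinations that occur in exactly one record."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# Combinations listed here point at a single record, so the individual
# behind that record may be identifiable even without a user identifier.
exposed = unique_combinations(records, QUASI_IDENTIFIERS)
print(exposed)
```

In this hypothetical data set, the record with birth year 1975 and gender M is the only one with that combination, so removing the user identifier alone does not protect it.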
Anonymization is known as a method to convert a data set of privacy information that has such characteristics into a form in which privacy is protected while intrinsic usefulness is maintained.
NPL1 proposes ‘k-anonymity’, which is one of the most popular anonymity indicators. A method to make a data set that is a subject of anonymization satisfy such k-anonymity is referred to as ‘k-anonymization’. In the k-anonymization, processing to convert quasi-identifiers is carried out in such a way that, for every combination of quasi-identifier values, at least k records having the same values exist in the data set that is a subject of anonymization. For the conversion processing, methods such as generalization and suppression are known. In the generalization, original specific information is converted to generalized information.
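As a rough illustration of the k-anonymity condition and of generalization, the following sketch checks whether every combination of quasi-identifier values is shared by at least k records, and then generalizes a year of birth to a range. It is not the method of NPL1; the function names, the attribute names, and the choice of a ten-year range are assumptions made for this example.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(n >= k for n in counts.values())


def generalize_birth_year(row, width=10):
    """Replace a specific birth year with a range of `width` years (assumed)."""
    start = (row["birth_year"] // width) * width
    return {**row, "birth_year": f"{start}-{start + width - 1}"}


# Hypothetical records; only quasi-identifiers are shown.
rows = [
    {"birth_year": 1981, "gender": "F"},
    {"birth_year": 1984, "gender": "F"},
    {"birth_year": 1987, "gender": "F"},
]
qi = ("birth_year", "gender")

before = is_k_anonymous(rows, qi, k=3)        # specific years are all distinct
generalized = [generalize_birth_year(r) for r in rows]
after = is_k_anonymous(generalized, qi, k=3)  # all rows now share one range
print(before, after)
```

Here the specific years 1981, 1984, and 1987 are each converted to the generalized value "1980-1989", so the three records become indistinguishable with respect to their quasi-identifiers.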
For example, PTL1 discloses a privacy information evaluation server. First, the privacy information evaluation server processes privacy information that is received from a user terminal. Second, the privacy information evaluation server decides whether or not the processed privacy information satisfies k-anonymity. Third, the privacy information evaluation server, on the basis of a result of the decision, outputs the processed privacy information from which identification information of users is removed.
Another related technology that uses such a k-anonymization technology is disclosed in NPL2. In NPL2, a method is proposed in which multi-dimensional data are k-anonymized by successively generating sets of records that have similar attribute values (hereinafter, referred to as clusters) and generating, through generalization and suppression, attribute values that are common to the records included in each cluster.
In PTL1, a k-anonymity decision unit in the privacy information evaluation server, on the basis of feedback from the k-anonymity decision, generalizes the privacy information by a bottom-up process or a top-down process.
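The idea of forming clusters of similar records and giving each cluster a common generalized value can be sketched as follows. This is not the algorithm of NPL2: it is a simplified, single-attribute stand-in that sorts values, cuts them into groups of k, and merges an undersized final group into its neighbor. The function name `cluster_and_generalize` and the sample values are hypothetical.

```python
def cluster_and_generalize(values, k):
    """Group sorted values into clusters of at least k and return, for each
    cluster, the (lowest, highest, size) triple used as its common range."""
    if len(values) < k:
        raise ValueError("fewer than k records; k-anonymity cannot be met")
    ordered = sorted(values)
    # Cut the sorted values into consecutive chunks of k records.
    clusters = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # A final chunk smaller than k is merged into the preceding cluster.
    if len(clusters) > 1 and len(clusters[-1]) < k:
        tail = clusters.pop()
        clusters[-1].extend(tail)
    return [(c[0], c[-1], len(c)) for c in clusters]


# Hypothetical years of birth.
ranges = cluster_and_generalize([1980, 1981, 1975, 1990, 1984], k=2)
print(ranges)
```

Every record in a cluster would then carry the cluster's range in place of its specific value, so each generalized value is shared by at least k records.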
PTL2 discloses a privacy protection device for public information. First, the privacy protection device processes respective quasi-identifiers in input data to carry out generalization. Second, the privacy protection device decides whether a table that is composed of all the generalized quasi-identifiers satisfies a predetermined k-anonymity. Third, the privacy protection device, on the basis of a result of the decision, outputs an optimum data set.
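The generalize-then-decide flow described for the related technologies can be sketched as a bottom-up loop that tries progressively coarser generalizations until the k-anonymity decision succeeds. This is only an illustrative sketch and not the implementation of PTL1 or PTL2; the candidate range widths, the function names, and the interpretation of "optimum" as "least generalized result that satisfies k-anonymity" are assumptions.

```python
from collections import Counter


def generalize(year, width):
    """Map a year to a range of `width` years; width 1 keeps the year as is."""
    if width == 1:
        return str(year)
    start = (year // width) * width
    return f"{start}-{start + width - 1}"


def least_generalized(years, k, widths=(1, 5, 10, 50)):
    """Try widths from finest to coarsest and return (width, values) for the
    first one whose output satisfies k-anonymity, or None if none does."""
    for width in widths:
        values = [generalize(y, width) for y in years]
        counts = Counter(values)
        if all(n >= k for n in counts.values()):
            return width, values
    return None


# Hypothetical years of birth.
result = least_generalized([1981, 1983, 1986, 1988], k=2)
print(result)
```

With these hypothetical inputs, the specific years fail the decision, while five-year ranges place two records in each range, so the loop stops at that level instead of generalizing further.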