In recent years, large amounts of personal information have been accumulated, and services based on the accumulated data have become more widespread. The personal information being collected includes, for example, information about purchased products managed with integrated circuit (IC) tags, location information of mobile terminals based on global positioning system (GPS) signals, postings to social media, Web search histories, and purchase histories at online stores. The collected personal information is used for services such as offering consumers products in which they may be interested. In the following description, an apparatus that generates such data, such as a smartphone, is referred to as a source.
Providing services based on the data collected from sources requires developing applications that implement those services. Some of these applications may be developed by telecommunications carriers, which allow mobile terminals to use their communication networks, or by online shopping operators; others may be developed by external contractors. In addition, some business operators buy personal information from data holders such as telecommunications carriers and provide their own services using that information. In other words, the entity that collects personal information is not always the entity that uses it, so personal information is transferred from the data holder to a data user. In such cases, it is necessary to prevent the individuals (source holders) who provided information to the data holder from being identified from the data passed to the data user, and to prevent their private information from being divulged.
NPL 1 discloses an anonymization technique called k-anonymization. If fewer than k pieces of sensitive information share the same quasi-identifier (where k is an integer equal to or greater than 2 throughout the following description), anonymizing the quasi-identifier by k-anonymization guarantees that at least k pieces of sensitive information share each quasi-identifier. A quasi-identifier is an attribute that allows a secret attribute to be inferred when combined with other values. In other words, unlike an identifier, a quasi-identifier does not uniquely identify a user by itself, but it can make the user distinguishable (identifiable) when background knowledge or similar information is taken into account. Examples of quasi-identifiers include gender, age, and occupation. Sensitive information is personal information that an individual does not want disclosed to others. Examples of sensitive information include an individual's hobbies and diseases.
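The k-anonymity condition described above can be checked mechanically: group the records by their quasi-identifier values and confirm that every group contains at least k records. The following Python sketch assumes each record is a dict keyed by attribute name; the function names `equivalence_class_sizes` and `is_k_anonymous` and the sample records are hypothetical and are not taken from NPL 1 or FIG. 22.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[a] for a in quasi_identifiers) for r in records)


def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs in at least k records."""
    if k < 2:
        raise ValueError("k must be an integer >= 2")
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())


# Hypothetical records; attribute names and values are illustrative only.
records = [
    {"age": 27, "occupation": "programmer", "disease": "cancer"},
    {"age": 34, "occupation": "teacher", "disease": "flu"},
    {"age": 41, "occupation": "nurse", "disease": "diabetes"},
]
print(is_k_anonymous(records, ["age", "occupation"], k=3))  # False
```

Here every (age, occupation) combination appears only once, so each group has size 1 and the data is not 3-anonymous.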
An example of k-anonymization is described below with reference to FIG. 22. In FIG. 22, Age and Occupation are quasi-identifiers, Disease is sensitive information, and k is assumed to be 3. Part (a) of FIG. 22 shows the age, occupation, and disease of patients before anonymization, and part (b) of FIG. 22 shows the same information after anonymization. In (a) of FIG. 22, each of the three data pieces has a distinct combination of age and occupation, so the data reveals that a 27-year-old programmer has cancer. Accordingly, a person who knows the patient's age (27) and occupation (programmer) may be able to identify the patient.
In such cases, k-anonymization is used to generalize the quasi-identifiers, namely age and occupation.
Specifically, as illustrated in (b) of FIG. 22, the patients are given a common, generalized age and occupation so that k (here, 3) patients share the same age and occupation. Consequently, a person who knows a patient's age and occupation cannot determine which disease belongs to that patient. In this way, guaranteeing k-anonymity limits the probability of identifying an individual to 1/k or less.
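The generalization step and the 1/k bound can be sketched in the same way. This minimal Python example assumes a fixed, hypothetical generalization: exact ages are replaced by 10-year bands, and occupations are mapped through a hand-written category table (`OCCUPATION_CATEGORY`). The records are invented for illustration and do not reproduce FIG. 22. The worst-case re-identification probability is taken as 1 divided by the smallest group size, which is at most 1/k once every group contains at least k records.

```python
from collections import Counter

# Hypothetical generalization hierarchy for occupations (illustrative only).
OCCUPATION_CATEGORY = {
    "programmer": "IT worker",
    "system administrator": "IT worker",
    "database engineer": "IT worker",
}


def generalize_age(age, width=10):
    """Replace an exact age with the band [lo, lo + width - 1] that contains it."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"


def generalize(record):
    """Return a copy with quasi-identifiers generalized; sensitive data is unchanged.

    Occupations missing from the hypothetical table are suppressed as "*".
    """
    return {
        "age": generalize_age(record["age"]),
        "occupation": OCCUPATION_CATEGORY.get(record["occupation"], "*"),
        "disease": record["disease"],
    }


def max_reidentification_probability(records, quasi_identifiers):
    """Worst-case chance of linking a person to one record: 1 / smallest group size."""
    sizes = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return 1 / min(sizes.values())


# Hypothetical patient records (not the data of FIG. 22).
records = [
    {"age": 27, "occupation": "programmer", "disease": "cancer"},
    {"age": 23, "occupation": "system administrator", "disease": "flu"},
    {"age": 28, "occupation": "database engineer", "disease": "gastritis"},
]
qi = ["age", "occupation"]
anonymized = [generalize(r) for r in records]
print(max_reidentification_probability(records, qi))     # 1.0
print(max_reidentification_probability(anonymized, qi))  # 0.3333333333333333
```

After generalization all three records share ("20-29", "IT worker"), so the group size is 3 and the probability drops from 1 to 1/3. A practical k-anonymization method would typically search for the least generalization that still satisfies k rather than applying a fixed hierarchy, because coarser generalization reduces the usefulness of the data.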