In recent years, cloud computing has come into increasingly wide use. In cloud computing, when information held within a company is released outside the company, the information specifying an individual is anonymized so that it becomes difficult to specify the individual. As one such anonymizing technique, there is a technique of making information specifying an individual vague or removing information specifying an individual (for example, see Japanese Laid-open Patent Publication No. 2007-287102, Japanese Laid-open Patent Publication No. 2007-219636, and Japanese Laid-open Patent Publication No. 2007-141192). As another such technique, there is a technique of converting a numerical value specifying an individual into a kana character.
FIG. 51 is a diagram for describing an example of an anonymizing technique. Individual information to be anonymized is illustrated on the left of the example of FIG. 51. For example, the individual information to be anonymized illustrated in the example of FIG. 51 is information such as medical checkup results. The individual information to be anonymized illustrated in the example of FIG. 51 includes various items such as “name,” “height,” “weight,” and “age.” A person's name is registered to the item of “name.” The height of the person whose name is registered to the item of “name” is registered to the item of “height.” The weight of the person whose name is registered to the item of “name” is registered to the item of “weight.” The age of the person whose name is registered to the item of “name” is registered to the item of “age.”
When the individual information illustrated on the left of the example of FIG. 51 is anonymized, the anonymous data illustrated on the right of the example of FIG. 51 is obtained. The anonymous data illustrated in the example of FIG. 51 represents an example in which the information registered to the item of “name” of the individual information is discarded, and the information registered to the items of “height,” “weight,” and “age” is made vague.
Further, even when the individual information is processed into anonymous data, it may be possible to specify an individual through collation with other information (hereinafter, referred to as “collation easiness”). For example, in the anonymous data illustrated in the example of FIG. 51, even when information in which a name is “A” and the height of a person is “175 cm” is collated with the anonymous data, since there are two records of the anonymous data representing that the height is “175 cm,” it is difficult to specify a corresponding record. Similarly, even when information in which a name is “B” and the height of a person is “173 cm” is collated with the anonymous data, since there are two records of the anonymous data representing that the height is “173 cm,” it is difficult to specify a corresponding record. However, when information in which a name is “C” and the height of a person is “182 cm” is collated with the anonymous data, since there is only one record of the anonymous data representing that the height is “182 cm,” it is possible to specify a corresponding record. Further, when information in which a name is “D” and the height of a person is “169 cm” is collated with the anonymous data, since there is only one record of the anonymous data representing that the height is “169 cm,” it is possible to specify a corresponding record.
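The collation described above can be sketched as a simple count of matching records. The following is a minimal illustration only: the list of six height values is an assumption chosen to agree with the record counts stated in the text (the actual contents of FIG. 51 are not reproduced here), and the function name `matching_record_count` is hypothetical.

```python
from collections import Counter

# Heights in the anonymous data. ASSUMPTION: six records, chosen only so that
# the counts match the description (two at 175, two at 173, one each at 182, 169).
anonymous_heights = [175, 175, 173, 173, 182, 169]

def matching_record_count(heights, known_height):
    """Return how many anonymous records match an externally known height."""
    return Counter(heights)[known_height]

# External information held by a collating party: name -> known height.
external = {"A": 175, "B": 173, "C": 182, "D": 169}

result = {name: matching_record_count(anonymous_heights, h)
          for name, h in external.items()}

# A person is identifiable when exactly one anonymous record matches.
identifiable = sorted(name for name, n in result.items() if n == 1)
```

Under these assumed values, persons A and B each match two records and cannot be singled out, whereas persons C and D each match exactly one record.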
Here, there is no objective criterion on whether or not there is “collation easiness,” and it is difficult to determine whether or not anonymous data can be safely used. The “collation easiness” has the following points of view:
(1) whether or not it is an environment in which it is possible to easily collate with other information; and
(2) whether or not it is possible to identify an individual as a result of collating with other information.
From the point of view of (1), collation easiness is ruled out by countermeasures that include data management (a collation right, a collation range, and an information leakage countermeasure), and thus it is difficult to make a determination based only on the specification of the software that generates anonymous data. The point of view of (2) is referred to as “individual identifiability.” When anonymous data is generated, safe anonymous data can be generated by discarding any record from which an individual is likely to be identified. Thus, even when it is possible to easily collate with other information or even when information identifying an individual leaks out, it is difficult to specify an individual, and the anonymous data can be safely used.
As a technique of processing anonymous data, for example, there is a technique of processing anonymous data by determining and removing information in which an individual is likely to be specified when the information is collated with individual information.
Further, a technique of verifying individual identifiability based on duplication of records in anonymous data and then processing the data is also known (for example, see Japanese Laid-open Patent Publication No. 2009-181207). This technique uses the principle that, when the number of duplicates of a record in the anonymous data is N or more, N or more results are obtained as a result of collating with individual information, and thus it is difficult to identify an individual from the anonymous data.
Specifically, the processing illustrated in FIG. 52 is performed. The anonymous data illustrated on the left of FIG. 52 includes three records, and the two upper rows are the same. Since it is determined that there is no individual identifiability when two or more records are the same, these two records are added to verified anonymous data as “OK.” However, since the record of ABCD is present in only one row, there is individual identifiability, and thus the record is determined as “NG.” In this case, for example, some of the attribute values of ABCD, namely B and C, are converted into X, and a record of AXXD is added to the verified anonymous data. Meanwhile, the record of ABCD is discarded. This processing method is effective when records previously accumulated in a single database are processed.
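The verification described for FIG. 52 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the cited publication: the content of the two identical upper rows (“EFGH”) is hypothetical, N is taken to be 2, and the function name `verify_batch` and its parameters are introduced only for this sketch.

```python
from collections import Counter

def verify_batch(records, n=2, mask_positions=(1, 2), mask_char="X"):
    """Keep records duplicated n or more times ("OK"); mask the others ("NG")."""
    counts = Counter(records)
    verified = []
    for rec in records:
        if counts[rec] >= n:
            # "OK": duplicated n or more times, added unchanged.
            verified.append(rec)
        else:
            # "NG": convert the chosen attribute values into the mask character;
            # the original record is discarded and only the masked one is kept.
            masked = "".join(mask_char if i in mask_positions else ch
                             for i, ch in enumerate(rec))
            verified.append(masked)
    return verified

# ASSUMPTION: the two identical upper rows are written here as "EFGH".
batch = ["EFGH", "EFGH", "ABCD"]
verified = verify_batch(batch)
```

With this input, the two identical rows pass through unchanged and the single ABCD record is replaced with AXXD.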
However, a problem arises when data collected as needed from various business systems is anonymized and then output to another system that uses the anonymized data. For example, when the three records illustrated on the left of FIG. 52 are first collected and then subjected to the above-described processing, the data illustrated on the right of FIG. 52 is output to the other system. Thereafter, when the three records illustrated on the left of FIG. 53 are newly collected and then subjected to the above-described processing, since the two upper rows are the same, it is determined that there is no individual identifiability, and thus these records are added to the verified anonymous data as “OK.” However, since the record of ABCD is present in only one row, there is individual identifiability, and the record is determined as “NG.” In this case, the attribute values B and C are converted into X, and a record of AXXD is added to the verified anonymous data. Then, the record of ABCD is discarded. As a result, although the record of ABCD appears twice in total, the two occurrences differ in collection timing, and thus the record of “AXXD” is registered to the verified anonymous data twice. In this case, information such as ABCD is lost, and this causes a problem in statistical processing or the like in the other system. For example, a problem occurs in statistical processing or the like when a large amount of data is determined not to satisfy a predetermined inter-data condition such as a “match of data” among pieces of data included in a collected data group.
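The loss described above can be reproduced with a short sketch that applies the duplicate-count check to each collected batch independently. The contents of the identical upper rows (“EFGH” for FIG. 52 and “IJKL” for FIG. 53) are hypothetical, N is assumed to be 2, and the helper names `mask` and `process_batch` are introduced only for illustration.

```python
from collections import Counter

def mask(record, positions=(1, 2), mask_char="X"):
    """Convert the attribute values at the given positions into the mask character."""
    return "".join(mask_char if i in positions else ch
                   for i, ch in enumerate(record))

def process_batch(records, n=2):
    """Check duplicates within a single batch only, as in the processing above."""
    counts = Counter(records)
    return [rec if counts[rec] >= n else mask(rec) for rec in records]

# ASSUMPTION: the identical upper rows are written as "EFGH" and "IJKL".
first_batch = ["EFGH", "EFGH", "ABCD"]    # collected first (FIG. 52)
second_batch = ["IJKL", "IJKL", "ABCD"]   # collected later (FIG. 53)

# Each batch is verified at its own collection timing and then output.
verified = []
for batch in (first_batch, second_batch):
    verified.extend(process_batch(batch))

# For comparison only: occurrences counted across both collections.
overall = Counter(first_batch + second_batch)
```

Here `overall["ABCD"]` is 2, so the record is duplicated across the two collections, yet `verified` contains AXXD twice and ABCD not at all, because each batch saw ABCD only once.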