There are technologies that process collected personal information into anonymized information in order to inhibit an individual from being identified. Even when personal information is generally processed into anonymized information, the anonymized information is classified as personal information if an individual can be identified by a comparison of that anonymized information with other information (referred to as “easy comparability”). Unfortunately, there is no objective criterion for judging whether there is “easy comparability”, and it is difficult, for this reason, to judge whether anonymized information can be safely used. This “easy comparability” involves the following viewpoints. (1) Is anonymized information placed in circumstances where the anonymized information can be easily compared to other information? (2) As a result of a comparison of the anonymized information with other information, can an individual be identified?
With regard to viewpoint (1), easy comparability may be inhibited by data management measures (reference authority, a reference range, and measures against information leakages). Such a judgment therefore cannot be made by software alone. With regard to viewpoint (2), which is also referred to as personal identifiability, information is processed in such a manner that records by which an individual may be identified are removed, so that safer anonymized information may be generated. In this way, an individual is not identified even if the anonymized information can be easily compared to other information, and even if information by which an individual is identified leaks elsewhere. The anonymized information may therefore be used safely.
The technologies that process personal information into anonymized information include, for example, a technology that compares information with personal information, judges from the comparison whether the information may lead to identification of an individual, and removes such information, thereby turning the information into anonymized information.
There is also a technology that verifies personal identifiability using duplication of records in anonymized information and processes the data accordingly. This technology relies on the following rule: if a record appears N or more times in the anonymized information (where N is an integer greater than or equal to 2), a comparison of the anonymized information with personal information yields N or more matching results, and therefore an individual cannot be identified from the anonymized information.
Specifically, processing as illustrated in FIG. 1 is performed. The anonymized information illustrated on the left of FIG. 1 includes three records. Records in two rows from the top are identical, and it has been verified that there is no identifiability when the number of identical records is two or more. Therefore, these records are judged to be [OK] and are added to verified, anonymized information. However, a record consisting of ABCD is placed in only one row, and therefore there is personal identifiability. Accordingly, this record is judged to be [Failed]. Then, for example, attribute values B and C, which are part of the record consisting of ABCD, are each converted into X, and a record consisting of AXXD is added to the verified, anonymized information. The record consisting of ABCD itself is discarded. The above processing method is effective for processing records that have already been accumulated in one database.
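The verification and processing described for FIG. 1 may be sketched as follows. This is a minimal illustration only, not the implementation of the technology itself. The function name `verify_batch`, its parameters, and the value "ABCE" used for the two identical records are assumptions introduced here, since the text does not state the contents of those records. Each record is modeled as a four-character string, and the masked positions (the second and third attribute values) follow the ABCD to AXXD example.

```python
from collections import Counter


def verify_batch(records, n=2, mask_positions=(1, 2), mask="X"):
    """Return the verified, anonymized records for one batch.

    A record that appears n or more times in the batch is judged [OK]
    and kept as-is. Any other record is judged [Failed]: the attribute
    values at mask_positions are converted into the mask character and
    the original record is discarded.
    """
    counts = Counter(records)
    verified = []
    for record in records:
        if counts[record] >= n:
            verified.append(record)  # [OK]: no personal identifiability
        else:
            masked = "".join(
                mask if i in mask_positions else value
                for i, value in enumerate(record)
            )
            verified.append(masked)  # [Failed]: generalized record
    return verified


# "ABCE" is a hypothetical stand-in for the two identical records.
fig1_input = ["ABCE", "ABCE", "ABCD"]
fig1_output = verify_batch(fig1_input)
print(fig1_output)
```

With N set to 2, the two identical records pass unchanged and the single ABCD record is replaced by AXXD.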
Unfortunately, a problem arises when data collected as appropriate from various business systems is anonymized and the anonymized data is output to other systems that utilize it. Specifically, suppose that the three records illustrated on the left of FIG. 1 are collected first, that these three records are processed as described above, and that the data illustrated on the right of FIG. 1 is output to other systems. Thereafter, three new records as illustrated on the left of FIG. 2 are collected and processed in the same way. It is verified that the two rows from the top of the three new records are identical and that there is no personal identifiability. These records are therefore judged to be [OK] and are added to the verified, anonymized information. However, a record consisting of ABCD is placed in only one row, and therefore there is personal identifiability. Accordingly, this record is judged to be [Failed]. Then, attribute values B and C, which are part of the record consisting of ABCD, are each converted into X, and a record consisting of AXXD is added to the verified, anonymized information. The record consisting of ABCD itself is discarded. In this manner, although the record consisting of ABCD appears twice in total, the two occurrences are collected at different timings, and therefore the record “AXXD” is registered twice in the verified, anonymized information. As a result, the information of ABCD is lost, which is an obstacle to statistical processing or the like in other systems.
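The loss of information caused by different collection timings may be illustrated with the following sketch, which is given under stated assumptions and is not a definitive implementation. The helper `anonymize_batch` and the record values "ABCE" and "ABCF" are hypothetical, standing in for the identical records of FIG. 1 and FIG. 2, whose contents are not given in the text. N is assumed to be 2.

```python
from collections import Counter


def anonymize_batch(records, n=2):
    """Verify one batch in isolation (hypothetical helper).

    Records appearing n or more times within the batch are kept; any
    other four-character record has its second and third attribute
    values converted into X.
    """
    counts = Counter(records)
    return [r if counts[r] >= n else r[0] + "XX" + r[3:] for r in records]


# Two batches collected at different timings (values are assumptions).
batch_1 = ["ABCE", "ABCE", "ABCD"]
batch_2 = ["ABCF", "ABCF", "ABCD"]

# Each batch is verified on its own, so ABCD fails in both.
output = anonymize_batch(batch_1) + anonymize_batch(batch_2)

# Across both batches, ABCD actually appears twice.
total_abcd = Counter(batch_1 + batch_2)["ABCD"]

print(output)
print("ABCD collected in total:", total_abcd)
print("ABCD in output:", output.count("ABCD"))
print("AXXD in output:", output.count("AXXD"))
```

In this sketch, ABCD is collected twice overall, yet it never appears in the output, and AXXD is registered twice instead, which corresponds to the situation described for FIG. 1 and FIG. 2.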