There are cases where records that were collected from plural information providers and that include numeric attribute values are to be disclosed or provided to others while the identifier of the information provider (hereinafter simply abbreviated as “ID”) of each record is concealed. In such cases, others may be able to presume the information provider of a record that has a characteristic numeric attribute value, even when the record is disclosed or provided after the ID is deleted.
For example, consider a case where a collector of individuals' position data provides the position data to an analyzer in a form in which the information provider is unknown. Here, the collector may be a service provider who provides a service regarding the position data, and the analyzer may be a cloud service provider, a secondary user of the data (e.g. a population density investigation company) or the like.
Here, assume that the position data collected by the collector is as depicted in FIG. 1. In the example of FIG. 1, each record includes a line number, an ID, an X coordinate (latitude) and a Y coordinate (longitude). Each record represents the position data of one of three persons A, B and C, and the total number of records is 7. In other words, plural records may appear for the same ID. The ID may be a user ID of the individual or an ID of a piece of measurement equipment. Moreover, the ID may be an ID of a department to which the user belongs.
When the data depicted in FIG. 1 is plotted on a map, a map as depicted in FIG. 2 is obtained, for example. When the analyzer can obtain the data depicted in FIGS. 1 and 2, the analyzer can use the data for analysis. For example, it can be understood that persons gather in the vicinity of the houses A and B.
However, consider, for example, a situation where the collector and the information provider have agreed on a contract under which the data is not provided to others unless anonymization is performed. The information provider may desire the anonymization because the information provider does not want anyone other than the collector to know where the information provider was at a specific time, or for some other reason.
On the other hand, the analyzer may not need information on the information provider, such as the ID. This is because an analysis such as the population density investigation can be performed even if the provider of the position data is not specifically identified.
In such a case, it is sufficient that the collector anonymizes the data in FIG. 1 to make it difficult to presume the information provider.
A simple anonymization method available to the collector is to delete the ID. Even if the analyzer views the data of FIG. 1 from which the IDs have been deleted, the analyzer cannot identify the information provider of each record directly from the data as it is. However, there is a problem that the information provider of some records can be presumed from the position data.
When the data of FIG. 1 from which the IDs have been deleted is plotted on the map as depicted in FIG. 2, it can be understood that the position data (X, Y)=(6, 2) in the first record is within the house A. In other words, even an analyzer who can view only the data from which the IDs have been deleted can presume that the information provider of the first record is “A”, and the anonymization cannot be said to be sufficient. Similarly, the anonymization of the records other than the seventh record is not sufficient.
As one conventional art, there is a method that treats plural predetermined, non-overlapping numeric ranges as groups and converts the records within each group into a statistical value.
In this conventional art, an area is meshed based on the latitude and longitude, and the statistical value for records within each mesh element is calculated, and then disclosed or provided.
As the statistical value, the number of records in each mesh element is used, for example, “3 records within a mesh element M1”. Alternatively, the ID may be deleted from each record, and the position of the record may be converted to the central point of its mesh element.
For example, consider a case where the respective records in FIG. 1 are grouped by mesh elements whose length of one side is “5” and are converted. In such a case, for example, (X, Y)=([5, 10), [0, 5)) corresponds to one mesh element, i.e. one group. When this mesh element is tentatively named M10, only the first record in FIG. 1 is classified into M10. Therefore, either the fact that “there is one record in the mesh element M10” is disclosed, or the first record is converted to (X, Y)=(7.5, 2.5) (i.e. the central point of M10) and disclosed.
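The mesh conversion in this example can be sketched in Python as follows. This is only a minimal illustration and not code taken from the conventional art: the function names mesh_index and mesh_center are hypothetical, and only the first record of FIG. 1, (X, Y)=(6, 2), is taken from the text.

```python
import math
from collections import Counter

def mesh_index(x, y, side):
    """Return the (column, row) index of the mesh element containing (x, y).

    Each element covers the half-open range [i*side, (i+1)*side) on each axis.
    """
    return (math.floor(x / side), math.floor(y / side))

def mesh_center(index, side):
    """Return the central point of the mesh element with the given index."""
    col, row = index
    return ((col + 0.5) * side, (row + 0.5) * side)

SIDE = 5
first_record = (6, 2)  # first record of FIG. 1, as stated in the text

# Element [5, 10) x [0, 5), which the text tentatively names M10.
idx = mesh_index(*first_record, SIDE)

# Generalized position that would be disclosed instead of (6, 2).
center = mesh_center(idx, SIDE)

# Per-element record count, the other statistic mentioned in the text.
# Only the one record known from the text is counted here.
counts = Counter(mesh_index(x, y, SIDE) for (x, y) in [first_record])
```

Here idx identifies the element M10, center is the disclosed point (7.5, 2.5), and counts holds one record for that element.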
In this conventional art, when the mesh size is sufficiently large, no problem arises for the anonymization. However, there is a problem that the anonymization is threatened when the mesh size becomes small. For example, if the mesh element M10 is included in the site of the house A (e.g. a case where the site of the house A is represented by (X, Y)=([2, 10], [0, 6])), it is possible to presume that the information provider of the record classified into the mesh element M10 is “A”. As the mesh size becomes smaller, it becomes more likely that a mesh element is included in an area in which only a specific ID is sure to exist.
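The risk described above amounts to a rectangle containment test, which can be sketched as follows. The helper names mesh_bounds and is_inside and the coarser 20-unit mesh are assumptions introduced only for illustration. The site of the house A is the example range given in the text. For simplicity, the half-open upper bound of a mesh element is treated as if it were closed.

```python
def mesh_bounds(col, row, side):
    """Return ((x_min, x_max), (y_min, y_max)) for a mesh element."""
    return ((col * side, (col + 1) * side), (row * side, (row + 1) * side))

def is_inside(inner, outer):
    """True if the rectangle `inner` lies entirely within the rectangle `outer`."""
    return all(o_lo <= i_lo and i_hi <= o_hi
               for (i_lo, i_hi), (o_lo, o_hi) in zip(inner, outer))

# Site of the house A, (X, Y)=([2, 10], [0, 6]), from the example in the text.
HOUSE_A_SITE = ((2, 10), (0, 6))

m10_small = mesh_bounds(1, 0, 5)    # [5, 10) x [0, 5): the element M10
m10_large = mesh_bounds(0, 0, 20)   # [0, 20) x [0, 20): hypothetical coarser element

# A mesh element that fits inside the site reveals the provider of its records.
small_is_risky = is_inside(m10_small, HOUSE_A_SITE)
large_is_risky = is_inside(m10_large, HOUSE_A_SITE)
```

With a side of 5, M10 lies inside the site and the record in it can be attributed to “A”. The hypothetical coarser element extends beyond the site, so it does not by itself reveal the provider.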
On the other hand, when the mesh size becomes larger, the degree of generalization of the positions becomes greater. Therefore, there is a problem that the accuracy of the analysis by the analyzer is adversely affected. For example, mesh elements whose length of one side is about 1 km are sometimes used in a statistical investigation. However, it is generally impossible to present an analysis result regarding areas that are smaller than a 1 km square as long as only the result of the anonymization is used.
Thus, in order to guarantee the anonymity, this conventional art has to enlarge the mesh size, and there is a problem that the accuracy of the analysis is adversely affected.
Moreover, as another conventional art to generate groups, there is a technique for adjusting positions of ranges so that k (k is a preset value) or more records are included in a range whose size is less than a preset value d and ranges do not overlap each other, and for grouping based on those ranges.
This conventional art assumes, as target data, records whose IDs all differ from one another, and in such a case appropriate anonymity is guaranteed. However, there is a problem that sufficient anonymity cannot be guaranteed for data in which plural records having the same ID exist, as illustrated in FIG. 1.
For example, a case is considered where part of this conventional art is applied to group respective records in FIG. 1 so that a rectangle whose length of one side is less than “5” (i.e. d=(5, 5)) corresponds to a group and 3 or more (i.e. k=3) records are included in each group. In such a case, for example, two groups are obtained, one is a rectangle R43: (X, Y)=([2, 6], [2, 4]) including records {1, 2, 3} and the other is a rectangle R49: (X, Y)=([2, 6], [8, 10]) including records {4, 5, 6}. However, similarly to the aforementioned example, there is a possibility that the rectangle is included in an area in which only a specific ID is sure to exist. For example, when the rectangle R43 is included in the site of the house A, it is possible to presume that the information provider of the records {1, 2, 3}, which are classified to the rectangle R43, is A.
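The weakness in this example can be sketched as follows. Because FIG. 1 is not reproduced here, the coordinates of the second and third records and the IDs of all three records are hypothetical values, chosen only so that their bounding rectangle matches R43. The first record (6, 2), k=3, d=(5, 5) and the site of the house A ([2, 10], [0, 6]) come from the text. The function names are likewise assumptions.

```python
def bounding_rectangle(points):
    """Return ((x_min, x_max), (y_min, y_max)) enclosing all points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return ((min(xs), max(xs)), (min(ys), max(ys)))

def satisfies_k_d(points, k, d):
    """True if there are k or more records and each side of their
    bounding rectangle is shorter than the corresponding entry of d."""
    (x_lo, x_hi), (y_lo, y_hi) = bounding_rectangle(points)
    return len(points) >= k and (x_hi - x_lo) < d[0] and (y_hi - y_lo) < d[1]

# (ID, (X, Y)) pairs. Only the position (6, 2) is from the text; the other
# positions and all IDs are hypothetical.
group = [("A", (6, 2)), ("A", (2, 4)), ("A", (4, 3))]
points = [p for _, p in group]

rect = bounding_rectangle(points)                 # matches R43: ([2, 6], [2, 4])
meets_k_d = satisfies_k_d(points, k=3, d=(5, 5))  # record count and size both pass

# Counting records is not the same as counting distinct providers.
distinct_ids = len({record_id for record_id, _ in group})

# The group rectangle also lies within the site of the house A.
(hx_lo, hx_hi), (hy_lo, hy_hi) = ((2, 10), (0, 6))
inside_site = (hx_lo <= rect[0][0] and rect[0][1] <= hx_hi
               and hy_lo <= rect[1][0] and rect[1][1] <= hy_hi)
```

Under these assumed values the group passes the k and d conditions, yet it contains only one distinct ID and lies within the site of the house A, so the provider of all three records can still be presumed.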
In general, a method that can also handle data in which plural records have the same ID, as illustrated in FIG. 1, is preferable because its range of application is broader. For example, especially in a case where the information provider is an organization, data from plural pieces of measurement equipment may carry the same ID (i.e. the organization ID). Moreover, by allowing plural records with the same ID, it becomes possible to analyze a large number of records at once, and an improvement in the analysis accuracy can be expected. However, this conventional art can guarantee the anonymity only for a special kind of target data, and there is a problem that it is applicable to few scenarios.
Furthermore, as another conventional art for grouping, there is a technique for making the number of kinds of secret attribute values within each group equal to or more than l (i.e. a technique for satisfying l-diversity). This conventional art has a problem that it is difficult to keep the size of each group below a predetermined range. When the size of the group cannot be kept below the predetermined range, there is a problem that the accuracy of the analysis is adversely affected.
Non-Patent Document 1: O. Abul, F. Bonchi, and M. Nanni. Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases. In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, pp. 376-385 (2008).
Non-Patent Document 2: A. Machanavajjhala, J. Gehrke, D. Kifer, M. Venkitasubramaniam. l-Diversity: Privacy Beyond k-Anonymity. ACM Transactions on Knowledge Discovery from Data, Vol. 1, Issue 1, Article No. 3, 2007.