The present invention relates to an information protection device and an information protection method.
In recent years, the use of position information including positioning data acquired by a GPS (Global Positioning System) or wireless LAN (Local Area Network) installed in a portable terminal or an automobile has increased for a variety of applications. Furthermore, the number of services for periodically acquiring position information and recording the movement trajectory or action history of the users of portable terminals or automobiles has been growing.
In addition to information that can identify the user, such as the user's home, place of employment, or school, position information may include information that the user does not want others to know, such as a place related to a hobby or a hospital. Position information is therefore highly private. Further, a movement trajectory, which is a time series of position information, may also reveal the route to a sensitive location with a high degree of individual privacy and whether the user is present at such a location, and can identify the user with higher certainty than an isolated piece of position information. In addition, where the movement trajectory is used in real time by a service provider or data analyst, the user is constantly exposed to the threat of being tracked and monitored. The movement trajectory is therefore privacy information of a very sensitive kind. Accordingly, when such privacy information is provided to a service provider or data analyst, anonymity should be ensured by anonymization.
Anonymization, as referred to herein, is processing applied to privacy information so that the user cannot be identified. A metric indicating the degree to which the user cannot be identified is called an anonymity metric. k-anonymity is a well-known anonymity metric in current use. Information which is included in the privacy information and is not an identifier that can uniquely identify the user, but can identify the user when background information or the like is taken into account, is called a quasi-identifier (indirect identifier). Further, information that the user does not want others to know is called sensitive information. k-anonymity is a metric indicating that, as a result of anonymizing the quasi-identifier, k or more types of sensitive information share the same quasi-identifier. By ensuring k-anonymity, it is possible to reduce the probability of identifying the user to 1/k or less, thereby making it difficult to identify the user.
A patient's medical record such as that shown in FIG. 13 will be considered below by way of example. The medical record shown in FIG. 13 includes the name, gender, occupation, and age of the patient together with the medical condition. The medical condition is sensitive information with a high degree of privacy. The name is an identifier uniquely identifying the person, and the gender, occupation, and age are quasi-identifiers that can identify the person. For example, even if the patient's name is unknown, the patient can be inferred from a combination of occupation and age. In other words, even when the patient's name is hidden, a person who knows the patient's occupation and age can learn the patient's medical condition. In order to avoid this, the level of abstraction of occupation and age can be increased, as shown in FIG. 14. FIG. 14 presents anonymous information obtained by anonymizing the medical record shown in FIG. 13 so as to ensure k-anonymity with k=2. In the anonymous information shown in FIG. 14, two or more medical conditions are present for every combination of occupation and age. As a result, even a person who knows the occupation and age of a patient cannot accurately infer the patient's medical condition. Further, l-diversity, t-closeness, and m-invariance are known in addition to k-anonymity as anonymity metrics.
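The generalization described above can be illustrated with a minimal Python sketch. The records, the occupation classes, the decade-wide age ranges, and the names `RECORDS`, `OCCUPATION_CLASS`, `generalize`, and `satisfies_k` are all hypothetical and are not the contents of FIG. 13 or FIG. 14. The check follows the definition given above: every combination of quasi-identifiers must be associated with k or more types of sensitive information.

```python
from collections import defaultdict

# Hypothetical records with the name (identifier) already removed.
# These are NOT the contents of FIG. 13.
RECORDS = [
    {"gender": "F", "occupation": "nurse", "age": 34, "condition": "flu"},
    {"gender": "F", "occupation": "doctor", "age": 38, "condition": "asthma"},
    {"gender": "M", "occupation": "teacher", "age": 41, "condition": "diabetes"},
    {"gender": "M", "occupation": "professor", "age": 47, "condition": "gastritis"},
]

# Assumed abstraction hierarchy for the occupation quasi-identifier.
OCCUPATION_CLASS = {
    "nurse": "medical",
    "doctor": "medical",
    "teacher": "education",
    "professor": "education",
}

QUASI_IDENTIFIERS = ("gender", "occupation", "age")


def generalize(record):
    """Raise the level of abstraction of occupation and age."""
    decade = record["age"] // 10 * 10
    return {
        "gender": record["gender"],
        "occupation": OCCUPATION_CLASS[record["occupation"]],
        "age": f"{decade}-{decade + 9}",
        "condition": record["condition"],
    }


def satisfies_k(records, k):
    """True if every quasi-identifier combination has k or more
    distinct sensitive values (the definition used in the text)."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in QUASI_IDENTIFIERS)
        groups[key].add(r["condition"])
    return all(len(conditions) >= k for conditions in groups.values())


if __name__ == "__main__":
    print(satisfies_k(RECORDS, 2))
    print(satisfies_k([generalize(r) for r in RECORDS], 2))
```

In this sketch the raw records fail the check because each quasi-identifier combination is unique, while the generalized records pass it because each generalized combination covers two medical conditions.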
Non-Patent Document 1: O. Abul, F. Bonchi, M. Nanni “Never Walk Alone: Uncertainty for Anonymity in Moving Objects Databases”, Proceedings of the 24th International Conference on Data Engineering, IEEE, April 2008, p. 376-385.
However, position information and movement trajectories are different in nature from the above-described information. Positioning data of position information point to a specific location. In some cases, the positioning data indicate a location with a high degree of privacy for everyone, but the privacy of specific positioning data generally differs depending on the user linked thereto. Even if the positioning data indicate a location such as a hospital or a home, the meaning of the location is difficult to infer merely by linking the user and the positioning data. Meanwhile, information such as the time period during which the user stays at a location and the length of the stay can be obtained from continuous positioning data such as a movement trajectory. From such information, it is possible to infer the meaning of the location indicated by the positioning data, and then to identify the user on the basis of the inferred information. Therefore, the positioning data constituting the movement trajectory can be considered both a quasi-identifier and sensitive information.
FIG. 15 shows the information obtained by subjecting position information to k-anonymization. Each point represents positioning data of position information, and the ellipse is the anonymous information obtained by anonymization of position information by abstraction of positioning data of position information enclosed in the ellipse. In FIG. 15, k-anonymity with k=4 is ensured and accurate position of the user is unknown.
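The abstraction of positioning data into a region can be sketched as follows. This is only an illustration under several assumptions: coordinates are treated as planar (x, y) values rather than latitude and longitude, an axis-aligned bounding box stands in for the ellipse of FIG. 15, and the function name `cloak` is hypothetical. The sketch shows the abstraction step only and does not by itself guarantee k-anonymity for every user in the data set.

```python
import math


def cloak(target, others, k):
    """Replace the target point with a bounding box that also
    contains its k - 1 nearest neighbours.

    target: (x, y) of the user to be protected
    others: list of (x, y) points of other users
    Returns (min_x, min_y, max_x, max_y).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(others) < k - 1:
        raise ValueError("not enough points to reach k")
    # Pick the k - 1 points closest to the target.
    nearest = sorted(others, key=lambda p: math.dist(p, target))[: k - 1]
    members = [target] + nearest
    xs = [p[0] for p in members]
    ys = [p[1] for p in members]
    return (min(xs), min(ys), max(xs), max(ys))


if __name__ == "__main__":
    region = cloak((0, 0), [(1, 1), (2, 0), (0, 3), (10, 10)], k=4)
    print(region)
```

Only the region is released, so an observer learns that the user is somewhere inside it together with at least k - 1 other users, not the exact position.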
In addition to the above-described specific feature of position information, the movement trajectory has the specific feature of representing the actual conditions of the user's life. By linking the user with the movement trajectory, it is possible to reveal all destinations and staying locations, including locations with a high degree of privacy for the user. There is also a risk of the user's movement being monitored or tracked. Therefore, the privacy of a movement trajectory is much higher than that of an isolated piece of position information or a combination of a plurality of simple pieces of position information. Furthermore, in some cases the user can be identified merely from a few pieces of positioning data in the movement trajectory. For example, information such as the user's home, place of employment, and nearest station is often known to friends and colleagues. Therefore, where the positioning data on those locations are included in the movement trajectory, the user linked to the movement trajectory can be identified, and privacy information on other locations, such as a hospital or locations clearly indicating the user's preferences, can be revealed.
Accordingly, the anonymization of the movement trajectory is performed to make it difficult to link the user with the movement trajectory. In order to ensure k-anonymity, it is necessary to generate an anonymized movement trajectory that includes k or more movement trajectories. FIG. 16 shows an example of movement trajectory anonymization. In FIG. 16, a tubular movement trajectory is generated so as to include four movement trajectories. The information in which the correspondence between the movement trajectory and the user is thus obscured is called anonymous information. The target movement trajectory is included in the anonymous information over the entire zone from the start point to the end point of the movement trajectory. Therefore, position information within any specific period of time in the anonymous information also satisfies k-anonymity at all times. In other words, the anonymous information shown in FIG. 16 satisfies k-anonymity (k=4).
Non-Patent Document 1 discloses an example of anonymization technique using anonymous information. Non-Patent Document 1 describes the technique for anonymization of accumulated movement trajectories and suggests the anonymization technique which uses the anonymity metric of (k, δ)-anonymity and by which static movement trajectories accumulated in the database are generalized into a tubular shape. With such a technique, the anonymization is performed by grouping and abstracting data with clear start points and end points of movement trajectories for which distances between the movement trajectories are close to each other. Ensuring the k-anonymity with respect to static movement trajectories means ensuring that k or more movement trajectories consistently present between the start point and the end point are included in the same group (anonymous information). As a result, even if there is one known staying point in the user's movement trajectory, the movement trajectory itself cannot be identified. The assembly of movement trajectories is abstracted into a tubular shape in a three-dimensional space of latitude, longitude, and time at which they are measured, and the abstracted assembly is outputted as anonymous information.
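The abstraction of a group of movement trajectories into a tubular shape can be sketched as follows. This is not the algorithm of Non-Patent Document 1. It is a simplified illustration that assumes the trajectories have already been grouped, that all of them are sampled at the same times, and that coordinates are planar. The function name `tube` and the (t, x, y) sample layout are hypothetical.

```python
import math


def tube(trajectories):
    """Abstract a group of trajectories into a tube.

    trajectories: list of trajectories, each a list of (t, x, y)
                  samples taken at the same times in every trajectory.
    Returns a list of (t, center_x, center_y, radius): at each time,
    a circle around the centroid that contains every member.
    """
    sections = []
    # zip(*...) yields, for each time step, one sample per trajectory.
    for samples in zip(*trajectories):
        t = samples[0][0]
        cx = sum(s[1] for s in samples) / len(samples)
        cy = sum(s[2] for s in samples) / len(samples)
        radius = max(math.hypot(s[1] - cx, s[2] - cy) for s in samples)
        sections.append((t, cx, cy, radius))
    return sections


if __name__ == "__main__":
    group = [
        [(0, 0.0, 0.0), (1, 1.0, 0.0)],
        [(0, 0.0, 2.0), (1, 1.0, 4.0)],
    ]
    for section in tube(group):
        print(section)
```

If the group contains k or more trajectories from the start point to the end point, each cross-section of the tube covers k or more users, which corresponds to the property described above for static movement trajectories.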
The anonymization technique shown in FIG. 16 is effective for anonymizing movement trajectories within a predetermined period, but is not necessarily effective for real-time anonymization of movement trajectories to which position information is added moment by moment. In such an environment, the movement trajectory extends along the time axis as new position information arrives periodically. Therefore, in order to use the movement trajectory in real time while ensuring anonymity, it is necessary to anonymize the increments of the movement trajectory in real time. When such anonymization is performed, the increments should be anonymized with the result of the anonymization already performed taken into account. This is done to make it difficult to establish the correspondence between the user and the movement trajectory, which depends on a combination of a plurality of pieces of position information, in the same manner as in the anonymization of static movement trajectories. With the anonymization method illustrated by FIG. 16, the anonymization is performed on accumulated movement trajectories, and no anonymization of increments of the movement trajectory is assumed. In particular, the anonymization method illustrated by FIG. 16 is based on a macroscopic approach in which the anonymization is performed once so as to satisfy the anonymity requirement over the entire path between the start point and the end point, and is not intended for processing local position information such as increments of the movement trajectory.
Therefore, where the anonymization method illustrated by FIG. 16 is applied to newly arrived increments, it is assumed that the anonymization is performed on a short path whose end point and start point are the increment data and the data immediately preceding it, respectively. In such a case, in order to ensure k-anonymity, it is necessary to form the anonymous information from k or more movement trajectories that are the same as those in the previously generated anonymous information.
However, every user acts and moves in his or her individual manner. It is therefore unlikely that users whose movement trajectories were adjacent at some point in the past will continue to follow adjacent trajectories in the future. Consequently, it can be assumed that in the anonymization method illustrated by FIG. 16, the geographical similarity between the movement trajectories forming the anonymous information decreases and the degree of abstraction of the anonymous information gradually increases with the passage of time. In other words, where the anonymous information is always formed from the same combination of movement trajectories, the positioning data of position information included in the movement trajectories can become too abstract, and the resulting information can become meaningless. Furthermore, anonymity cannot be guaranteed in the case where the anonymization is performed independently on movement trajectories in different time intervals, without taking into consideration the movement trajectories that have been anonymized in the past.
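The growth in abstraction over time can be illustrated with a small Python sketch. It assumes a fixed group of users who start close together and then move in straight lines with different constant velocities on a planar coordinate system. The start points, velocities, and the names `enclosing_radius` and `radius_over_time` are hypothetical and serve only to illustrate the tendency described above.

```python
import math


def enclosing_radius(points):
    """Radius of the circle around the centroid that contains all points."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return max(math.hypot(p[0] - cx, p[1] - cy) for p in points)


def radius_over_time(starts, velocities, steps):
    """Radius of the anonymous region at each time step when the
    same group of users is kept together at all times."""
    radii = []
    for t in range(steps):
        points = [
            (x + vx * t, y + vy * t)
            for (x, y), (vx, vy) in zip(starts, velocities)
        ]
        radii.append(enclosing_radius(points))
    return radii


if __name__ == "__main__":
    # Four users (k=4) who are adjacent at t=0 and then diverge.
    starts = [(0, 0), (0, 1), (1, 0), (1, 1)]
    velocities = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    for t, r in enumerate(radius_over_time(starts, velocities, 4)):
        print(t, round(r, 2))
```

With these assumed inputs the radius increases at every step, so the region released as anonymous information conveys less and less about where any of the four users actually is.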