In recent years, services that use position data measured by GPS (Global Positioning System) receivers mounted on mobile terminals, cars and the like, or by wireless LAN (Local Area Network) and the like, have been increasing. Position data may identify an individual (the user of the mobile terminal, the car and so on) at home, at a workplace, at a school and so on and thereby allow contact with the user, or may reveal matters that the user does not want strangers to know, such as hobbies and pastimes or hospital visits. Position data is therefore highly sensitive privacy data. Such privacy data is anonymized to secure anonymity.
Here, anonymization is processing applied to privacy data so that the user cannot be identified. An index showing to what degree a user cannot be identified is called an anonymity index. Existing anonymity indices include k-anonymity and l-diversity. Hereinafter, it is assumed that user data is composed of one or more quasi-identifiers, which can identify the user, and one or more items of sensitive data. First, k-anonymity is an index guaranteeing that, through anonymization of the quasi-identifiers, at least k records share the same quasi-identifier. When k-anonymity is satisfied, a user cannot be narrowed down to a single individual. On the other hand, l-diversity is an index guaranteeing that, through anonymization of the quasi-identifiers, the records sharing the same quasi-identifier contain at least l distinct values of sensitive data. Satisfying l-diversity prevents the sensitive data of a user from becoming known.
For example, consider the disease condition record of patients shown in FIG. 1A. In the disease condition record shown in FIG. 1A, a ZIP code, an age, and a nationality are recorded as quasi-identifiers, and a disease condition is recorded as sensitive data. The ZIP code and the age are anonymized by masking arbitrary digits, and the nationality is anonymized by suppressing the name of the country. FIG. 1B shows an example in which the disease condition record shown in FIG. 1A is anonymized. By anonymizing the ZIP code, the age and the nationality, two groups are formed, each having an identical quasi-identifier. The “k” of k-anonymity is the number of users in a group, and k=4 for both groups in this example. Guaranteeing k≥2 makes it impossible to identify the user corresponding to a row. The “l” of l-diversity is the number of distinct disease conditions in a group; l=2 in the group to which users 1 to 4 belong, and l=1 in the group to which users 5 to 8 belong. When a viewer who knows that a user in his or her 30s whose ZIP code is 148** visits a hospital sees this table (FIG. 1B), the viewer learns that the user has “cancer”. However, by guaranteeing l≥2 (as for users 1 to 4), even a viewer who knows the user can be prevented from further learning a feature of the user (the disease condition in this example). Other well-known anonymity indices include t-closeness and m-invariance, but their descriptions are omitted.
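The calculation of k and l described above can be illustrated by the following minimal Python sketch. The records, the function name anonymity_indices and the variable names are hypothetical and are not the contents of FIG. 1B; they are only arranged so that two groups of four users arise, one with two distinct disease conditions and one with a single disease condition.

```python
from collections import defaultdict

# Hypothetical anonymized records (NOT the contents of FIG. 1B).
# The first three fields (ZIP code, age, nationality) form the
# quasi-identifier; the last field is the sensitive data.
records = [
    ("130**", "<30", "*", "heart disease"),
    ("130**", "<30", "*", "heart disease"),
    ("130**", "<30", "*", "viral infection"),
    ("130**", "<30", "*", "viral infection"),
    ("148**", "3*", "*", "cancer"),
    ("148**", "3*", "*", "cancer"),
    ("148**", "3*", "*", "cancer"),
    ("148**", "3*", "*", "cancer"),
]


def anonymity_indices(rows):
    """Return {quasi_identifier: (k, l)} for each group of identical quasi-identifiers."""
    groups = defaultdict(list)
    for *quasi, sensitive in rows:
        groups[tuple(quasi)].append(sensitive)
    # k = number of records in the group, l = number of distinct sensitive values.
    return {q: (len(values), len(set(values))) for q, values in groups.items()}


indices = anonymity_indices(records)

# The table as a whole satisfies the smallest k and the smallest l over all groups.
table_k = min(group_k for group_k, _ in indices.values())
table_l = min(group_l for _, group_l in indices.values())

for quasi, (group_k, group_l) in indices.items():
    print(quasi, "k =", group_k, "l =", group_l)
print("table: k =", table_k, "l =", table_l)
```

In this sketch the second group has l=1, so the table as a whole satisfies k-anonymity with k=4 but does not satisfy l-diversity with l≥2.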
The nature of position data periodically measured by a mobile terminal, a car and so on depends on the data itself. For example, position data may identify an individual at home or at a workplace, or may reveal a feature of the individual, such as a place that indicates a hobby or pastime, or a hospital that the individual visits. It is difficult to determine such a nature from a single item of position data, but in many cases the nature can be clarified by analyzing a plurality of items of position data of the same user and examining the places where the user stays for a long time every day. Therefore, each item of position data in a position history (a plurality of items of position data of the same user) is both a quasi-identifier and sensitive data.
Regarding anonymization of position data, even a single item of position data can allow a viewer who was at the “place” to identify a user. If the position data of the user is viewed after the user has been identified, where the user goes becomes known. Therefore, even for single position data, it is necessary to prevent the user from being identified by guaranteeing k-anonymity. FIG. 2 is a conceptual diagram showing an example of anonymization of single position data. Here, an example of anonymizing (abstracting) the position data of users 1 to 4 to satisfy k-anonymity (k≥4) is shown. In FIG. 2, each black point is the position data of a user expressed in latitude and longitude, and the gray circle shows an area. By converting the position data of each user into area data that contains four users, it becomes difficult for a viewer who was at the “place” to identify any of the users.
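The conversion of individual positions into area data can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name cloak_to_area and the coordinate values are hypothetical, the coordinates are treated as points on a flat plane rather than as latitude and longitude on a sphere, and the circle is centered on the centroid, so it encloses all users but is not necessarily the smallest such circle.

```python
import math


def cloak_to_area(positions, k):
    """Replace at least k individual positions by one circular area (center, radius).

    Assumption: positions are (x, y) pairs on a flat plane. The circle is
    centered on the centroid and reaches the farthest position, so every
    position lies inside it.
    """
    if len(positions) < k:
        raise ValueError("fewer than k users: k-anonymity cannot be satisfied")
    cx = sum(x for x, _ in positions) / len(positions)
    cy = sum(y for _, y in positions) / len(positions)
    radius = max(math.dist((cx, cy), p) for p in positions)
    return (cx, cy), radius


# Hypothetical positions of users 1 to 4 (not taken from FIG. 2).
positions = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
center, radius = cloak_to_area(positions, k=4)

# Every user is now represented by the same area instead of an exact point.
anonymized = [(center, radius) for _ in positions]
print(center, radius)
```

Because all four users are reported with the same area, a viewer who knows that one of them was at a particular point cannot tell which of the four records belongs to that user.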
In relation to the above technique, Patent Literatures 1 and 2 show examples of systems that anonymize data, as techniques for using privacy data in a service while securing the anonymity of the privacy data.
The privacy data management server of Patent Literature 1 (JP 2005-234866A) manages the privacy data of terminal users in a network that connects a plurality of terminals for communication. The privacy data management server is provided with a privacy data database which stores the privacy data of the users, a privacy data management section which manages the privacy data in the privacy data database, and a statistic processing section which calculates the rate of users identified by a kind of privacy data to the total number of users registered in the privacy data database. When receiving a request message for the privacy data of a user from a terminal, the privacy data management section searches the privacy data database. When the privacy data is found, the statistic processing section calculates the rate of users who have that privacy data to the registered users, and transmits the privacy data to the terminal when the calculated rate is more than a threshold value.
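The threshold-based disclosure described for Patent Literature 1 may be pictured with the following minimal sketch. It is an interpretation for illustration only and is not the implementation of Patent Literature 1: the function name disclose_if_common, the dictionary-based database and the attribute values are all hypothetical.

```python
def disclose_if_common(database, user_id, attribute, threshold):
    """Return the requested value only if enough registered users share it.

    Assumption: `database` maps user IDs to dictionaries of privacy data,
    and the rate is the share of registered users holding the same value.
    """
    value = database.get(user_id, {}).get(attribute)
    if value is None:
        return None  # nothing found for this request
    sharing = sum(1 for data in database.values() if data.get(attribute) == value)
    rate = sharing / len(database)
    # Disclose only when the rate is more than the threshold value.
    return value if rate > threshold else None


# Hypothetical registered users.
database = {
    "u1": {"city": "Tokyo"},
    "u2": {"city": "Tokyo"},
    "u3": {"city": "Tokyo"},
    "u4": {"city": "Sapporo"},
}

common = disclose_if_common(database, "u1", "city", threshold=0.5)
rare = disclose_if_common(database, "u4", "city", threshold=0.5)
print(common, rare)
```

In this sketch a value shared by three of four registered users is disclosed, whereas a value held by only one user is withheld because it would narrow the request down to that user.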
Also, a data disclosure apparatus disclosed in Patent Literature 2 (JP 2007-219636A) manages data containing privacy data. The data disclosure apparatus is provided with a retaining section which retains one or more items of data, each composed of one or more attributes; an anonymity calculation section which calculates the anonymity obtained when a specific attribute of the data is disclosed; and a granularity change disclosure section which, when the calculated anonymity does not reach a desired anonymity, changes the granularity of the data of the specific attribute so that the data has anonymity higher than a desired threshold value, and discloses the data of the attribute.
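The idea of changing the granularity of an attribute until the desired anonymity is obtained can be sketched as follows. This is an illustrative sketch and not the implementation of Patent Literature 2: the function name coarsen_until_anonymous, the use of age as the attribute, the candidate bucket widths and the definition of anonymity as the smallest number of records sharing a value are all assumptions.

```python
from collections import Counter


def coarsen_until_anonymous(ages, widths, threshold):
    """Try bucket widths from fine to coarse and return the first that is anonymous enough.

    Assumption: anonymity is measured as the size of the smallest group of
    records sharing the same (coarsened) value, and must be higher than
    `threshold`. Returns (width, coarsened values), or (None, None) if no
    candidate width suffices.
    """
    for width in widths:
        buckets = [age // width * width for age in ages]  # lower bound of each bucket
        anonymity = min(Counter(buckets).values())
        if anonymity > threshold:
            return width, buckets
    return None, None


# Hypothetical ages; granularity candidates of 1 year, 5 years and 10 years.
ages = [21, 23, 27, 34, 36, 38]
width, disclosed = coarsen_until_anonymous(ages, widths=[1, 5, 10], threshold=2)
print(width, disclosed)
```

With these values, widths of 1 year and 5 years leave at least one age group containing a single record, so the data is disclosed in 10-year units, in which each group contains three records.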
Also, as a related technique, in a method of using presence data disclosed in Patent Literature 3 (JP 2005-031965A), a data user side terminal apparatus uses presence data disclosed by a data provider side terminal apparatus through a communication network under service control by a server apparatus. In this method of using the presence data, the data user side terminal apparatus executes the following steps: a data collection request step of requesting the server apparatus to collect presence data; a presence data reception step in which the server apparatus transmits an advertisement and guidance for recruiting data providers to the data provider side terminal apparatus, carries out application reception and contracting in cooperation with the data provider side terminal apparatus, and receives, from the data provider side terminal apparatus, the presence data generated from the contents of the contract and presence object data; a statistic processing or presence data storage step of carrying out statistic processing or accumulation of the generated presence data; and a charge data storage step of storing charge data for the statistically processed presence data to support payment of a reward.
Also, a data service system disclosed in Patent Literature 4 (JP 2004-029940A) is provided with a first data processing apparatus connected with a network to manage data; a second data processing apparatus which provides the data to the first data processing apparatus; and a third data processing apparatus which acquires the data from the first data processing apparatus. In this data service system, the first data processing apparatus is provided with a neighborhood data acquisition section which acquires neighborhood data on the surroundings of the second data processing apparatus, the data being supplied from the second data processing apparatus; a statistic data generation section which generates statistic data from the neighborhood data acquired by the neighborhood data acquisition section; a request receiving section which receives a request for the neighborhood data from the third data processing apparatus; and a neighborhood data supplying section which supplies the neighborhood data generated by the neighborhood data generation section to the third data processing apparatus based on the request received by the request receiving section. The second data processing apparatus is provided with a neighborhood data collection section which collects the neighborhood data; a neighborhood data supplying section which supplies the neighborhood data collected by the neighborhood data collection section to the first data processing apparatus; and a supply control section which controls the supply of the neighborhood data by the neighborhood data supplying section. The third data processing apparatus is provided with a neighborhood data request section which requests the neighborhood data and a neighborhood data acquisition section which acquires the neighborhood data requested by the neighborhood data request section.
Also, a data service apparatus according to Patent Literature 5 (JP 2004-318391A) is communicable with an access apparatus through a network and provides data to the access apparatus based on a request from the access apparatus. The data service apparatus is provided with an individual data storage section which stores individual data of an individual; a receiving section which receives, from the access apparatus, an individual data transmission request containing a search condition for the individual data; a search condition confirmation section which confirms the search condition contained in the individual data transmission request received by the receiving section, and which, when the search condition contains a condition capable of identifying the individual, deletes that condition and outputs the resulting search condition as a confirmed search condition, and, when the search condition contains no condition capable of identifying the individual, outputs the search condition as it is as the confirmed search condition; an individual data extracting section which receives the confirmed search condition outputted from the search condition confirmation section and searches the individual data storage section based on the confirmed search condition to extract individual data; a data ID generating section which generates, based on a predetermined rule, a data ID identifying the extracted individual data by using the individual data extracted by the individual data extracting section, and assigns the generated data ID to the extracted individual data; and a search result determination section which determines, based on the predetermined rule, whether or not it is possible to identify an individual from the individual data extracted by the individual data extracting section and assigned with the data ID, and transmits the individual data assigned with the data ID to the access apparatus when determining that it is impossible to identify the individual.
Also, a data mediation apparatus disclosed in Patent Literature 6 (JP 2005-346248A) is provided with a first data storage section which stores individual specifying data, which specifies an individual and contains a mail address, and diagnosis result data of the individual; an anonymization section which refers to the first data storage section, excludes predetermined data containing a name from the individual specifying data, and stores the remaining individual specifying data and at least a part of the diagnosis result data in a second data storage section as anonymized individual data; a section which allows access from a terminal of a registered provider to the anonymized individual data stored in the second data storage section; a section which identifies, by using the data stored in the first data storage section, the individuals belonging to each of a plurality of classifications prescribed based on data classification in the anonymized individual data stored in the second data storage section, and stores identification data of the individuals belonging to each of the plurality of classifications in a third data storage section; and a section which receives, from the registered provider, an advertisement mail addressed to each of the plurality of classifications contained in the third data storage section, and transfers the advertisement mail to the mail addresses stored in the first data storage section by using the identification data of the belonging individuals stored in the third data storage section.
Also, an anonymization identification data generating system disclosed in Patent Literature 7 (JP 2007-179500A) is provided with a data acquisition section which acquires subject identification data unique to each subject whose genetic data is to be analyzed, and subject relation data showing relations among the subjects; an identification data coding section which codes the subject identification data acquired by the data acquisition section and generates coded identification data; a coding data generation section which generates coding data based on the coded identification data generated by the identification data coding section and the subject relation data acquired by the data acquisition section; and a coding data transmission section which transmits the coding data to another apparatus for analysis.