For purposes of academic research or commercial applications, data mining techniques have been widely applied to fields such as medical record analysis and consumer behavior analysis. Generally, before data mining procedures are performed on a dataset, the field containing the names of individuals in the dataset is anonymized to preserve the privacy of the dataset being manipulated. However, by comparing the anonymized dataset with a related dataset, as demonstrated in FIG. 1, the privacy of one or more individuals may still be exposed, which leads to the issue of invasion of privacy.
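The linkage risk described above can be illustrated with a minimal sketch. The datasets, field names, and values below are hypothetical and are not taken from the original disclosure; the sketch only shows how an "anonymized" dataset, with the name field removed, can be joined to an auxiliary public dataset through shared quasi-identifier fields to re-identify individuals.

```python
# Hypothetical illustration of re-identification by linking datasets.
# Names were removed from the first dataset, but the quasi-identifier
# fields (zip, birth_year, sex) still uniquely describe each person.

anonymized_records = [
    {"zip": "94040", "birth_year": 1975, "sex": "F", "diagnosis": "diabetes"},
    {"zip": "94041", "birth_year": 1982, "sex": "M", "diagnosis": "asthma"},
]

public_records = [
    {"name": "Alice", "zip": "94040", "birth_year": 1975, "sex": "F"},
    {"name": "Bob",   "zip": "94041", "birth_year": 1982, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link(anonymized, public, keys=QUASI_IDENTIFIERS):
    """Join the two datasets on the quasi-identifier fields."""
    exposed = []
    for a in anonymized:
        for p in public:
            if all(a[k] == p[k] for k in keys):
                # The sensitive value is now tied back to a name.
                exposed.append({"name": p["name"], "diagnosis": a["diagnosis"]})
    return exposed

print(link(anonymized_records, public_records))
```

Because each quasi-identifier combination matches exactly one record in each dataset, every individual's diagnosis is re-identified despite the removal of the name field.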
Conventional methods for dealing with the invasion-of-privacy issue described above require human intervention: a person with knowledge of the purposes and manners of the data mining procedures and of the further analysis to be performed on the dataset must determine which fields are relevant and which are irrelevant, keeping the relevant fields while masking the irrelevant fields for the subsequent data mining and analysis. In practice, however, it is almost impossible to know well in advance the purposes and manners of the operations to be performed on the dataset; thus, conventional methods need improvement in this regard.
There are some related literatures and technologies for privacy preservation. For example, one literature provides a system and a method for automated determination of quasi-identifiers for sensitive data fields in a data set, which is incorporated herein by reference. However, Agrawal et al. do not provide a method for masking one or more quasi-identifier fields.
Some literatures also disclose that the robustness of privacy preservation can be determined according to the k-anonymity, the l-diversity, or both, of the dataset. One way to increase the robustness of privacy preservation of a dataset is to mask as many fields as possible. The more fields are masked, however, the less accurate the dataset becomes, and the lower its data utility as a result.
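The trade-off above can be made concrete with a minimal sketch of the standard k-anonymity and l-diversity measures; the sample data and field names are hypothetical and not from the original disclosure. A dataset is k-anonymous if every combination of quasi-identifier values is shared by at least k records, and l-diverse if every such group contains at least l distinct sensitive values.

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size over the quasi-identifier fields (the 'k')."""
    groups = Counter(tuple(r[f] for f in quasi_identifiers) for r in records)
    return min(groups.values())

def l_diversity(records, quasi_identifiers, sensitive_field):
    """Fewest distinct sensitive values within any group (the 'l')."""
    classes = {}
    for r in records:
        key = tuple(r[f] for f in quasi_identifiers)
        classes.setdefault(key, set()).add(r[sensitive_field])
    return min(len(values) for values in classes.values())

# Hypothetical sample data.
data = [
    {"zip": "94040", "age": 34, "disease": "flu"},
    {"zip": "94040", "age": 41, "disease": "flu"},
    {"zip": "94040", "age": 34, "disease": "cold"},
]

# With both quasi-identifiers kept, one record is unique (k = 1).
print(k_anonymity(data, ("zip", "age")))   # -> 1

# Masking the "age" field raises k to 3, at the cost of data utility.
print(k_anonymity(data, ("zip",)))         # -> 3
print(l_diversity(data, ("zip",), "disease"))  # -> 2
```

The example shows exactly the tension stated above: masking more fields increases k (stronger privacy) while discarding information that subsequent analysis might have needed.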
The conventional methods and prior arts mentioned above do not provide a flexible manner, responsive to users' needs, of preserving the privacy of a dataset appropriately while keeping the dataset accurate.