With the growing digitization of data, privacy protection for digital data has begun to attract considerable attention. The concern is greatest for sensitive personal information, which includes lifestyle details such as habits, interests, or occupation, as well as medical and health care information such as medical history or medication records. Such sensitive personal information can easily be re-identified, and personal rights and interests are seriously affected once the information is re-identified or leaked.
In the past, the problem of privacy protection for digital data has been addressed by generating anonymous data through random modification of data, addition of fake data, data perturbation, or data suppression. However, these conventional methods reduce the authenticity and credibility of the digital data because the data is modified randomly or fake data is added. Alternatively, they distort the digital data excessively because part of it is modified or deleted without reference to the authentic data. Therefore, the conventional methods cannot achieve data usability and data privacy at the same time.
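As a rough illustration of two of the conventional techniques named above, the following sketch applies suppression and perturbation to a toy data set. The record layout, the field names, and the helper functions `suppress` and `perturb` are hypothetical and chosen only for this example; they are not taken from any particular system.

```python
import random


def suppress(records, field):
    """Return copies of the records with one field replaced by a placeholder."""
    return [{**r, field: "*"} for r in records]


def perturb(records, field, max_shift, rng):
    """Return copies with a numeric field shifted by a random integer
    drawn from [-max_shift, max_shift]."""
    return [
        {**r, field: r[field] + rng.randint(-max_shift, max_shift)}
        for r in records
    ]


# Hypothetical toy records; the values are illustrative only.
records = [
    {"age": 34, "occupation": "nurse"},
    {"age": 35, "occupation": "teacher"},
]

rng = random.Random(0)  # fixed seed so repeated runs give the same result
suppressed = suppress(records, "occupation")
perturbed = perturb(records, "age", 5, rng)
```

In this sketch, suppression discards the occupation entirely, and perturbation shifts each age by an amount unrelated to the authentic value, which is the loss of usability described above.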
After the anonymous data is generated, the administrator of the digital data may want to evaluate the risk that the anonymous data can be re-identified. In a conventional risk evaluation method, every piece of the original data is used to perform the re-identification. However, such a risk evaluation method is very inefficient because it spends long and needless computation on repeated entries in the original data.
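The redundancy can be seen in a minimal sketch. It assumes, purely for illustration, that records are tuples of attribute values and that a record counts as re-identified when it matches exactly one row of the anonymous data. Both this criterion and the function names are assumptions of the example, not a definition taken from any specific evaluation method.

```python
from collections import Counter


def matches(record, anonymized):
    """Count the anonymized rows that are identical to the record."""
    return sum(1 for row in anonymized if row == record)


def evaluate_all(original, anonymized):
    """Conventional style: scan the anonymized data once per original record,
    including records that repeat earlier ones."""
    scans = 0
    reidentified = 0
    for record in original:
        scans += 1
        if matches(record, anonymized) == 1:  # assumed criterion
            reidentified += 1
    return reidentified, scans


def evaluate_distinct(original, anonymized):
    """Same result, but each distinct record is scanned only once."""
    scans = 0
    reidentified = 0
    for record, count in Counter(original).items():
        scans += 1
        if matches(record, anonymized) == 1:  # assumed criterion
            reidentified += count
    return reidentified, scans


# Hypothetical toy data with repeated entries.
original = [
    ("30-39", "nurse"),
    ("30-39", "nurse"),
    ("30-39", "nurse"),
    ("40-49", "teacher"),
]
anonymized = list(original)

result_all = evaluate_all(original, anonymized)
result_distinct = evaluate_distinct(original, anonymized)
```

Under these assumptions both functions report the same number of re-identified records, while the per-record version performs one scan for every repeated entry. The `evaluate_distinct` function is included only to make the wasted work visible.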