1. Field of Invention
This invention relates to protecting privacy.
2. Description of Related Art
Conventional systems for protecting privacy determine quasi-identifiers in data sets useful in re-identifying individuals. These systems attempt to generalize, hide and/or withhold the quasi-identifying information to protect the privacy of subjects. For example, in one conventional system for protecting privacy, salary information for a specific user is hidden within the aggregate salary of a group of subjects. Other conventional systems attempt to generalize information to prevent re-identification. For example, the digits of a subject's unique identification number are transformed by replacing digits with wildcards. This generalization transform creates larger subject groupings which tend to lessen the impact of an information disclosure.
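The wildcard generalization described above may be illustrated by the following minimal sketch. The function name `generalize_id`, the `keep` parameter and the sample identifiers are hypothetical and are chosen only for illustration; they are not taken from any particular conventional system.

```python
from collections import Counter


def generalize_id(identifier: str, keep: int, wildcard: str = "*") -> str:
    """Keep the first `keep` characters and replace the rest with wildcards.

    Hypothetical helper: illustrates the generalization idea only.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    masked_length = max(len(identifier) - keep, 0)
    return identifier[:keep] + wildcard * masked_length


# Made-up identification numbers for five subjects.
ids = ["94301", "94305", "94041", "94043", "10001"]

# Replace the last two digits of each identifier with wildcards.
generalized = [generalize_id(i, keep=3) for i in ids]

# Count how many subjects now share each generalized identifier.
group_sizes = Counter(generalized)
```

In this sketch, identifiers that were unique before the transform fall into shared groups afterwards, although a group may still contain a single subject if no other identifier shares its retained prefix.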
In “Protecting Privacy when Disclosing Information: k-Anonymity and Its Enforcement through Generalization and Suppression”, Technical Report, SRI International, March 1998, P. Samarati et al. describe a privacy system based on a theory of k-anonymity. Samarati et al. propose granular access to any sensitive information by applying the minimal set of generalization transformations necessary to create desired groups of size “k”. In this conventional privacy system, the subject identifying information is transformed by generalization transformations. Although useful, these conventional systems for protecting privacy have limited use in data mining applications since they rely on access to the underlying records of the data source. Due to privacy restrictions, third party access to the records of the underlying data source is generally restricted.
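The grouping condition underlying k-anonymity may be sketched as follows, under the common reading that every combination of quasi-identifier values must be shared by at least k records. The function name `is_k_anonymous`, the field names and the sample records are assumptions made for illustration and do not reproduce the method of Samarati et al. Note that the check operates directly on the underlying records, which is the dependence discussed above.

```python
from collections import Counter
from typing import Any, Mapping, Sequence


def is_k_anonymous(
    records: Sequence[Mapping[str, Any]],
    quasi_identifiers: Sequence[str],
    k: int,
) -> bool:
    """Return True if every quasi-identifier combination occurs at least k times.

    Hypothetical helper: a simple check, not a generalization algorithm.
    """
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(size >= k for size in groups.values())


# Made-up records whose quasi-identifiers have already been generalized.
records = [
    {"zip": "943**", "age": "30-39", "salary": 70000},
    {"zip": "943**", "age": "30-39", "salary": 82000},
    {"zip": "940**", "age": "40-49", "salary": 91000},
    {"zip": "940**", "age": "40-49", "salary": 64000},
]

two_anonymous = is_k_anonymous(records, ["zip", "age"], k=2)
three_anonymous = is_k_anonymous(records, ["zip", "age"], k=3)
```

Here each quasi-identifier combination is shared by two records, so the sample satisfies the condition for k = 2 but not for k = 3.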
Modern data mining and data warehousing systems use complex algorithms to identify interesting patterns of information in the records of the underlying data source. The patterns of information frequently relate to groups of subjects. Knowledge about these interesting groups of subjects may be extracted or mined using agglomerative clustering, k-means clustering and/or various other knowledge extraction transformations. However, if the extracted knowledge is combined with information already available to the public, the combined information can in some instances be used to re-identify specific subjects within the groups.
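The re-identification risk described above may be illustrated by the following minimal sketch, in which a profile extracted for a group is matched against publicly available records. The function name `matching_subjects`, the attribute names and all sample values are hypothetical and are invented solely for illustration.

```python
from typing import Any, List, Mapping, Sequence


def matching_subjects(
    public_records: Sequence[Mapping[str, Any]],
    group_profile: Mapping[str, Any],
) -> List[str]:
    """Return the names of public records consistent with an extracted group profile.

    Hypothetical helper: a record matches when it agrees with the profile
    on every attribute the profile specifies.
    """
    return [
        record["name"]
        for record in public_records
        if all(record.get(key) == value for key, value in group_profile.items())
    ]


# Made-up public information, for example a directory listing.
public_records = [
    {"name": "Subject A", "zip": "94301", "occupation": "dentist"},
    {"name": "Subject B", "zip": "94301", "occupation": "teacher"},
    {"name": "Subject C", "zip": "94041", "occupation": "dentist"},
]

# Made-up extracted knowledge describing one interesting group of subjects.
group_profile = {"zip": "94301", "occupation": "dentist"}

matches = matching_subjects(public_records, group_profile)
```

When the list of matches contains a single name, as in this sample, the extracted profile combined with the public records is sufficient to single out one subject.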
Conventional k-anonymity and/or data generalization-based privacy systems are applied to the underlying records of the information source before the knowledge extraction and/or data mining transformations are performed. These conventional systems for protecting privacy are difficult to implement in a multi-party environment where only the clustered, agglomerated or extracted knowledge is available for sharing with a vendor. Moreover, these conventional systems for protecting privacy assume that all the relevant information is available simultaneously in a single location.
In practice, however, information about subjects is likely to be dispersed among a number of data sources and only the extracted knowledge or data is sold or transferred. Thus, if the extracted knowledge or data is not properly protected or anonymized, third parties may combine the extracted knowledge with other information sources to re-identify subjects and compromise the subjects' privacy.
Thus, systems and methods for protecting privacy of entities associated with the extracted knowledge or data are required.