Data anonymization is a data privacy technique in which personal information in a database is protected through deletion or encryption so that the individuals to whom the information relates cannot be identified. Data anonymization may be used to protect the privacy of individuals or companies about whom data has been collected while maintaining the integrity of the released data that is being shared. Current anonymization techniques typically apply to numerical or hierarchical data and cannot be applied to other types of data, such as textual data, which limits the available anonymization options. Protecting against the disclosure of individual or sensitive information may also cause data to be lost during the anonymization process. For this reason, users often want to balance protecting individual or sensitive data against minimizing information loss.
In the drawings, like reference numbers generally indicate identical or similar elements. Additionally, the left-most digit(s) of a reference number generally identify the drawing in which the reference number first appears.