This section introduces aspects that may be helpful in facilitating a better understanding of the inventions. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is or is not in the prior art.
It is often desirable to transform, pre-process, and store a stream of sensitive data so that the transformed data can be analyzed without compromising the privacy of any individual. Each data item in the stream typically comprises a first element identifying an individual, such as a name or an address, and a second element containing private and/or sensitive information about that individual, such as a disease that the individual has. The identifying element should be transformed so that the processed stream can be saved and later analyzed while preserving the privacy of the individuals. Generally, researchers and/or analysts viewing the transformed data and the associated sensitive data should be able to draw reasonable (though approximate) conclusions about the data as a whole without being able to determine the sensitive information of any particular individual. For example, researchers may wish to study the prevalence of diseases in a particular neighborhood.
Data anonymization techniques can address these privacy concerns and aid compliance with applicable legal requirements. A number of data anonymization techniques have been proposed that achieve various privacy goals by ensuring that the transformed data has certain properties. For example, k-anonymity techniques require that each individual in the data set be indistinguishable from at least k−1 other individuals. In addition, l-diversity techniques require sufficient diversity in the sensitive information associated with each such group of indistinguishable individuals, so that the sensitive information of an individual cannot be inferred merely by identifying the group to which the individual belongs.
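As a minimal sketch of the two properties described above, the following Python checks whether a set of already-transformed records satisfies k-anonymity and the "distinct" variant of l-diversity (one of several l-diversity variants). Each record is assumed to pair a transformed identifying element with a sensitive value, following the data items described earlier; the function names and sample records are hypothetical illustrations, not part of any particular anonymization technique.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Tuple

# Hypothetical record layout: (transformed identifying element, sensitive value).
Record = Tuple[Hashable, Hashable]


def group_by_identifier(records: Iterable[Record]) -> dict:
    """Map each transformed identifier to the list of sensitive values sharing it."""
    groups = defaultdict(list)
    for identifier, sensitive in records:
        groups[identifier].append(sensitive)
    return groups


def is_k_anonymous(records: Iterable[Record], k: int) -> bool:
    """True if every transformed identifier is shared by at least k records,
    i.e., each record is indistinguishable from at least k-1 others."""
    groups = group_by_identifier(records)
    return all(len(values) >= k for values in groups.values())


def is_distinct_l_diverse(records: Iterable[Record], l: int) -> bool:
    """True if every group of indistinguishable records contains at least
    l distinct sensitive values (the 'distinct' variant of l-diversity)."""
    groups = group_by_identifier(records)
    return all(len(set(values)) >= l for values in groups.values())


if __name__ == "__main__":
    # Hypothetical records whose identifiers were generalized to a ZIP-code prefix.
    data = [
        ("076**", "flu"),
        ("076**", "asthma"),
        ("076**", "flu"),
        ("079**", "diabetes"),
        ("079**", "diabetes"),
        ("079**", "diabetes"),
    ]
    print(is_k_anonymous(data, 3))         # True: each group has 3 records
    print(is_distinct_l_diverse(data, 2))  # False: the "079**" group has one disease
```

The example illustrates why k-anonymity alone may be insufficient: the "079**" group is 3-anonymous, yet anyone who places an individual in that group learns the individual's disease. Other l-diversity variants would change only the per-group test.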
A need remains for improved techniques for effectively anonymizing data so that portions of the data can be published and shared with others.