This section introduces aspects that may help facilitate a better understanding of the inventions. Accordingly, the statements of this section are to be read in this light and are not to be understood as admissions about what is in the prior art or what is not in the prior art.
It is often desirable to transform and store a stream of sensitive data so that the transformed data can be analyzed without compromising the privacy of the data of any individual. Each data item in the streamed data typically comprises a first element identifying an individual, such as a name or an address, and a second element containing some private and/or sensitive information about the individual, such as a disease of the individual. The identifying part of the data should be transformed so that the processed stream can be saved and later analyzed while maintaining the privacy of the individuals. Generally, researchers and/or analysts viewing the transformed data and associated sensitive data should be able to analyze the data and draw reasonable (though approximate) conclusions about the data without being able to identify the sensitive information of any particular individual. For example, researchers may wish to study diseases in a particular neighborhood or region.
Data anonymization techniques can address the privacy concerns and aid compliance with applicable legal requirements. A number of data anonymization techniques have been proposed or suggested that achieve various privacy goals by ensuring that the transformed data has certain properties. For example, k-anonymity techniques require that each individual in the data set be indistinguishable from k-1 other individuals. As used herein, k is referred to as an anonymity parameter. In addition, l-diversity techniques provide sufficient diversity in the sensitive information associated with individuals. U.S. patent application Ser. No. 14/225,720, filed Mar. 26, 2014, entitled “Anonymization of Streaming Data,” now U.S. Pat. No. 9,361,480, incorporated herein by reference, provides methods and apparatus for anonymizing data in a data stream.
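By way of illustration only, the k-anonymity and l-diversity properties described above can be checked on an already transformed data set with a short sketch such as the following. The sketch assumes that each record is a pair of a transformed identifier and a sensitive value, and that "sufficient diversity" means at least l distinct sensitive values per group of indistinguishable records (one simple reading of l-diversity). The function names and sample records are hypothetical and are not taken from the referenced patent.

```python
from collections import defaultdict


def group_by_identifier(records):
    """Group sensitive values by their transformed identifier.

    Records that share a transformed identifier are treated as
    indistinguishable from one another (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for identifier, sensitive in records:
        groups[identifier].append(sensitive)
    return groups


def is_k_anonymous(records, k):
    """Return True if every group holds at least k records, so that each
    individual is indistinguishable from at least k-1 others."""
    return all(len(values) >= k
               for values in group_by_identifier(records).values())


def is_l_diverse(records, l):
    """Return True if every group holds at least l distinct sensitive
    values (the simple 'distinct' reading assumed here)."""
    return all(len(set(values)) >= l
               for values in group_by_identifier(records).values())


# Hypothetical transformed stream: identifiers generalized to a region.
sample_records = [
    ("region A", "flu"),
    ("region A", "asthma"),
    ("region A", "flu"),
    ("region B", "diabetes"),
    ("region B", "flu"),
]

if __name__ == "__main__":
    print(is_k_anonymous(sample_records, 2))  # smallest group has 2 records
    print(is_l_diverse(sample_records, 2))    # each group has 2 distinct values
```

In this sample, the smallest group contains two records, so the data satisfies the check for an anonymity parameter k of 2 but not 3. Each group contains two distinct sensitive values, so the data likewise satisfies the diversity check for l of 2 but not 3.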