The present application relates generally to an improved data processing apparatus and method, and more specifically to mechanisms for utility-aware anonymization of sequential and location datasets.
Data anonymization algorithms are becoming increasingly important to support modern businesses' needs for data sharing and data monetization. Due to worldwide privacy regulations governing different types of person-specific data, such as patient data in electronic health records, user mobility data in telco datasets, or the like, such data has to be anonymized before being shared with third parties. Telco data anonymization is an important research area, as user location information is collected on a large scale by telco operators, exposing the precise locations that individuals visited and the corresponding times of those visits. Such data poses a severe threat to privacy; yet, when anonymized, telco data is useful in supporting many applications, such as urban planning, infrastructure allocation, or the like.
Existing privacy solutions for location data either anonymize entire user trajectories or simplify the problem to that of anonymizing sequences of points of interest (POIs) visited by individuals represented in the data set, thereby discarding important temporal information from the data. These existing solutions lead to significant data distortion as they tend to overprotect the mobility data, either by concealing entire user trajectories or by protecting all m combinations of POIs visited by individuals and removing any associated temporal information. Existing solutions for concealing entire user trajectories falsify the data, as they either enforce space/time translation to “move” trajectories close to each other prior to anonymizing them, or introduce synthetic data to conceal the real user trajectories. Such solutions also require extensive parameterization from the data owner, such as setting quasi-identifiers (QIDs) either for the entire dataset or on a per-user basis, setting the value of m for protecting user m sequences, providing taxonomies of locations, defining sensitive locations, or the like.