The present invention relates to computer processing systems, and more specifically, to a method for identifying combinations of attributes that contain unique records in a dataset.
The identification of attributes (columns), or sets of attributes, that can be used to uniquely identify records is a key task in database management as well as in technical systems in which records consist of a combination of attributes holding information about entities, events, and the like. For example, by combining different columns of a dataset comprising information about events in several nuclear facilities, it may be possible to uniquely identify one of the facilities, which would then allow an observer to attribute all of the corresponding events to that facility and to draw conclusions about its operations. Such combinations of attributes, whose values uniquely point to records of the dataset, are called quasi-identifiers (QIDs).
The task of finding unique records thus amounts to discovering the combinations of attributes that form QIDs. The QIDs so discovered may be used as input for data anonymization algorithms.
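By way of illustration only, the notion of a QID described above may be sketched as a naive exhaustive search in Python. The sketch is not the method of the invention; the function name `find_quasi_identifiers`, the parameter `max_size`, and the attribute names and sample records are all hypothetical and are introduced solely for this example. A combination of attributes is reported when at least one record has a combination of values that occurs exactly once in the dataset.

```python
from collections import Counter
from itertools import combinations


def find_quasi_identifiers(records, attributes, max_size=None):
    """Map each attribute combination to the indices of records it singles out.

    A combination is reported only if at least one record has a value
    tuple that occurs exactly once in the dataset for those attributes.
    Hypothetical helper for illustration; it enumerates every combination.
    """
    if max_size is None:
        max_size = len(attributes)
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(attributes, size):
            # Project every record onto the chosen attributes.
            projected = [tuple(rec[a] for a in combo) for rec in records]
            counts = Counter(projected)
            # A record is unique if its projected values occur exactly once.
            unique_rows = [i for i, key in enumerate(projected) if counts[key] == 1]
            if unique_rows:
                result[combo] = unique_rows
    return result


# Hypothetical sample data: one record per event (assumed for illustration).
ATTRIBUTES = ["region", "reactor_type", "year"]
records = [
    {"region": "north", "reactor_type": "PWR", "year": 2001},
    {"region": "north", "reactor_type": "BWR", "year": 2001},
    {"region": "south", "reactor_type": "PWR", "year": 2002},
    {"region": "south", "reactor_type": "BWR", "year": 2002},
    {"region": "north", "reactor_type": "PWR", "year": 2002},
    {"region": "south", "reactor_type": "BWR", "year": 2001},
]

qids = find_quasi_identifiers(records, ATTRIBUTES)
for combo, rows in qids.items():
    print(combo, "singles out records", rows)
```

In this sample data no single attribute singles out a record, whereas each pair of attributes does, which is the situation the nuclear facility example above describes. The number of combinations examined grows exponentially with the number of attributes, so an exhaustive search of this kind is practical only for small datasets.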