Data sharing is common in scientific research as well as in business. A telecommunication service provider (TSP) may obtain user or subscriber information as set out in Tables 1a and 1b of FIG. 1.
The telecommunication service provider may wish to share this information with a business partner who wants to assess whether a certain location (12, 24) is suitable for setting up a cosmetic shop. The telecommunication service provider may therefore provide Tables 1a and 1b to the partner, who then examines human traffic patterns around that location, especially those of its target clientele (e.g. young women). However, under guidelines issued by the US National Institute of Standards and Technology (NIST), a mobile number is considered Personally Identifiable Information (PII).
PII refers to information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. PII can be regarded as any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual's identity, such as name, social security number, date and place of birth, mother's maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.
Accordingly, directly sharing the data in Tables 1a and 1b may compromise the privacy of the listed users and may therefore constitute an offence under applicable privacy rules. Deleting the mobile numbers from Tables 1a and 1b would resolve the privacy issue, but the remaining data would no longer be useful, because the mobile numbers serve as the primary (linking) key that joins the two tables. The data must therefore be obfuscated, so that the linking key is preserved while no longer revealing the subscriber's mobile number.
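To make the linking-key requirement concrete, the following is a minimal sketch that assumes keyed hashing (HMAC-SHA256 from the Python standard library) as the obfuscation step. This is an illustrative, commonly used pseudonymization technique, not the method of the present disclosure. The helper names (`pseudonymize`, `obfuscate_table`, `join_on_mobile`), the secret key and the sample rows are all hypothetical stand-ins and are not the actual contents of FIG. 1.

```python
import hashlib
import hmac

def pseudonymize(mobile: str, key: bytes) -> str:
    """Replace a mobile number with a keyed, one-way pseudonym (HMAC-SHA256, hex)."""
    return hmac.new(key, mobile.encode("utf-8"), hashlib.sha256).hexdigest()

def obfuscate_table(rows, key):
    """Return copies of the rows with the 'mobile' field replaced by its pseudonym."""
    return [{**row, "mobile": pseudonymize(row["mobile"], key)} for row in rows]

def join_on_mobile(left, right):
    """Inner-join two tables on the (now pseudonymized) 'mobile' column."""
    index = {row["mobile"]: row for row in right}
    return [{**l, **index[l["mobile"]]} for l in left if l["mobile"] in index]

# Hypothetical rows standing in for Tables 1a and 1b (not the FIG. 1 data).
table_1a = [{"mobile": "555-0101", "gender": "F", "age": 24},
            {"mobile": "555-0102", "gender": "M", "age": 41}]
table_1b = [{"mobile": "555-0101", "location": (12, 24)},
            {"mobile": "555-0102", "location": (3, 7)}]

key = b"hypothetical-secret-key"  # held only by the TSP
shared_1a = obfuscate_table(table_1a, key)
shared_1b = obfuscate_table(table_1b, key)

# The partner can still link the two tables without seeing any mobile number.
joined = join_on_mobile(shared_1a, shared_1b)
```

Because the same key maps the same mobile number to the same pseudonym, the join still works, while recovering the number requires the key held by the provider.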
If the same obfuscated data are shared with different partners, there is a risk of a collusion attack. FIG. 2 shows Tables 2a to 2c, each of which is shared with a different partner. Because the encrypted mobile numbers provided to the different partners are identical, the partners may collude to reconstruct an almost complete profile of every subscriber, thereby infringing user privacy.
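The collusion risk can be illustrated with a short sketch. It assumes the same hypothetical HMAC-based pseudonymization as a stand-in for the encryption in FIG. 2, and uses made-up per-partner attributes (`age`, `location`, `plan`) in place of Tables 2a to 2c. Using a distinct key per partner is shown only as a contrast that defeats this naive join; it is not presented as the approach of the present disclosure.

```python
import hashlib
import hmac

def pseudonymize(mobile: str, key: bytes) -> str:
    """Keyed one-way pseudonym for a mobile number (HMAC-SHA256, hex)."""
    return hmac.new(key, mobile.encode("utf-8"), hashlib.sha256).hexdigest()

def collude(*partner_tables):
    """Merge the partners' tables on the shared pseudonym column 'id'."""
    merged = {}
    for table in partner_tables:
        for row in table:
            merged.setdefault(row["id"], {}).update(row)
    return merged

# Hypothetical subscriber and per-partner views (stand-ins for Tables 2a to 2c).
subscriber = "555-0101"
views = [{"age": 24}, {"location": (12, 24)}, {"plan": "prepaid"}]

def share(keys):
    """Build one single-row table per partner, pseudonymized with that partner's key."""
    return [[{"id": pseudonymize(subscriber, k), **v}] for k, v in zip(keys, views)]

# Same key for every partner: identical pseudonyms, so colluders rebuild one profile.
profile_same = collude(*share([b"k"] * 3))

# Distinct key per partner: pseudonyms differ, so the naive join links nothing.
profile_split = collude(*share([b"k-a", b"k-b", b"k-c"]))
```

With a shared key, `profile_same` holds a single merged record combining all three partners' attributes; with per-partner keys, `profile_split` remains three unlinked records.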
Existing methods relating to data obfuscation are described in US Patent Application Publication Nos. US 2008/0181396 A1 (Balakrishnan et al.), "Data Obfuscation of Text Data Using Entity Detection and Replacement", and US 2012/0303616 A1 (Abuelsaad et al.), "Data Perturbation and Anonymization Using One Way Hash". While these existing solutions may be efficient for one-off use, they are not efficient when millions or even billions of rows of data must be obfuscated. Today, large volumes of data are generated at high velocity, and a more efficient method of obfuscation is therefore desirable.