A set of data owners each possess some sensitive data. No single owner's data is sufficient for learning an accurate model, so the owners want to share their data to a common pool for modeling. However, they cannot directly release the data due to privacy issues.
Preserving the privacy of data that is to be shared to a pool for modeling is a general problem in industry, as it appears widely in many domains, for example: health care (medical records), economics (business reports), and homeland security (face images, fingerprints, etc.).
Currently, while providing data anonymously is one way to share private data, there is no guarantee that anonymity alone can protect privacy.
For example, it has been demonstrated that users in the anonymized dataset released by the popular NETFLIX® service can possibly be re-identified by linking their records to the public IMDb dataset. Moreover, more than 87% of American citizens can be uniquely identified merely by observing their gender, ZIP code, and birthdate. In short, a person's private data is not as safe as they imagine.
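The re-identification risk described above can be illustrated with a small sketch. The code below is a hypothetical example (all names, tables, and records are fabricated for illustration): an "anonymized" medical table with names removed is joined to a public table on the quasi-identifier columns (gender, ZIP code, birthdate), tying each supposedly anonymous record back to a name.

```python
# Hypothetical illustration of a linkage (re-identification) attack.
# An "anonymized" table with names removed can be joined to a public
# table on shared quasi-identifiers, defeating the anonymization.
# All data below is fabricated for illustration.

anonymized_medical = [
    {"gender": "F", "zip": "02138", "birthdate": "1960-07-01", "diagnosis": "flu"},
    {"gender": "M", "zip": "10001", "birthdate": "1975-03-12", "diagnosis": "asthma"},
]

public_records = [
    {"name": "Alice", "gender": "F", "zip": "02138", "birthdate": "1960-07-01"},
    {"name": "Bob",   "gender": "M", "zip": "10001", "birthdate": "1975-03-12"},
]

QUASI_IDENTIFIERS = ("gender", "zip", "birthdate")

def link(anon_rows, public_rows):
    """Join the two tables on the quasi-identifier columns."""
    # Index the public table by its quasi-identifier tuple.
    index = {
        tuple(r[k] for k in QUASI_IDENTIFIERS): r["name"]
        for r in public_rows
    }
    # Any anonymized row whose quasi-identifiers match is re-identified.
    reidentified = []
    for row in anon_rows:
        key = tuple(row[k] for k in QUASI_IDENTIFIERS)
        if key in index:
            reidentified.append({"name": index[key], **row})
    return reidentified

# Each "anonymous" diagnosis is now tied to a specific name.
reidentified = link(anonymized_medical, public_records)
```

This is why the disclosure below does not rely on anonymization: as long as quasi-identifiers survive in the released data, a join against any sufficiently rich public dataset can undo the anonymization.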
As a further example of a prior art system 10 for sharing medical data, FIG. 1 shows a plurality of hospitals 12a, . . . , 12n that each provide respective medical data, e.g., medical records 15a, . . . , 15n for a few patients who have a particular disease, to a data center employing a computing machine 19, so that the medical data can be analyzed for the disease. In such a scenario, a learner 18 is solicited to analyze the data. However, because the medical records are sensitive and private, it is desirable that the records' contents remain blind to the learner. In a stricter sense, the medical records may not even be allowed to be released outside the hospital.
Thus, it is the case that a set of data owners each possess some sensitive data, that no single owner's data is sufficient for learning an accurate model, and that it would be advantageous to share the data to a pool for modeling; yet the owners cannot directly release the data due to privacy issues.