The ubiquity of data sharing in databases such as social networks raises privacy concerns for participants. Simply hiding the identities of individuals before making the data publicly available cannot guarantee privacy, or even anonymity, because the richness and quantity of the data can allow individual identities to be estimated. Such databases often include not only the properties of individuals but also their connections to each other.
In many situations, individuals wish to share their personal data for a variety of reasons, including social networking and machine learning applications such as recommenders and data exploration. If the data contains tacitly identifying information, it can be necessary to protect it with privacy or at least anonymity guarantees while maintaining some utility of the data (i.e., the ability to use the data for a desired function). Privacy can be guaranteed by hiding or generalizing data. Various schemes have been studied and are in use, including k-anonymity, l-diversity, t-closeness, and differential privacy, where differential privacy often requires specifying the data application (e.g., logistic regression) in advance.
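To make the idea of guaranteeing anonymity by generalizing data concrete, the following is a minimal Python sketch of the standard k-anonymity condition: every combination of quasi-identifier values must be shared by at least k records. The toy records, the field names (`age`, `zip`, `condition`), and the helpers `generalize_age` and `is_k_anonymous` are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from collections import Counter

def generalize_age(age, width=10):
    """Replace an exact age with a coarse range, e.g. 23 -> '20-29'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical toy data: 'age' and 'zip' act as quasi-identifiers,
# 'condition' is the sensitive attribute.
records = [
    {"age": 23, "zip": "10027", "condition": "flu"},
    {"age": 27, "zip": "10025", "condition": "asthma"},
    {"age": 34, "zip": "10031", "condition": "flu"},
    {"age": 38, "zip": "10039", "condition": "diabetes"},
]

# Generalize: bucket ages into decades and keep only the first three zip digits.
generalized = [
    {**r, "age": generalize_age(r["age"]), "zip": r["zip"][:3] + "**"}
    for r in records
]

raw_ok = is_k_anonymous(records, ["age", "zip"], k=2)
gen_ok = is_k_anonymous(generalized, ["age", "zip"], k=2)
print(raw_ok, gen_ok)
```

In this toy example the raw records are each unique on (`age`, `zip`), so they fail the check for k = 2, whereas the generalized records fall into two groups of two and pass. The cost is reduced utility: exact ages and zip codes are no longer available to downstream applications.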