A system is said to provide differential privacy if the presence or absence of a particular record or value cannot be determined from an output of the system, or can be determined only with a very low probability. For example, in the case of medical data, a system may be provided that outputs answers to supplied queries, such as the number of users with diabetes. While the output of such a system may be anonymous in that it does not reveal the identity of the patients associated with the data, a curious user may attempt to make inferences about the presence or absence of patients by varying the queries made to the system and observing the changes in the output. For example, a user may have preexisting knowledge about a rare condition associated with a patient and may infer other information about that patient by restricting queries to users having the condition. Such a system may not provide differential privacy because the presence or absence of a patient in the medical data (i.e., a record) may be inferred from the answers returned to the queries (i.e., the output).
Typically, systems provide differential privacy by introducing some amount of error into the data or into the results of operations or queries performed on the data. For example, noise drawn from a distribution such as a Laplacian distribution may be added to the result of each query. However, while such methods are effective, they may add more error than is necessary to provide differential privacy protection.
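As an added worked example (not part of the original text), the following Python applies the Laplace mechanism described above to the count query from the first paragraph: it measures how much one record can change the exact count, adds Laplace noise scaled to that sensitivity, and checks that the resulting privacy loss between two neighbouring databases is bounded by exp(epsilon). The records and the predicate are constructed sample data.

```python
import math
import random

# Constructed sample records: (patient_id, has_condition). Not real data.
RECORDS = [("p1", True), ("p2", False), ("p3", True), ("p4", True), ("p5", False)]
HAS_CONDITION = lambda record: record[1]


def count_query(records, predicate):
    """Exact count of records satisfying predicate (e.g. 'has diabetes')."""
    return sum(1 for r in records if predicate(r))


def sensitivity_by_removal(records, predicate):
    """Largest change in the count caused by removing any single record.

    Neighbouring databases differ in one record, so this is the query's
    sensitivity; for a count it is 1 whenever some record matches.
    """
    full = count_query(records, predicate)
    return max(abs(full - count_query(records[:i] + records[i + 1:], predicate))
               for i in range(len(records)))


def laplace_scale(sensitivity, epsilon):
    """Noise scale b = sensitivity / epsilon used by the Laplace mechanism."""
    return sensitivity / epsilon


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value plus Laplace(0, b) noise, b = sensitivity / epsilon."""
    b = laplace_scale(sensitivity, epsilon)
    u = rng.random() - 0.5  # uniform on (-0.5, 0.5)
    # Inverse-CDF sampling of a Laplace(0, b) variate
    return true_value - b * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))


def laplace_pdf(x, mu, b):
    return math.exp(-abs(x - mu) / b) / (2 * b)


def max_privacy_loss(sensitivity, epsilon, outputs):
    """Largest ratio of output densities between two neighbouring databases
    whose true counts differ by `sensitivity`, evaluated over `outputs`.

    Differential privacy guarantees this never exceeds exp(epsilon).
    """
    b = laplace_scale(sensitivity, epsilon)
    return max(laplace_pdf(y, 0.0, b) / laplace_pdf(y, float(sensitivity), b)
               for y in outputs)
```

Smaller epsilon gives a larger noise scale and a tighter bound on what a querier can learn from any single answer, which is exactly the trade-off the text refers to when it notes that such methods may add more error than necessary.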