This invention relates generally to data privacy and, more specifically, relates to enforcing data privacy to maintain obfuscation of certain data.
In the information-security field, it is widely accepted that information security has two dimensions: (1) Integrity, which means that valuable information should not be damaged by any computation and that security-sensitive computations should not be influenced by untrusted values; and (2) Confidentiality, which means that information is accessible only to those authorized to access it.
A typical information-flow-security model is as follows: Data and principals are associated with labels that determine which principals have access to which data. These labels form a lattice, in which elements are partially ordered. In this partial order, the terms “high” and “low” take on different meanings depending on the security dimension they represent. Specifically, in integrity, “high” means “untrusted” and “low” means “trusted”; in confidentiality, “high” means “secret” and “low” means “public”.
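As an illustration of the label lattice described above, the following minimal Python sketch models confidentiality labels as sets of secrecy categories ordered by set inclusion. The category names and the functions `leq`, `join`, and `meet` are hypothetical and are not taken from any particular system. Labels such as {"ssn"} and {"password"} are incomparable, which is why the order is only partial; an integrity lattice could be modeled the same way, with categories of untrusted sources in place of secrecy categories.

```python
from typing import FrozenSet

# Hypothetical encoding: a label is the set of secrecy categories a value carries.
Label = FrozenSet[str]

PUBLIC: Label = frozenset()  # bottom element ("low" for confidentiality)


def leq(a: Label, b: Label) -> bool:
    """Partial order: a is no more restrictive than b (subset inclusion)."""
    return a <= b


def join(a: Label, b: Label) -> Label:
    """Least upper bound: the label of a value computed from both inputs."""
    return a | b


def meet(a: Label, b: Label) -> Label:
    """Greatest lower bound of two labels."""
    return a & b


# Two hypothetical categories that are incomparable in the order.
SSN: Label = frozenset({"ssn"})
PASSWORD: Label = frozenset({"password"})
```

For example, `leq(SSN, PASSWORD)` and `leq(PASSWORD, SSN)` are both false, while `join(SSN, PASSWORD)` is the label `frozenset({"ssn", "password"})`, which sits above both.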
In terms of information flow, the rule is that there should be no flow of information from high to low. However, an information policy can “downgrade” data and principals. High-to-low information flow is allowed as long as the high information has been sufficiently downgraded. In integrity, “downgrading” means “endorsing”; in confidentiality, it means “declassifying”.
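The no-high-to-low rule and the downgrading exception can be sketched as follows, assuming a two-level lattice. The names `Level`, `Labeled`, `flow_to`, `downgrade`, and `FlowViolation` are hypothetical. The `policy` callable stands in for whatever decides that data has been "sufficiently downgraded"; in a real system that decision would be governed by the information policy, not by an arbitrary function supplied by the caller.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Any, Callable


class Level(IntEnum):
    """Two-level lattice: LOW (public/trusted) below HIGH (secret/untrusted)."""
    LOW = 0
    HIGH = 1


@dataclass(frozen=True)
class Labeled:
    """A value together with its security label."""
    value: Any
    level: Level


class FlowViolation(Exception):
    """Raised when HIGH data would reach a LOW sink without being downgraded."""


def flow_to(data: Labeled, sink: Level) -> Any:
    """Enforce 'no flow from high to low': the data's level must be <= the sink's."""
    if data.level > sink:
        raise FlowViolation(f"{data.level.name} data cannot flow to a {sink.name} sink")
    return data.value


def downgrade(data: Labeled, policy: Callable[[Any], Any]) -> Labeled:
    """Declassify (confidentiality) or endorse (integrity): apply a
    policy-approved transformation and relabel the result LOW."""
    return Labeled(policy(data.value), Level.LOW)
```

With `ssn = Labeled("123-45-6789", Level.HIGH)`, the call `flow_to(ssn, Level.LOW)` raises `FlowViolation`, whereas `flow_to(downgrade(ssn, lambda s: s[-4:]), Level.LOW)` returns `"6789"`, mirroring the partial SSN release discussed in the text.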
Thus, certain parts of secret information can be declassified and revealed to certain public listeners. For instance, the last 4 digits of a social security number (SSN) can be revealed to a bank teller. As another example, assume that the passwords of a set of users are stored secretly in a file. When a user attempts to log in by entering a user ID and password pair, the authentication system has to check that the password entered is correct. To do so, it has to verify that the password stored in the password file for that particular user ID matches the password that was entered. If a match is found, the authentication system returns a success message to the user and allows the user to log in. If no match is found, the authentication system returns an error message to the user. Regardless of whether the message returned is a success message or an error message, the message itself must be considered “high” because it is obtained by comparing the password entered by the user with the passwords in the password file. Therefore, displaying that message to the user should be considered an information-flow-security violation: in theory, a user could use the information in the message to try different passwords until the correct one is identified (a “dictionary attack”). Strictly according to the information-flow-security principle, then, an authentication mechanism violates information-flow security. However, assuming that the authentication system allows only a limited number of attempts (for example, three) before locking the account, the message returned to the user can be considered declassified.
Furthermore, certain parts of untrusted input can be endorsed and used in certain trusted computations. For instance, untrusted user input can be used in a Web application if the input is properly formatted and is verified not to contain substrings that could compromise the integrity of the system.
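The password example can be sketched as follows, assuming a lockout after three failed attempts. The class `Authenticator`, its methods, and the message strings are hypothetical, and salted PBKDF2 hashing is an illustrative addition (the text says only that the passwords are stored secretly). The comparison reads the secret password store, so its outcome is "high"; the sketch returns it to the user only because the lockout bounds how many guesses a dictionary attack can make.

```python
import hashlib
import hmac
import os
from typing import Dict, Set, Tuple

MAX_ATTEMPTS = 3  # assumed lockout threshold, matching the example in the text


def _hash(password: str, salt: bytes) -> bytes:
    # Salted PBKDF2 hash: an illustrative storage choice, not specified by the text.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


class Authenticator:
    """Hypothetical authenticator whose result message is treated as declassified
    only because guessing is cut off after MAX_ATTEMPTS consecutive failures."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[bytes, bytes]] = {}  # secret ("high") password file
        self._failures: Dict[str, int] = {}
        self._locked: Set[str] = set()

    def register(self, user_id: str, password: str) -> None:
        salt = os.urandom(16)
        self._store[user_id] = (salt, _hash(password, salt))

    def login(self, user_id: str, password: str) -> str:
        if user_id in self._locked:
            return "account locked"
        record = self._store.get(user_id)
        # This comparison depends on the secret store, so its outcome is "high".
        ok = record is not None and hmac.compare_digest(
            record[1], _hash(password, record[0])
        )
        if ok:
            self._failures[user_id] = 0
            return "success"
        self._failures[user_id] = self._failures.get(user_id, 0) + 1
        if self._failures[user_id] >= MAX_ATTEMPTS:
            self._locked.add(user_id)  # bound the number of guesses
        return "error"
```

After three wrong guesses for a user, every further attempt, including one with the correct password, returns `"account locked"`, so the success/error message can reveal the outcome of at most three comparisons.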
Private data, e.g., data stored in databases, is therefore sometimes declassified and then released. An issue with such a release is whether, even after declassification and/or endorsement, it still maintains privacy for the people whose data is in the databases. This issue is particularly acute when information released from multiple databases can be combined by an entity (e.g., an attacker) to determine private information about a person who has data in more than one of those databases.
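To illustrate how combining releases can defeat declassification that looks adequate for each release on its own, the following sketch joins two hypothetical releases on shared attributes (ZIP code, birth date, and sex). Neither release pairs a name with a diagnosis, but a unique match on those attributes does. The attribute names, the toy records, and the `link` function are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical attributes that appear in both releases.
QUASI_IDS = ("zip", "birth_date", "sex")


def link(release_a: List[Dict[str, str]],
         release_b: List[Dict[str, str]]) -> List[Tuple[str, str]]:
    """Join two releases on QUASI_IDS and return (name, diagnosis) pairs
    for records in release_a that match exactly one person in release_b."""
    index: Dict[tuple, List[str]] = {}
    for row in release_b:
        key = tuple(row[k] for k in QUASI_IDS)
        index.setdefault(key, []).append(row["name"])
    linked = []
    for row in release_a:
        names = index.get(tuple(row[k] for k in QUASI_IDS), [])
        if len(names) == 1:  # a unique match re-identifies the person
            linked.append((names[0], row["diagnosis"]))
    return linked


# Toy "declassified" releases: names removed from one, diagnoses absent from the other.
medical = [
    {"zip": "10001", "birth_date": "1970-01-02", "sex": "F", "diagnosis": "asthma"},
    {"zip": "10002", "birth_date": "1985-05-05", "sex": "M", "diagnosis": "flu"},
]
voters = [
    {"name": "Alice", "zip": "10001", "birth_date": "1970-01-02", "sex": "F"},
    {"name": "Bob", "zip": "10003", "birth_date": "1985-05-05", "sex": "M"},
]
```

With these toy records, `link(medical, voters)` returns `[("Alice", "asthma")]`; Bob is not linked because his ZIP code differs between the two releases.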