In some situations, particular words or other information in a document (e.g., names, dates, or phrases) may need to be redacted in order to prevent their discovery. For example, sensitive or classified information may need to be redacted from a document, or replaced by a pseudonym, prior to release of the document to the public.
Digital signatures and time-stamps are commonly used to protect the integrity of documents. These mechanisms, however, do not carry over to redacted documents, because any change to the document invalidates the signature or time-stamp certificate. Conventional signature schemes allow a verifier to check only the exact message that the signer signed. If the message is modified, for example by replacing a subdocument with a pseudonym, the signature is no longer valid.
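The effect can be illustrated with a minimal Python sketch. It is not a real digital signature scheme: as an assumption made only to keep the example self-contained, an HMAC-SHA256 tag from the standard library stands in for the signature, and the key, the sample sentence, and the function names `sign` and `verify` are hypothetical.

```python
import hashlib
import hmac

# Hypothetical key, for illustration only. A real signer would use an
# asymmetric signature scheme; HMAC is a symmetric stand-in used here
# so the example needs nothing beyond the standard library.
SECRET_KEY = b"demo-key-for-illustration-only"


def sign(message: bytes) -> bytes:
    """Return an integrity tag over the exact bytes of the message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).digest()


def verify(message: bytes, tag: bytes) -> bool:
    """Check that the tag matches the message, byte for byte."""
    return hmac.compare_digest(sign(message), tag)


original = b"Alice approved the transfer on 2020-01-15."
signature = sign(original)

# Redact by replacing the name with a pseudonym.
redacted = original.replace(b"Alice", b"Pseudonym:Paula")

original_ok = verify(original, signature)  # tag matches the signed bytes
redacted_ok = verify(redacted, signature)  # tag no longer matches
```

Because the tag is computed over the exact bytes of the message, the check succeeds for the original document and fails for the redacted one, even though the redaction was legitimate.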
In existing digital redaction schemes, redacted data elements are removed and replaced with a null character or a black rectangle. In certain applications, a pseudonym would be useful to prevent the disclosure of actual data elements while retaining the context and structure of the document. For example, instead of replacing “Alice” with a null character, a redactor can use “Pseudonym:Paula”. If a value (e.g., “Alice”) occurs several times in the document or database, the value should always be replaced by the same pseudonym. The reader is therefore able to make connections between occurrences of the same pseudonym, which preserves the structure and improves the readability of the redacted document.
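The consistency requirement can be sketched in a few lines of Python. This is only an illustration of replacing every occurrence of a value with one fixed pseudonym. The function name `pseudonymize`, the sample sentence, and the whole-word matching rule are assumptions, and the sketch does not address how the redacted document would remain verifiable.

```python
import re


def pseudonymize(text: str, pseudonyms: dict) -> str:
    """Replace every occurrence of each sensitive value with its pseudonym.

    Assumes each sensitive value starts and ends with a word character,
    so that the word-boundary anchors match whole words only.
    """
    if not pseudonyms:
        return text
    # Try longer values first so that a value which is a prefix of
    # another value does not shadow it in the alternation.
    values = sorted(pseudonyms, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(v) for v in values) + r")\b")
    # The same lookup table is used for every match, so a given value
    # always maps to the same pseudonym.
    return pattern.sub(lambda match: pseudonyms[match.group(1)], text)


document = "Alice met Bob on Monday. Later, Alice emailed Bob."
redacted = pseudonymize(document, {"Alice": "Pseudonym:Paula"})
```

Both occurrences of “Alice” map to “Pseudonym:Paula”, so a reader of the redacted text can still tell that the same party appears in both sentences.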
A prior approach to pseudonymization is to have the data owner sign (or compute a time-stamp certificate for) each possible pseudonymized document. However, this solution requires the data owner either to remain available for signing at all times, or to pre-compute and store an impractically large number of signed documents, one for each combination of data elements that might be replaced. A desirable performance requirement for any new technique addressing this problem is therefore that the data owner should need to sign the document only once, or a (small) constant number of times, regardless of the size of the entire document.