The disclosed technology relates to the fields of cryptography and document processing.
There are a number of commercial products for supporting legal discovery. Some products use natural language processing to cluster or categorize documents and to detect cumulative or duplicate documents. These products identify entities within the document. In some products a user then manually selects which entities are to be redacted from the document. Other products can use rules to help redact identified entities and other personal or sensitive information. While these products reduce the time required to produce documents, they still require that, for each discovery request, the data gatekeeper process the documents that contain sensitive information and redact the information for which the requesting entity is not authorized.
Content processing technologies exist to facilitate content indexing and duplicate identification. Technology also exists to redact, or remove, content from documents. The goal of these technologies is to index content, facilitate content search and thus to facilitate removing the searched-for content from the documents.
The existing technology does not allow “in-document” redaction. Either a paper copy or an image of a paper copy is provided that has the sensitive information blocked out, or an electronic document is redacted by deleting the sensitive information from the file. One resulting problem is that, because multiple parties have different access rights and because those access rights change over time, the document owner must carefully control what is redacted for each party based on that party's access rights. Due to the sheer manual labor and bookkeeping involved, mistakes are made. What is needed is some way for documents that contain sensitive information to be provided only once, together with a simple but secure method to reveal the content of the document based on the access rights given to the party.
Another problem that needs to be addressed is that of mistakenly delivering a partially redacted document to the wrong party (for example, through a post office mistake or a mailroom error). Yet another problem is that of determining which documents in a document collection, or which portions of a document, contain specific sensitive information.
It would be advantageous to provide a technology that would allow reversible redaction of electronic documents.
In accordance with the disclosure herein, a computer-controlled method, an apparatus, and a computer program product therefor reveal sensitive information in a selectively encrypted data unit. The method comprises: accessing the selectively encrypted data unit, which comprises an encrypted version of the sensitive information, a plurality of auxiliary values, and an attribute vector associated with the encrypted version, the encrypted version capable of being decrypted into the sensitive information; accessing a unique capability key, the unique capability key associated with a key descriptor, the unique capability key responsive to one or more cryptosystem parameters, one or more random numbers and one or more shares; determining whether the attribute vector is filtered by the key descriptor; acquiring, responsive to the determining, a protection key responsive to the one or more cryptosystem parameters, the plurality of auxiliary values, the key descriptor and the unique capability key; decrypting the encrypted version with the protection key to generate the sensitive information; and presenting the sensitive information.
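By way of illustration only, the order of the steps recited above (access, determine, acquire, decrypt, present) can be sketched in Python. The sketch does not reproduce the disclosed cryptosystem. The names `EncryptedUnit`, `CapabilityKey`, `is_filtered`, `protect` and `reveal` are hypothetical, as is the wildcard-matching meaning given to "filtered by". The HMAC-based key derivation and the hash-based keystream are simple stand-ins chosen so that the example is self-contained. In particular, the access check here is an ordinary conditional rather than a property enforced by the key material, and the cryptosystem parameters, random numbers and shares are not modeled.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class EncryptedUnit:
    """Hypothetical container for a selectively encrypted data unit."""
    ciphertext: bytes         # encrypted version of the sensitive information
    auxiliary_values: tuple   # plurality of auxiliary values (each a bytes object)
    attribute_vector: tuple   # attributes associated with the encrypted version


@dataclass
class CapabilityKey:
    """Hypothetical stand-in for a unique capability key and its key descriptor."""
    secret: bytes
    key_descriptor: tuple     # entries are attribute values, or None as a wildcard


def is_filtered(attribute_vector, key_descriptor):
    # Assumed semantics: every non-wildcard descriptor entry must equal
    # the corresponding attribute.
    return len(attribute_vector) == len(key_descriptor) and all(
        d is None or d == a for a, d in zip(attribute_vector, key_descriptor)
    )


def _protection_key(secret, auxiliary_values):
    # Stand-in derivation: HMAC-SHA256 over the concatenated auxiliary values.
    return hmac.new(secret, b"".join(auxiliary_values), hashlib.sha256).digest()


def _keystream_xor(key, data):
    # Toy stream cipher for illustration only: XOR with SHA-256(key || offset).
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def protect(plaintext, secret, auxiliary_values, attribute_vector):
    # Gatekeeper side of the toy: produce a selectively encrypted data unit.
    key = _protection_key(secret, auxiliary_values)
    return EncryptedUnit(_keystream_xor(key, plaintext),
                         auxiliary_values, attribute_vector)


def reveal(unit, capability):
    # 1. Determine whether the attribute vector is filtered by the key descriptor.
    if not is_filtered(unit.attribute_vector, capability.key_descriptor):
        return None  # not authorized: the content stays redacted
    # 2. Acquire the protection key from the auxiliary values and capability key.
    key = _protection_key(capability.secret, unit.auxiliary_values)
    # 3. Decrypt the encrypted version; the caller then presents the result.
    return _keystream_xor(key, unit.ciphertext)


secret = b"demo-capability-secret"
unit = protect(b"account 12345", secret, (b"aux-1", b"aux-2"), (1, 0, 1))
allowed = reveal(unit, CapabilityKey(secret, (1, None, 1)))
denied = reveal(unit, CapabilityKey(secret, (0, None, 1)))
```

In this sketch `allowed` holds the recovered bytes and `denied` is `None`, because the second descriptor's first entry does not match the attribute vector. The same data unit is handed to both parties, and only the capability key differs.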