1. Technical Field
Present invention embodiments relate to archiving content, and more specifically, to accessing protected content (e.g., content protected by passwords, encryption or encoding, access restrictions, etc.) in order to archive that content.
2. Discussion of the Related Art
Compliance archiving is the process of capturing corporate data, such as electronic mail (e-mail) messages or files, from a shared file system or user workstation. This data may later serve as evidence in legal cases. A full-text search index is typically created over the archived data to enable keyword-based content searches. These content searches require that the text content of the data be accessible to the archive system.
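The dependence of keyword searches on readable text can be illustrated with a minimal sketch of an inverted index. This is not the index of any particular archive system; the function names, the sample message IDs, and the convention of representing unreadable (e.g., encrypted) content as bytes are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric keyword tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each keyword to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, content in documents.items():
        if isinstance(content, bytes):
            # Assumption for this sketch: bytes stand for content whose text
            # the archive cannot read, so no keywords are indexed for it.
            continue
        for token in tokenize(content):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every keyword in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


# Hypothetical archived items; "msg-3" has no readable text.
archive = {
    "msg-1": "Quarterly revenue forecast attached",
    "msg-2": "Lunch on Friday?",
    "msg-3": b"\x8f\x02\x1c\xa7",
}
index = build_index(archive)
print(search(index, "revenue forecast"))  # prints {'msg-1'}
```

In this sketch, the item without accessible text never enters the index and therefore can never be returned by a keyword search, whatever its actual content.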
In the case of compliance archiving for e-mail messages, a mechanism referred to as e-mail journaling is frequently used, in which e-mail systems of the corporation create a copy of every e-mail message that is sent or received across a corporate network in order to archive these messages. However, an e-mail message sent by a user may be encrypted with the sender's and/or the recipient's credentials. Thus, no systems other than the e-mail system have appropriate credentials to decrypt the e-mail contents. Accordingly, the journal copy of the e-mail message is encrypted and cannot be used for compliance content searches, since the contents of an encrypted e-mail message cannot be parsed and added to the full-text index. Nor can such an e-mail message be restored and read by anyone other than the designated recipients or sender. If the recipients or sender are no longer employees of the corporation, it may be impossible to access the content of the encrypted e-mail message.
Although a policy may be provided in which a corporate user account is always given appropriate access credentials to every e-mail message that is encrypted by a corporate e-mail system, implementation of such policies is difficult and may not always be possible.
In the case of archiving for legal compliance and discovery, the content of encrypted documents cannot be text indexed for searching purposes and, therefore, this content cannot be searched and found easily. These documents may be of various types that have been encrypted by the document originator or owner (e.g., text documents, spreadsheets, archive files such as .zip files, etc.). In most cases, the only way to get access to the content of encrypted documents is by requesting the originator or owner of the document to provide a decrypted version or a decryption key.
In a compliance archiving scenario, a significant compliance risk occurs when asking a user for a decrypted version of a document. This results from the possibility that the content of the provided decrypted document does not reflect the content of the original encrypted version. In order to verify the content of the decrypted document, a digital signature over the original content would have to be generated before that content was encrypted. However, most applications do not generate such a digital signature, and the signature cannot be generated once the content has been encrypted.
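The verification requirement described above can be sketched as follows. As a simplification, the sketch uses a keyed hash (HMAC-SHA256 from the Python standard library) as a stand-in for a true digital signature, which would normally use an asymmetric key pair. The function names, the key, and the sample contents are hypothetical.

```python
import hashlib
import hmac


def seal(original: bytes, key: bytes) -> str:
    """Compute an integrity tag over the original content before encryption."""
    return hmac.new(key, original, hashlib.sha256).hexdigest()


def matches_original(candidate: bytes, recorded_tag: str, key: bytes) -> bool:
    """Check whether a supplied decrypted version matches the recorded tag."""
    expected = hmac.new(key, candidate, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters match.
    return hmac.compare_digest(expected, recorded_tag)


# Hypothetical key held by the archive and hypothetical document contents.
archive_key = b"example-archive-key"
original = b"Contract terms: payment due within 30 days."
tag = seal(original, archive_key)  # must be recorded before encryption

print(matches_original(original, tag, archive_key))  # prints True
print(matches_original(b"Contract terms: payment due within 90 days.", tag, archive_key))  # prints False
```

The check is only possible because the tag was computed while the original content was still readable. If no tag was recorded before encryption, there is nothing against which a later decrypted version can be compared.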
Although automatic or manual decryption of a document may be employed, automatic decryption is limited to specific kinds of encryption and applications, while manual decryption (e.g., requesting users to manually remove encryption) cannot guarantee the integrity of the decrypted document.