Data backup and recovery systems often implement various techniques to efficiently maintain, store, and update data, including various operations to delete data. A regular file delete operation makes the file inaccessible via the namespace and frees the underlying data blocks for later reuse, but does not typically render such data blocks unrecoverable. For example, the regular delete operation typically leaves behind a residual representation of the file. Accordingly, in order to permanently erase the data blocks and any remnants, the system must undertake a sanitization process. Data sanitization generally refers to the process of deliberately, permanently, and irreversibly removing or destroying the data stored on a storage device. Thus, a system (e.g., a device or server) that has been sanitized has no usable residual or recoverable data, even when advanced forensic tools are used.
Typically, sanitization is required when sensitive or confidential data is inadvertently stored on a system. For example, a Classified Message Incident (CMI) occurs when data at a particular classification level is written to storage not approved for that classification level. For instance, a CMI might occur when a user inadvertently sends an email with “top secret” information to an email system approved only for a lower clearance. As another example, a CMI may occur when information is reclassified after it has been stored on a system with a lower clearance. When a CMI occurs, the system administrator must take action to restore the system to a state as if the sensitive data had never been stored.
Sanitizing a backup or archival storage system, however, introduces unique challenges not present when sanitizing a single device such as a hard drive that might be erased with a pattern of overwrites. For example, if a backup to a deduplicated storage system takes place before the CMI is rectified, then the deduplicated storage system must also be sanitized. For an in-place storage system, sanitizing an object (file, record, etc.) consists of following metadata references to the physical location within the storage system, overwriting the values one or more times, and erasing the metadata as well as other locations that have become unreferenced. Deduplicated storage systems, however, are often log-structured with large units of writes, which typically do not support in-place erasure of sub-units. Instead, deduplicated storage systems typically require copying forward all live data and then sanitizing the original values. Accordingly, the sanitization process is typically applied to the entire file system of a deduplicated storage system as opposed to individual files. Performing a sanitization process, however, is resource intensive and monopolizes the system, preventing it from performing other I/O processes and other processes such as garbage collection. Accordingly, the ingest performance of the storage system may be adversely affected. Thus, there is a continued need to perform sanitization that meets or exceeds regulatory requirements, while still reducing the time and resource requirements of the sanitization process.
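The copy-forward approach described above can be illustrated with a minimal sketch. It is a simplified in-memory model and not an implementation of any particular storage system: the `Container` layout and the names `write_container`, `sanitize_container`, and `live_fingerprints` are hypothetical and introduced only for illustration. The single zero-fill pass stands in for whatever overwrite pattern an applicable regulation would require on physical media.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Container:
    """Hypothetical large write unit of a log-structured store."""
    data: bytearray
    # Maps chunk fingerprint -> (offset, length) within `data`.
    index: dict = field(default_factory=dict)


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint used to detect duplicate chunks."""
    return hashlib.sha256(chunk).hexdigest()


def write_container(chunks) -> Container:
    """Pack chunks into a new container, storing each unique chunk once."""
    container = Container(data=bytearray())
    for chunk in chunks:
        fp = fingerprint(chunk)
        if fp in container.index:
            continue  # deduplicated: this chunk is already stored
        container.index[fp] = (len(container.data), len(chunk))
        container.data.extend(chunk)
    return container


def sanitize_container(old: Container, live_fingerprints: set) -> Container:
    """Copy live chunks forward, then overwrite the original container.

    Chunks cannot be erased individually inside `old`, so the live ones
    are rewritten into a new container and the whole old unit is wiped.
    """
    live_chunks = [
        bytes(old.data[offset:offset + length])
        for fp, (offset, length) in old.index.items()
        if fp in live_fingerprints
    ]
    new = write_container(live_chunks)
    # Assumption: one zero-fill pass models the required overwrite pattern.
    old.data[:] = bytes(len(old.data))
    old.index.clear()
    return new


# Example: one sensitive chunk and one chunk that is still referenced.
secret = b"top secret payload"
public = b"routine message"
old = write_container([secret, public, public])
new = sanitize_container(old, {fingerprint(public)})
```

In this sketch every container that holds a dead chunk must be read, rewritten, and overwritten in full, which suggests why sanitizing a deduplicated system tends to be applied across the whole file system and competes with ingest and garbage collection for I/O.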