The present application relates generally to an improved data processing apparatus and method, and more specifically to mechanisms for de-duplication-aware secure delete.
Data de-duplication is a storage concept in which redundant data are eliminated to shrink storage requirements and improve bandwidth efficiency. More precisely, it is a specialized data compression technique that eliminates duplicate copies of repeating data. The technique improves storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. In the de-duplication process, unique chunks of data, or byte patterns, are identified and stored during an analysis pass. As the analysis continues, other chunks are compared to the stored copies, and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Because the same byte pattern may occur dozens, hundreds, or even thousands of times, the amount of data that must be stored or transferred can be greatly reduced.
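The chunk-and-reference process can be illustrated with a minimal sketch. It assumes fixed-size chunks and uses a SHA-256 digest as the test for a "match"; both choices are illustrative assumptions (real systems may use variable-size, content-defined chunking or other fingerprints), and the names `deduplicate`, `reassemble`, and `CHUNK_SIZE` are hypothetical.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical chunk size in bytes, kept tiny for illustration


def deduplicate(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split data into fixed-size chunks, store each unique chunk once,
    and replace every chunk with a reference (its SHA-256 digest)."""
    store = {}        # digest -> stored chunk (one copy per unique byte pattern)
    references = []   # ordered references that reconstruct the original data
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk  # first occurrence: keep the bytes
        references.append(digest)  # every occurrence: keep only a reference
    return store, references


def reassemble(store, references) -> bytes:
    """Follow each reference back to its stored chunk."""
    return b"".join(store[d] for d in references)


if __name__ == "__main__":
    data = b"ABCD" * 1000 + b"WXYZ"
    store, refs = deduplicate(data)
    # prints "4004 bytes -> 2 unique chunks, 1001 references"
    print(len(data), "bytes ->", len(store), "unique chunks,", len(refs), "references")
    assert reassemble(store, refs) == data
```

Here a 4004-byte input that repeats one 4-byte pattern is reduced to two stored chunks plus a list of small references, which is the saving the paragraph above describes.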
In the de-duplication process, duplicate data is deleted, leaving only one copy of the data to be stored. This single copy is called the master copy, and in place of each deleted copy (secondary copy) the file system keeps a reference pointer to the master copy. When de-duplication is performed on in-band traffic, it is referred to as in-line de-duplication.
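A toy model of in-line de-duplication can make the master-copy and reference-pointer relationship concrete. This is a sketch under stated assumptions: the class `InlineDedupFS`, its `block_size`, and the use of SHA-256 to recognize duplicate blocks are all hypothetical illustrations, not a description of any particular file system.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InlineDedupFS:
    """Toy in-line de-duplicating store (hypothetical): duplicates are detected
    as files are written, so only the master copy of each block is stored."""
    block_size: int = 8                            # hypothetical block size in bytes
    masters: dict = field(default_factory=dict)    # digest -> master copy of the block
    files: dict = field(default_factory=dict)      # file name -> list of reference pointers

    def write(self, name: str, data: bytes) -> None:
        pointers = []
        for i in range(0, len(data), self.block_size):
            block = data[i:i + self.block_size]
            key = hashlib.sha256(block).hexdigest()
            # Keep the block only if no master copy exists yet; a duplicate
            # (secondary copy) is never stored, only pointed to.
            self.masters.setdefault(key, block)
            pointers.append(key)
        self.files[name] = pointers

    def read(self, name: str) -> bytes:
        # Resolve each reference pointer to its master copy.
        return b"".join(self.masters[key] for key in self.files[name])


if __name__ == "__main__":
    fs = InlineDedupFS()
    fs.write("a.txt", b"same data, same data")
    fs.write("b.txt", b"same data, same data")
    print(len(fs.masters), "master blocks shared by", len(fs.files), "files")
```

Writing the same 20 bytes under two names stores three master blocks once; the second file consists only of reference pointers to those masters.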
Secure delete, also referred to as data clearing or data wiping, is a software-based method of overwriting data that completely destroys the electronic data residing on a hard disk drive or other digital media. Unlike degaussing and physical destruction, which render the storage media unusable, secure delete techniques remove all information while leaving the disk operable, thereby preserving the information technology (IT) assets and environment.
Software-based overwriting uses software applications to write patterns of random, meaningless data onto all of a hard drive's sectors. The Center for Magnetic Recording Research (CMRR) defines a set of standards for secure delete on disk devices. Secure delete of a file requires performing multiple writes of these patterns on the file's blocks, and these writes must be performed directly on the physical device.
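A multi-pass overwrite of a file's contents might be sketched as follows. The pass sequence (all zeros, all ones, then random bytes) is a hypothetical example and is not taken from the CMRR standards or any other published specification; `overwrite_file` and `PASSES` are illustrative names. The sketch writes through the operating system's file interface rather than directly to the physical device, so it shows the pass structure only and does not by itself guarantee that the physical sectors are overwritten.

```python
import os

# Hypothetical pass sequence for illustration only; not taken from any
# CMRR or other published standard. None means "random bytes".
PASSES = (b"\x00", b"\xff", None)


def overwrite_file(path: str, passes=PASSES, chunk: int = 1 << 16) -> int:
    """Overwrite every byte of an existing file in place, once per pass,
    and return the number of passes performed.

    The writes go through the file system, which may cache, journal, remap,
    or share blocks, so physical overwriting is not guaranteed.
    """
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pattern in passes:
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk, remaining)
                data = os.urandom(n) if pattern is None else pattern * n
                f.write(data)
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push this pass to the device before the next one
    return len(passes)
```

Calling `overwrite_file("old_report.bin")` (a hypothetical path) would perform three full passes over the file's current length without changing its size.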