Even when “permanently” deleting digital or electronic files from a persistent information storage medium such as a computer hard disk, the computer's operating system (OS) will generally delete only the reference to the file(s) in, for example, some kind of file allocation table, but leave the data intact on the medium itself. As time goes on, the pages, sectors or other storage units that were deleted may be overwritten, but at least for some time an opponent can use this persistence to access the information using some form of forensic attack. In the undelete forensic attack, the opponent scans the hard drive for known digital file patterns and, with some luck, if he finds them, then it is a trivial procedure to recover the data that still remains unchanged in the medium. Indeed, this is also a part of what “data recovery” involves in the context of non-malicious, inadvertent loss of access to data.
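The undelete attack described above can be sketched as a simple signature scan over a raw medium image. The following is a minimal illustration, not a full forensic tool; the signature table is a small, hypothetical sample of well-known file headers:

```python
# Sketch of a signature-based "undelete" scan over a raw disk image.
# The signature table is a hypothetical sample of common file headers.
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",   # JPEG image header
    b"%PDF-": "pdf",           # PDF document header
    b"PK\x03\x04": "zip",      # ZIP archive header
}

def carve_offsets(raw: bytes):
    """Return sorted (offset, filetype) pairs where known headers appear."""
    hits = []
    for magic, kind in SIGNATURES.items():
        start = 0
        while (pos := raw.find(magic, start)) != -1:
            hits.append((pos, kind))
            start = pos + 1
    return sorted(hits)
```

Any header found this way marks a candidate region from which an intact, "deleted" file may be recovered.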
According to many known methods for erasing or “sanitizing” a medium, the locations in the medium where the file(s) was located are overwritten with zeros, ones, and/or random/encrypted bit patterns a given number of times. Depending on the type of medium and the number of times the file is overwritten, the likelihood of success of data recovery using a simple undelete forensic attack is reduced. One such known overwriting method for secure deletion is the Gutmann Technique (see Peter Gutmann, “Secure Deletion of Data from Magnetic and Solid-State Memory,” Sixth USENIX Security Symposium Proceedings, San Jose, Calif., Jul. 22-25, 1996, accessible at http://www.cs.auckland.ac.nz/˜pgut001/pubs/secure_del.html).
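A multi-pass overwrite of this kind can be sketched at the file-system level as follows. This is an illustrative sketch only: on journaling file systems and flash media, a file-level write does not guarantee that the physical storage locations are actually overwritten.

```python
import os

# Illustrative multi-pass overwrite: zeros, ones, then random bytes.
# A pattern of None means "fill this pass with random data".
def overwrite_file(path: str, passes=(b"\x00", b"\xff", None)) -> None:
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        for pattern in passes:
            f.seek(0)
            data = os.urandom(size) if pattern is None else pattern * size
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push this pass out to the device
```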
Another widely used method for secure deletion is the United States Department of Defense (DoD) Standard DoD 5220.22-M, which is the designation for the procedure described in the “National Industrial Security Program Operating Manual (NISPOM).” DoD 5220.22-M is used, for example, not only by the DoD, but also by the United States Department of Energy, the Nuclear Regulatory Commission, and the Central Intelligence Agency.
According to the DoD 5220.22-M, somewhat varying procedures are prescribed for different information storage media, such as for a non-removable rigid disk as opposed to an Electronically Erasable PROM (EEPROM), Magnetic Bubble Memory, Static Random Access Memory (SRAM), even CRT monitors, to name just a few of the many different media mentioned in the “5220.22-M Clearing and Sanitization Matrix.” Depending on the medium, physical procedures such as degaussing and ultraviolet erasure are used in addition to repeated overwriting. Common to the various procedures, however, is the overwriting of addressable locations with a character, then its complement, then a random character, followed by a verification step.
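The common character/complement/random/verify sequence can be illustrated with the following sketch, in which an in-memory buffer stands in for the addressable locations of the medium. This is not a certified implementation of the standard; the choice of fill character is arbitrary.

```python
import os

# Sketch of the DoD 5220.22-M overwrite sequence: a character,
# its complement, a random pattern, then a verification read-back.
# An in-memory bytearray stands in for the storage locations.
def dod_wipe(buf: bytearray, char: int = 0x35) -> bool:
    n = len(buf)
    for pattern in (bytes([char]) * n,         # pass 1: a character
                    bytes([char ^ 0xFF]) * n,  # pass 2: its complement
                    os.urandom(n)):            # pass 3: random data
        buf[:] = pattern
    return bytes(buf) == pattern  # verify the final pass took hold
```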
Relatively sophisticated forensic techniques using magnetic and electron microscopy, such as Magnetic Force Microscopy (MFM) and Scanning Tunneling Microscopy (STM), are typically required to retrieve such overwritten data, but it is possible nevertheless. The effectiveness of these sophisticated forensic attacks is undermined, however, by increasing the number of overwriting passes and carefully selecting the patterns used. Given enough effort, it is usually possible to make the recovery of the “erased” data more and more difficult, until it becomes prohibitively expensive for an attacker to recover the data.
Even assuming “perfect” erasure of data from a medium using these known techniques, it is often still possible to prove at least the existence of the data and the intention to destroy it. In the common case where the erased data is (was) organized in some notion of a “file,” with modern journaling file systems, portions of the file may not have been deleted properly and an audit trail is generated. Statistical analysis of the contents of the medium may therefore reveal “wiped” spots that have a different pattern from typical free or used space. In other words, a file may have been erased with a random pattern, but if the erased file lies in a free address space that has a non-random bit pattern (as free space usually does), then the random bit pattern itself will stand out from its surroundings.
Consider a simple, “natural language” analogy and assume that a document has the following “Lorem ipsum” (essentially arbitrary Latin text used as text filler) “paragraphs”:
Mauris ultricies. Nam est ligula, ultricies in, tincidunt non, interdum vitae, augue. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Gbs;kdHtskjf tejk17fdksobneoi7 lujhoiw73-6uiv.
Suspendisse potenti. In dui ante, consectetuer in, vestibulum consectetuer, viverra quis, felis. Integer tortor metus, accumsan sed, hendrerit quis, tristique eget, magna.
Even without understanding the words of the original text, most readers would realize that something has been changed, replaced, deleted or filled in at the end of the first paragraph because the character pattern, although random, stands out: even though the letters in a normal text can come in an essentially arbitrary string of words, there is still a non-random structure that can be established through experience or statistical analysis. For example, in a large enough, arbitrary English-language text, the letter “e” usually occurs most often, followed in frequency by “t,” “o,” “i,” etc.; some character strings almost never occur (long consonant or vowel strings, for example), and so on. A simple example of this would be a text (.txt or .doc) file, which will typically contain the digital representations (for example, ASCII) of the underlying alphanumeric characters. The byte representing “e” will thus usually occur most frequently, followed in frequency by the bytes for “t,” “o,” “i,” etc.
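The frequency argument can be made concrete with a toy character-count sketch; the sample sentence below is an arbitrary illustration:

```python
from collections import Counter

# Toy illustration of the frequency argument: in ordinary English
# text the letter "e" tends to dominate, whereas a block of random
# characters produces a much flatter distribution.
def top_letters(text: str, k: int = 3):
    """Return the k most frequent alphabetic characters in text."""
    counts = Counter(c for c in text.lower() if c.isalpha())
    return [c for c, _ in counts.most_common(k)]
```

Applied to a random insert such as the one ending the first “paragraph” above, no such dominant letter emerges, which is precisely what makes the insert detectable.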
Most readers would not find it in any way unusual, however, that there is a blank line between the paragraphs, because this is a typical way for “free” and “unused” space to appear. Non-random, digitally encoded information stored on a disk exhibits analogous properties.
FIG. 1 illustrates this phenomenon in a more abstract, storage-related context, namely, what a magnetic media surface might look like after an operation using even the most “secure” prior art erasure methods. In FIG. 1, Typical Free Space (TFS), Used Space (US), and Atypical Free Space (AFS) are illustrated.
In this example, an Atypical Free Space fragment is located in a region that, otherwise, contains typical Used Space. The statistical anomaly that this gives rise to may provide a hint to a possible attacker that, given the right time and circumstances, he might get illicit access to the information in this fragment, which might be sensitive information. At the very least, the attacker may be able to recognize that something happened in the region containing the Used Space. An attacker that finds no evidence of the existence or deletion of sensitive information is more likely to leave the target alone, however. With no reason to suspect the presence of any potentially recoverable sensitive information, the opponent will have no reason to pursue an attack on the target.
Not all attacks involve illegal activity. Another type of “attack” could be a forensic analysis by a court or state authority, or by a private party such as during the “discovery” phase of some litigation in the United States. If it can be seen that a file has been deleted and securely erased, for example, then this itself could indicate wrongdoing such as tampering with evidence. Even if the original file contained no proof of wrongdoing, a secure erasure trail could be used to undermine the legal credibility and reputation of the party.
Even if there is no sign of erasure on the medium itself, many modern operating systems retain information about which processes they have scheduled for execution. If an attacker sees that a file-erasure program has been executed, this fact alone may be undesirable evidence leading to further inquiry or attack. It is therefore also necessary to remove the traces of a secure erase operation, but prior art solutions fail to do so.
The following references are representative of prior art mechanisms for “secure” erasure of data and files that exhibit some or all of the shortcomings mentioned above: Published U.S. Patent Application Nos. US2006117153, US2006117136, US2002181134; U.S. Pat. Nos. 6,731,447, 5,265,159; European Patent No. EP0575765; Canadian Patent No. CA2388117; and Japanese Patent No. JP6095949.
What is needed is therefore a way to more securely erase and “sanitize” at least parts of an information medium. Preferably, this should be done in such a way that even the act of erasure is much less detectable than in the prior art.