Current digital forensic tools cannot recover fragmented digital evidence (i.e., data files) when the file tables on a data storage medium have been damaged and/or destroyed. In fact, many court cases based upon digital evidence are dismissed when the data is fragmented across large-capacity hard disks. The primary reason for dismissal is the enormous cost of having an analyst attempt to manually reassemble the fragmented data into files. Currently, the recovery process requires an analyst to manually review and restore data sectors on the data storage medium containing the digital evidence and to reassemble each fragmented file by carving data sectors off the medium.
Data carving is the practice of searching for files, data, strings, or other kinds of objects based on content in order to recover files and the corresponding fragments of files when file table entries are corrupt or missing, as may be the case when files have been deleted or when performing an analysis on logically damaged media. Once a file has been carved, the analyst attempts to open it with a viewer appropriate to its file type. The carved file will not render fully or load successfully in its viewer unless all sectors of data were properly carved out and placed in the correct order. If the manual data carving process fails, the analyst must either reattempt the process or deem the file not feasibly recoverable. Such a manual task, performed on digital data media holding gigabytes of information, is both time consuming and expensive.
Current computer-implemented methods for reassembling fragmented files mostly require a rather explicit set of circumstances that a forensic analyst is unlikely to encounter in a real-world scenario. For example, the publication by N. Memon and A. Pal, “Automated reassembly of file fragmented images using greedy algorithms,” in IEEE Transactions on Image Processing, Vol. 15, Issue 2, 385-393 (2006) (hereinafter “the Memon publication”), while providing interesting insight into the manipulation of image fragments, does not take into account any sort of compressed graphical format. Also, the Memon publication outlines a method that, in essence, evaluates all possible permutations of an image's fragments (an approach frequently referred to as brute force) and then analyzes the rendered image to match fragments together via pixel matching, sum of differences, and median edge detection.
Pixel matching, in short, is a comparison of the color of a pixel on one edge of a fragment to the color of a pixel on the next possible edge of a fragment. Problems can arise with this method in several common situations. First, if the next fragment of data does not belong to the picture but rather to another kind of file (such as an executable), pixel matching fails immediately: because the bitmap format lacks internal structure, the method may treat the data from the executable file as valid bitmap data. Second, this method as outlined requires a 24-bit per pixel bitmap format with no compression. If any compression is introduced whatsoever, this method is not reliable. For example, 24-bit per pixel bitmaps can utilize run-length encoding (RLE).
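The comparison described above can be illustrated with a minimal sketch. It assumes uncompressed 24-bit pixels stored as three consecutive bytes and assumes the two edge rows have already been extracted; the names `bytes_to_pixels` and `pixel_match_count` are hypothetical and are not taken from the Memon publication.

```python
from typing import List, Tuple

Pixel = Tuple[int, ...]  # assumed layout: three byte values per pixel


def bytes_to_pixels(raw: bytes) -> List[Pixel]:
    """Group raw bytes into 3-byte pixels, dropping any incomplete tail."""
    usable = len(raw) - len(raw) % 3
    return [tuple(raw[i:i + 3]) for i in range(0, usable, 3)]


def pixel_match_count(last_row: bytes, first_row: bytes) -> int:
    """Count positions where the edge row of one fragment and the edge row
    of a candidate next fragment hold exactly the same pixel value."""
    a = bytes_to_pixels(last_row)
    b = bytes_to_pixels(first_row)
    return sum(1 for p, q in zip(a, b) if p == q)


edge = bytes([255, 0, 0, 0, 255, 0, 0, 0, 255])
candidate = bytes([255, 0, 0, 0, 255, 0, 9, 9, 9])
print(pixel_match_count(edge, candidate))
```

Note that the function accepts any byte string as a candidate edge. Bytes taken from an executable are scored just like bitmap data, which is the weakness described in the first situation above.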
To briefly outline RLE, data is compressed by finding runs of a repeating value and replacing each run with the number of times the value repeats followed by a single copy of the value. For example, take the string “HHHHHHHEEEELLLOOO”. Applying RLE to this string yields “7H4E3L3O”. In an uncompressed 24-bit bitmap, every three bytes form one pixel, so “HHH” would be a pixel and “HHH” would again repeat, followed by “HEE”, “EEL”, and “LLO”, with the remaining “OO” beginning the next pixel. Each such group of data has a corresponding pixel color. If compression is used, the method outlined by the Memon publication may very well read the number of times a value is repeated and believe it to be part of a valid pixel color for use in comparison. As a result, it would be comparing “HHH” from the uncompressed string to “7H4” in the compressed string. This can result in both false positives and false negatives. As for scoring, each time two compared pixels are found to have exactly the same value, pixel matching adds one to a count. The higher the count after the comparison completes, the greater the likelihood that the two fragments belong to each other.
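The RLE example can be reproduced with a short sketch. The encoder below is a simplified count-then-value scheme matching the string example in the text, not the RLE variant of any particular bitmap format; `rle_encode` and `chunk` are hypothetical helper names.

```python
from itertools import groupby
from typing import List


def rle_encode(text: str) -> str:
    """Replace each run of a repeated character with its length and one copy."""
    return "".join(f"{len(list(run))}{value}" for value, run in groupby(text))


def chunk(data: str, size: int = 3) -> List[str]:
    """Split data into fixed-size groups, as a 24-bit pixel reader would."""
    return [data[i:i + size] for i in range(0, len(data), size)]


raw = "HHHHHHHEEEELLLOOO"
packed = rle_encode(raw)

print(packed)         # 7H4E3L3O
print(chunk(raw))     # ['HHH', 'HHH', 'HEE', 'EEL', 'LLO', 'OO']
print(chunk(packed))  # ['7H4', 'E3L', '3O']
```

Reading the compressed stream three characters at a time produces groups such as “7H4” that mix run counts with values, so a pixel-by-pixel comparison against uncompressed data is no longer meaningful.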
Sum of differences (SoD) is a very similar technique to pixel matching. SoD compares pixels across the borders of fragments, takes the absolute value of the difference in byte values between two adjacent pixels, and then sums all of the calculated absolute values together. According to this technique, the lower the final value, the more likely the fragments belong together. This technique is heavily reliant upon several “laboratory” conditions in order to provide reliable results. First, take two bitmap fragments that are from different images that also differ in dimension. One such fragment of data, which for example is 4 KB in size, could be a fragment from a 500×500 bitmap or a fragment of a 1000×1000 bitmap. SoD can easily produce false positives in this case, since the borders cannot be accurately determined without first rendering the image. The 4 KB fragment of data may span two rows in the smaller bitmap, or it may not even fill a complete row on the larger one. Second, SoD can produce false positives when comparing non-bitmap fragments. Again using a fragment of data from an executable file, as in the previous example, SoD will attempt to establish the borders and then compare the results. Third, SoD can result in false positives when comparing fragments from similar images. Because SoD looks for the borders that seem most likely to fit, it cannot be exact. As a result, pictures of natural scenes, cities, or even photos of a person in a similar pose can wind up mashed together. The Memon publication provides an example where the edge of a fragment of a dog lined up with the top edge of a photo of a jet, and another fragment from the dog photo lined up with the tail of the jet. As images become larger and larger, the margin for error increases significantly. A 4 KB fragment of data in a large image file gives very little basis for comparison.
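A minimal sketch of the SoD score follows. It assumes the border rows of each fragment have already been extracted as byte strings of equal length, which in practice requires knowing the image width; `sum_of_differences` and `best_successor` are hypothetical names used only for illustration.

```python
from typing import Dict


def sum_of_differences(edge_a: bytes, edge_b: bytes) -> int:
    """Sum the absolute byte-value differences across two adjoining edges."""
    return sum(abs(x - y) for x, y in zip(edge_a, edge_b))


def best_successor(edge: bytes, candidates: Dict[str, bytes]) -> str:
    """Pick the candidate whose edge gives the lowest SoD score."""
    return min(candidates, key=lambda name: sum_of_differences(edge, candidates[name]))


bottom_edge = bytes([10, 20, 30, 40, 50, 60])
candidates = {
    "true_next": bytes([12, 18, 30, 41, 50, 58]),      # close in color
    "other_image": bytes([200, 10, 90, 0, 255, 120]),  # unrelated data
}

print(sum_of_differences(bottom_edge, candidates["true_next"]))
print(best_successor(bottom_edge, candidates))
```

The lowest score always wins, even when no candidate truly belongs, so a fragment from a similar image or from a non-bitmap file can still be selected.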
The third technique described in the Memon publication is called median edge detection (MED). MED compares the value of a pixel color to the values of the pixels above it, to the left of it, and on its upper-left diagonal. In short, by looking at the pixel colors around a position, it derives a predicted value for the next pixel. It then takes the sum of the absolute values of the differences between the predicted values and the actual values. The smaller the difference between the prediction and the actual value of the pixel after the fragment is added, the more likely the fragment matches. This method is again similar to the two above, except that it uses a different calculation in looking for the smallest change in color from one edge of a fragment to the next. It has the same shortfalls as the other techniques. Fragments will likely be mashed together improperly, especially if there is file fragmentation on the hard drive.
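The prediction step can be sketched as follows. The text above does not give the exact predictor, so this sketch assumes the median edge detector used in JPEG-LS (LOCO-I), applied to single-channel rows; `med_predict` and `med_cost` are hypothetical names.

```python
from typing import Sequence


def med_predict(left: int, above: int, upper_left: int) -> int:
    """Predict a pixel from its left, above, and upper-left neighbors
    (assumed: the JPEG-LS median edge detector)."""
    if upper_left >= max(left, above):
        return min(left, above)
    if upper_left <= min(left, above):
        return max(left, above)
    return left + above - upper_left


def med_cost(prev_row: Sequence[int], next_row: Sequence[int]) -> int:
    """Sum of |predicted - actual| over a candidate row placed below prev_row.
    Column 0 is skipped because it has no left neighbor."""
    total = 0
    for x in range(1, min(len(prev_row), len(next_row))):
        predicted = med_predict(next_row[x - 1], prev_row[x], prev_row[x - 1])
        total += abs(predicted - next_row[x])
    return total


print(med_cost([10, 10, 10], [10, 10, 10]))    # 0
print(med_cost([10, 10, 10], [10, 200, 200]))  # 190
```

As with the other two scores, a lower cost only indicates a smoother transition. It does not confirm that the candidate row comes from the same file.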
The above techniques are useful mainly on smaller, uncompressed bitmap files. However, in real-world conditions, due primarily to consumer preferences, photographs are normally bright and crisp and provided in a large format. All of these qualities introduce significant problems that the above-mentioned techniques fail to take into consideration and overcome. In addition, although prior art forensic tools do allow the rendering of recovered image files, they do not allow the analyst to add sectors, remove sectors, or otherwise make intelligent alterations. Prior art forensic tools also make no additional attempt to recover fragmented data except through “Header to Header” or “Header to Footer” techniques. Neither of these techniques accurately recovers a file when there is any fragmentation involved.
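To illustrate why a “Header to Footer” approach breaks down under fragmentation, the following minimal sketch carves everything between a start signature and the next end signature. It uses the JPEG start-of-image bytes (FF D8 FF) and end-of-image marker (FF D9) as example signatures; the function name `carve_header_to_footer` is hypothetical and does not describe any particular forensic tool.

```python
from typing import List

JPEG_HEADER = b"\xff\xd8\xff"  # JPEG start-of-image signature
JPEG_FOOTER = b"\xff\xd9"      # JPEG end-of-image marker


def carve_header_to_footer(image: bytes,
                           header: bytes = JPEG_HEADER,
                           footer: bytes = JPEG_FOOTER) -> List[bytes]:
    """Return every byte range running from a header to the next footer."""
    carved = []
    pos = 0
    while True:
        start = image.find(header, pos)
        if start == -1:
            break
        end = image.find(footer, start + len(header))
        if end == -1:
            break
        end += len(footer)
        carved.append(image[start:end])
        pos = end
    return carved


contiguous = b"junk" + JPEG_HEADER + b"AAAA" + JPEG_FOOTER + b"more"
fragmented = JPEG_HEADER + b"AA" + b"EXE-DATA" + b"AA" + JPEG_FOOTER

print(carve_header_to_footer(contiguous))
print(carve_header_to_footer(fragmented))
```

In the contiguous case the file is recovered intact. In the fragmented case the carved result silently includes the unrelated bytes lying between the two fragments, so the output would not render correctly.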