The increasing reliance on digital storage devices, such as hard disk drives and solid-state drives, for storing important private data and highly confidential information has created a greater need for efficient and accurate recovery of deleted files during digital forensic investigations.
File carving is a technique for recovering such deleted files in the absence of file system allocation information. However, files are often fragmented as a result of low disk space, file deletion and file modification. For example, a recent study found that 96.5% of the files tested on the FAT disks had between 2 and 20 fragments. A deleted fragmented file usually comprises a header fragment, intermediate fragments and a footer fragment. These fragments are not stored in contiguous blocks and may be out of sequence on the disk. Without the allocation information held by the file system, deleted fragmented files are difficult to recover. The problem is further complicated because the sizes of files and fragments are not standardized. Fragmented and subsequently deleted files therefore call for more advanced file carving techniques that can reconstruct the files from the extracted data fragments.
Reconstructing files from a collection of randomly mixed fragments is useful and essential in digital forensics when files that could assist a criminal investigation have been deleted. For example, the files may have been accidentally deleted by the owner or user, or the file system information may have been damaged, destroying the information needed to retrieve the fragmented files.
Bifragment gap carving is one fragmented file carving approach; it assumes that most fragmented files comprise two fragments that contain an identifiable header and footer. The technique exhaustively searches all combinations of blocks between an identified header and footer, incrementally excluding blocks that cause decoding or validation of the file to fail. This approach can only carve files with two fragments.
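A minimal sketch of the bifragment gap search follows. It assumes the disk image has already been split into equal-sized blocks, that the header and footer block indices are known, and that `validate` is a hypothetical stand-in for whatever decoder or file validator is used; the function and parameter names are illustrative only.

```python
from typing import Callable, List, Optional, Sequence


def bifragment_gap_carve(
    blocks: Sequence[bytes],
    header: int,
    footer: int,
    validate: Callable[[bytes], bool],
) -> Optional[List[int]]:
    """Return the block indices of the first header..footer arrangement, with at
    most one contiguous gap removed, that passes `validate`; otherwise None."""
    span = range(header, footer + 1)

    # Gap of size 0: the file may not be fragmented at all.
    if validate(b"".join(blocks[i] for i in span)):
        return list(span)

    # Try every gap size, then every gap position strictly between the header
    # block and the footer block (O(n^2) candidates for n blocks in between).
    for gap in range(1, footer - header):
        for start in range(header + 1, footer - gap + 1):
            kept = [i for i in span if not (start <= i < start + gap)]
            if validate(b"".join(blocks[i] for i in kept)):
                return kept
    return None
```

For example, with blocks `[b"H", b"a", b"x", b"y", b"b", b"F"]`, header 0, footer 5 and a validator that accepts only `b"HabF"`, the search returns `[0, 1, 4, 5]`. The nested loops make the exhaustive cost explicit, and removing a single gap is exactly the two-fragment restriction.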
In another approach, the file fragments are “mapped” into a file using different mapping functions and discriminators. The mapping functions represent the various ways in which a file can be reconstructed, and the discriminators check the validity of each reconstructed file until the best one is obtained. The objective of this approach is to derive a mapping function that minimizes the error rate of the discriminator. Accordingly, a good discriminator is needed to localize errors within the file, so that discontinuities can be located more accurately. If the discriminator fails to indicate the precise locations of the errors, all permutations need to be generated, which can quickly become intractable.
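The sketch below illustrates the mapping idea in its simplest, brute-force form, assuming the header and footer fragments are known and only the intermediate fragments need ordering. Each permutation plays the role of one candidate mapping, and `discriminator` (hypothetical here) returns an error count for a reconstructed file; `count_discontinuities` is a toy discriminator for illustration, not a real file validator.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple


def best_mapping(
    header: bytes,
    middle: Sequence[bytes],
    footer: bytes,
    discriminator: Callable[[bytes], int],
) -> Tuple[Optional[Tuple[int, ...]], Optional[int]]:
    """Try every ordering of the intermediate fragments (one candidate
    'mapping' each) and keep the one with the fewest discriminator errors."""
    best_order: Optional[Tuple[int, ...]] = None
    best_err: Optional[int] = None
    for order in permutations(range(len(middle))):
        candidate = header + b"".join(middle[i] for i in order) + footer
        err = discriminator(candidate)
        if best_err is None or err < best_err:
            best_order, best_err = order, err
            if err == 0:  # a perfect reconstruction cannot be beaten
                break
    return best_order, best_err


def count_discontinuities(data: bytes) -> int:
    """Hypothetical toy discriminator: the 'file' should be a run of consecutive byte values."""
    return sum(1 for a, b in zip(data, data[1:]) if b != a + 1)
```

Because every permutation may be tried, 10 intermediate fragments already give 10! = 3,628,800 candidates, which is why a discriminator that localizes errors, and so prunes the search, matters.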
In carving, the simplest approach is to test each fragment against every other fragment to determine how likely any two fragments are to form a valid joint. Each joint is then assigned a weight representing the likelihood that the two fragments are a correct match. Since headers can be easily identified, any edge involving a header is treated as a directed edge from the header to a fragment, while all other edges are bidirectional. Therefore, with n fragments excluding the h headers, each fragment can be preceded either by one of the other n − 1 fragments or by one of the h headers, giving a total of n(n − 1 + h) weights. The problem can thus be converted into a graph-theoretic problem in which the fragments are represented by vertices and the edge weights indicate the likelihood that two fragments are adjacent in the original file. Carving then amounts to finding a file construction path with the best set of edge weights, with the starting vertices corresponding to the headers. Greedy heuristic techniques have been used to compute the weights between all fragments and, for each fragment, to sort the candidate fragments by weight. This approach pre-computes the weights between all pairs of fragments, which is computationally expensive.
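The following sketch shows the graph formulation with precomputed weights and a simple greedy path construction. The weight function, the `min_weight` stopping threshold and all names are hypothetical; published greedy heuristics differ in how they sort and select edges, so this is one illustrative variant only.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Node = Tuple[str, int]  # ("h", i) for header i, ("f", j) for fragment j
Weight = Callable[[bytes, bytes], float]


def pairwise_weights(
    headers: Sequence[bytes], fragments: Sequence[bytes], weight: Weight
) -> Dict[Tuple[Node, int], float]:
    """Precompute every edge weight: header -> fragment edges are one-way,
    fragment <-> fragment edges are stored in both directions."""
    w: Dict[Tuple[Node, int], float] = {}
    for hi, h in enumerate(headers):
        for j, f in enumerate(fragments):
            w[(("h", hi), j)] = weight(h, f)
    for i, a in enumerate(fragments):
        for j, b in enumerate(fragments):
            if i != j:
                w[(("f", i), j)] = weight(a, b)
    return w  # n * (n - 1 + h) entries in total


def greedy_paths(
    headers: Sequence[bytes],
    fragments: Sequence[bytes],
    weight: Weight,
    min_weight: float = 0.5,
) -> List[List[int]]:
    """From each header, repeatedly append the unused fragment with the highest
    weight; stop when none is left or the best weight is below min_weight."""
    w = pairwise_weights(headers, fragments, weight)
    used: Set[int] = set()
    paths: List[List[int]] = []
    for hi in range(len(headers)):
        node: Node = ("h", hi)
        path: List[int] = []
        while True:
            free = [j for j in range(len(fragments)) if j not in used]
            if not free:
                break
            best = max(free, key=lambda j: w[(node, j)])
            if w[(node, best)] < min_weight:
                break
            used.add(best)
            path.append(best)
            node = ("f", best)
        paths.append(path)
    return paths


def toy_weight(a: bytes, b: bytes) -> float:
    """Hypothetical weight: 1.0 if b's first byte continues a's last byte."""
    return 1.0 if b[0] == a[-1] + 1 else 0.0
```

With one header `bytes([0, 1])` and fragments `[bytes([4, 5]), bytes([2, 3]), bytes([6, 7])]`, the table holds 3 × (3 − 1 + 1) = 9 weights, matching n(n − 1 + h), and the greedy walk returns the path `[[1, 0, 2]]`. The table grows quadratically with the number of fragments, which is the pre-computation cost of this approach.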
A further approach is based on sequential hypothesis testing and assumes that file blocks are allocated contiguously on disk. This approach joins the next block in sequence to the current block, processes the joined data using existing libraries and applications, and tests the boundary of the joint to determine whether the joint is valid. However, this contiguity assumption may be weak for fragmented files.
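A minimal sketch of the contiguous-extension step follows. `joint_ok` is a hypothetical stand-in for decoding the joined blocks with an existing library and testing the boundary; the statistical machinery of a full sequential hypothesis test (accumulated likelihood ratios and decision thresholds) is deliberately omitted.

```python
from typing import Callable, List, Sequence


def extend_contiguously(
    blocks: Sequence[bytes],
    start: int,
    joint_ok: Callable[[bytes, bytes], bool],
) -> List[int]:
    """Append blocks start+1, start+2, ... while joint_ok accepts each joint.

    The block right after the returned run is the first place where the
    contiguous-allocation assumption failed, i.e. a candidate fragmentation point.
    """
    run = [start]
    for nxt in range(start + 1, len(blocks)):
        # Only the physically next block is ever considered.
        if not joint_ok(blocks[run[-1]], blocks[nxt]):
            break
        run.append(nxt)
    return run
```

Because the loop only ever looks at the physically next block, it cannot follow a fragment that resumes elsewhere on the disk; it can only report where contiguity stopped.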
In another aspect, if the fragmented file is compressed, such as a JPEG image, file carving becomes even harder: the entire file is encoded according to information in the header, so once the file is split into separate fragments, the joints between fragments cannot be detected simply by comparing adjacent pixels.
One approach to reassembling a fragmented JPEG file assumes that vertically oriented lines in the image are repeated in the DV value chain at a certain interval. The approach searches for pairs of fragments with valid restart marker sequences and verifies each joint by checking for these repeated values.
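The restart-marker part of this approach can be sketched as below. It relies on the JPEG convention that restart markers RST0 to RST7 (bytes 0xFF 0xD0 to 0xFF 0xD7) appear in cyclic order within the entropy-coded data, so a correct joint keeps that cycle unbroken. The function names are hypothetical, and the subsequent repeated-value check is not implemented.

```python
from typing import List

RST0, RST7 = 0xD0, 0xD7  # JPEG restart markers are 0xFFD0..0xFFD7 (RST0..RST7)


def restart_indices(data: bytes) -> List[int]:
    """Return the modulo-8 index of each restart marker found in the data.

    Inside entropy-coded JPEG data a literal 0xFF byte is followed by a stuffed
    0x00, so 0xFF followed by 0xD0..0xD7 can only be a restart marker.
    """
    return [
        data[i + 1] - RST0
        for i in range(len(data) - 1)
        if data[i] == 0xFF and RST0 <= data[i + 1] <= RST7
    ]


def restart_sequence_valid(first: bytes, second: bytes) -> bool:
    """True if the restart markers across first+second count up by one modulo 8."""
    seq = restart_indices(first + second)  # concatenate so a marker split at the joint is seen
    if len(seq) < 2:
        return False  # too few markers to say anything about the joint
    return all((b - a) % 8 == 1 for a, b in zip(seq, seq[1:]))
```

A passing result is necessary but not sufficient: any two fragments whose markers happen to continue the cycle will pass, which is why a further check on the image content is still needed to confirm the joint.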
It is desired to address the high resource consumption, limited scalability, processing overhead and weak assumptions of existing methods.
It is also desired to provide an efficient method of reassembling a data file that takes realistic and complex fragmentation scenarios into consideration.
It is further desired to provide an efficient method to reassemble a data file which has been encoded.