With the rapid growth in the amount of digital data generated and stored in daily life and in various environments, each step of digital forensics takes a long time to perform.
In the ‘forensic retrieval’ process of digital forensics, a piece of evidence must be found rapidly in a large quantity of data, and because this involves retrieval with repeated queries, a significant amount of time is required.
However, the text retrieval tools currently used in digital forensics process a single query at an average speed of approximately 20 MB/s. With these tools, 14 hours or more are required to run one query against 1 TB of data. Because that figure is the time for a single retrieval operation, the total time grows in proportion to the number of queries when a plurality of retrievals is requested.
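The figures above can be checked with a short calculation. The following is a minimal sketch, assuming binary units (1 TB taken as 2^40 bytes and 1 MB as 2^20 bytes) and a purely sequential scan; the function name `scan_hours` and the choice of ten queries are illustrative and do not come from any tool.

```python
def scan_hours(data_bytes, throughput_bytes_per_s, num_queries=1):
    """Estimate hours needed to scan the data once per query, sequentially."""
    seconds = data_bytes / throughput_bytes_per_s * num_queries
    return seconds / 3600


TIB = 2**40  # assumption: 1 TB interpreted as 2^40 bytes
MIB = 2**20  # assumption: 1 MB interpreted as 2^20 bytes

# One query over 1 TB at 20 MB/s.
one_query = scan_hours(TIB, 20 * MIB)

# Ten queries (hypothetical count): the cost scales linearly.
ten_queries = scan_hours(TIB, 20 * MIB, num_queries=10)

print(f"one query:   {one_query:.1f} h")
print(f"ten queries: {ten_queries:.1f} h")
```

Under these assumptions a single query takes roughly 14.6 hours, and ten queries take ten times as long. With decimal units the single-query figure is slightly under 14 hours, so the stated value should be read as approximate.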
When the various documents published in the digital forensic field and its application examples are taken together, the “size” of the hard disk drive to be investigated is the issue mentioned most frequently and regarded as most urgent in digital forensic analysis.
For example, in the eDiscovery field, which has recently become a prominent issue in digital forensics, a company should retain the electronic documents (ESDs) generated therein for approximately 3 to 6 months and, when a relevant case is brought, should find the required materials and submit them to a court of law within the period.
However, the e-mail generated or received daily within a major domestic company group amounts to at least 500 GB, so even a simple calculation shows that the e-mail retained for 3 months is considerable in size (at 500 GB per day, roughly 45 TB over 90 days).
Considering that, in the domestic legal environment, only a short time is available between the seizure of evidence and prosecution, an efficient retrieval method is increasingly required for investigating large-scale storage media including a PC, an electronic settlement system, an electronic mail system, and an accounting database.
One solution to these problems is an index-based retrieval technique. The index retrieval technique generates an index of the target data and presents a retrieval result in real time by referring to the index during retrieval. Some time is required to generate the index, but all retrievals performed after the index is generated are completed in real time.
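The trade-off between index generation and retrieval can be illustrated with a small positional index. The following is a minimal sketch and not the method of any particular forensic tool; the function names `build_index` and `lookup`, the word-level tokenization, and the sample file names are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Scan every document once and record (document id, character offset) per term."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for match in re.finditer(r"\w+", text.lower()):
            index[match.group()].append((doc_id, match.start()))
    return index


def lookup(index, term):
    """Return every recorded position of the term without rescanning the data."""
    return index.get(term.lower(), [])


# Hypothetical evidence files used only for this example.
documents = {
    "mail_001.txt": "Transfer the invoice today",
    "mail_002.txt": "Invoice received; invoice paid",
}

index = build_index(documents)       # paid once, up front
hits = lookup(index, "invoice")      # repeated queries only consult the index
for doc_id, offset in hits:
    print(doc_id, offset)
```

The cost of reading the data is incurred once in `build_index`; each later call to `lookup` is a dictionary access whose cost does not depend on the size of the original data. Storing the offset of every occurrence, rather than a ranked subset, reflects the requirement that an investigator learn whether and where a query appears.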
However, although numerous index analysis and retrieval algorithms for data stored in a given medium have been continuously proposed and are currently applied in various fields, further research is still required in the digital forensic field because these index retrieval techniques are not suitable for digital forensic evidence retrieval.
Most index retrieval techniques used to retrieve web data on the Internet aim at rapidly providing as many retrieval results as possible by finding, among the data containing the user's query, the information that most closely matches the user's intention.
In contrast, digital forensic retrieval aims at determining accurately, and without omission, whether a query made by an investigator is included in a given large quantity of evidence data and at which position it appears.
In this respect, digital forensic retrieval differs from the existing index retrieval techniques in the content and amount of information to be stored in the index database and in the processing procedure. The existing index retrieval techniques focus on index retrieval speed, whereas digital forensic retrieval focuses on index generation speed.
Whether the index retrieval functions of the existing digital forensic tools are provided integrally with the tools or independently of them, they are designed to operate in a single system.
This arrangement quickly reaches its limit when a large quantity of data is indexed, even if the specification of the system hardware is improved, such that performance rapidly deteriorates.
Accordingly, there is a growing need to improve the index generation speed so that a large quantity of evidence can be retrieved in practice in digital forensics.