1. Field of the Invention
The present invention relates to a method for data analysis and digital forensics and a system using the same and, more particularly, to a method for data analysis and digital forensics and a system using the same, which can utilize data collected via the Internet as digital evidence.
2. Description of the Related Art
Internet connection history, collected from information on visited web pages recorded in the web history and the registry, may be closely related to operations performed by a user in the past and to the behavior of the user.
Moreover, in the case of a suspect of a certain crime, clues to solve the crime may be found in the contents of emails sent and received for some time before the crime. The analysis of email contents is useful in civil cases as well as in criminal offenses. In particular, in crimes related to leakage of confidential information from a company, the use of the company's own mail server means that the emails may be monitored in real time by the filtering function of an internal monitoring program or security program, or that evidence of the crime may easily be exposed at any time. Untraceable email accounts on external portal sites are therefore most likely to be used instead of internal email. Thus, it is necessary to collect and analyze contents such as sent and received mail boxes, attachment files, etc. based on the webmail information of the suspect in order to solve such crimes.
In addition, personal blogs include detailed contents such as personal information, routine activities, etc., and thus if there is a blog administered by an object of investigation, it is necessary to investigate the corresponding blog. Occasionally, when a closed cafe (i.e., a cafe operated by a portal site) is administered by the object of investigation, or when there is a cafe in which the object of investigation actively participates, it is necessary to analyze and investigate the cafe postings, in which information related to the crime may exist.
In previous criminal investigations, such online data has been analyzed by visiting the corresponding web pages one by one, based on the suspect's web history, to identify the contents of the web pages. However, if there are numerous sites that the suspect visited, or if an effective search needs to be performed in a shorter time, a method of obtaining online data in advance, generating index data for the corresponding data, and then performing search and analysis based on the generated index data may be more useful.
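The index-based approach described above can be sketched as follows. This is a minimal illustration only, assuming the online data has already been obtained as plain text; the function names (tokenize, build_index, search), the page identifiers, and the sample contents are hypothetical and do not come from any particular product.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each token to the set of page identifiers that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for token in tokenize(text):
            index[token].add(page_id)
    return index

def search(index, query):
    """Return the sorted page identifiers that contain every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)

# Hypothetical collected pages (illustrative data only).
pages = {
    "page1": "Meeting notes about the confidential design file",
    "page2": "Blog post about a weekend trip",
    "page3": "Mail draft: send the design file tonight",
}
index = build_index(pages)
```

Once the index has been generated, a query such as search(index, "design file") is answered from the index alone, without visiting each page again.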
For example, in the case of email analysis systems, e-discovery products in the USA (such as Clearwell produced by Clearwell Systems, EnCase eDiscovery produced by Guidance Software, Inc., etc.) provide functions for loading mail box files from Outlook or Outlook Express, generating indexes, and then analyzing and retrieving the mail files. The USA has a digital discovery system under which, when a civil suit such as a conflict between companies is filed, it is mandatory to provide evidence related to the incident before legal battles and to provide data required by the counterpart or the court. Nowadays, most newly generated data is stored in digital format, and even the hard disk capacity of a personal computer exceeds the terabyte level. Thus, it is very difficult to find data related to the incident within a given time in the huge amount of data to be analyzed. In order to solve this problem, many e-discovery products have been released, and these products currently attract much attention in Europe and Asia as well as in the USA.
However, most of these products focus on extracting valid data from data stored on a hard disk or from previously collected data and on analyzing that data effectively. That is, with these products it is very difficult to analyze data existing online, and in particular it is impossible to download and analyze webmail in real time.
Anyone can access online data through the Internet, and thus, if there is evidence related to the incident, it can be found easily. However, such evidence can be deleted or changed by an authorized person such as the writer or a server administrator, and thus care should be taken to preserve the evidence.
FIG. 1 is a conceptual diagram illustrating the problems that can occur during data collection.
For example, referring to FIG. 1, it is assumed that an object of investigation read a web page, generated at a certain time (t1) in the past, at some point after it was generated.
After the occurrence of an incident, if the object of investigation is identified as a suspect, an investigator can confirm that the suspect read the corresponding web page at a time (t2) by investigating the suspect (for example, by examining the records of the computer used by the suspect) and can easily collect data from the corresponding web page at a time (t3).
Here, collection means copying the web page, stored as an HTML file, to a local hard disk. In some cases, however, it is possible to store only important information, such as the main text, on the hard disk and to replace non-critical data such as banner advertisements, images, etc. with links. Moreover, analysis of the corresponding web page allows the investigator to obtain evidence that the web page is associated with the incident. However, the related postings may be deleted or changed at a time (t4) after the time (t3) of the collection of the data, either intentionally to conceal the incident or due to an unexpected cause. In this case, although a copy identical to the original web page was acquired at the time of collection, there is no way to prove that the contents of the acquired copy are the same as those of the original, or even that an original identical to the copy existed in the past.
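The reduced form of collection mentioned above, in which the main text is kept and images are replaced with links, can be sketched as follows. This is a rough illustration under stated assumptions, not a definitive implementation: the class name MainTextCollector, the function collect, the "[image: ...]" link notation, and the sample page are all hypothetical, and writing the result to the local hard disk is left out.

```python
from html.parser import HTMLParser

class MainTextCollector(HTMLParser):
    """Keep visible text and replace each image with a link to its source."""

    SKIPPED = {"script", "style"}  # element contents that are not page text

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "img":
            # Store only a link to the image, not the image data itself.
            src = dict(attrs).get("src")
            if src:
                self.parts.append(f"[image: {src}]")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)

def collect(html):
    """Return the text of the page with images replaced by links."""
    parser = MainTextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

# Hypothetical web page (illustrative data only).
sample = (
    "<html><body><h1>Notice</h1>"
    '<img src="http://example.com/banner.png">'
    "<p>Main text of the posting.</p>"
    "<script>var x = 1;</script></body></html>"
)
collected = collect(sample)
```

A copy collected in this way is smaller than the full page, but like any copy it carries no proof by itself that the original had the same contents at the time (t3).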
Thus, proving the existence of the data at the time of collection and confirming any change of the contents after the time of collection are necessary to resolve the related dispute.