1. Field of the Invention
The present invention relates to methods and apparatus for detecting whether particular data is stored in a data store. The method has particular, although not exclusive, applicability in determining whether an image of interest is stored in a data store.
2. Description of the Related Art
Computers are commonplace in modern society and are widely used in domestic and business environments. Many computers are connected together by computer networks, allowing them to share data and resources with one another. Use of the Internet is now widespread: many computers are connected to the Internet so as to communicate with one another and to retrieve data from remote computers. One very popular use of the Internet is retrieving data over the World Wide Web in the form of web pages.
The ubiquity of computers has undoubtedly brought many economic and cultural benefits, but it has also created many technical and social challenges. While the Internet allows data and information to be shared amongst networked devices in a convenient and efficient manner, it cannot be overlooked that the Internet has been used by some to further their criminal activities. A particular concern is the use of the Internet by paedophiles to distribute malicious digital pictures, in particular indecent images of children, among themselves. Such criminal activity is a focus for law enforcement and other governmental agencies in their resolve to protect the public.
To address the challenges posed by criminal use of the Internet, various computer forensics techniques are routinely used. Crime involving computers may be classified into three classes: the computer is the target of the crime; the computer provides a repository of information used or generated during the commission of a crime; or the computer is a tool used in committing the crime. Because of these different classes of computer crime, a wide range of investigation and analysis techniques is needed to address all of them properly. The techniques used often depend on the context surrounding the criminal activity under scrutiny.
Computer forensics techniques are frequently used to inspect the contents of a suspect's computer hard drive in a criminal investigation. Because of the large capacity of today's hard disk drives (upwards of 500 gigabytes) and the amount of information such a drive may hold, a large volume of data must be analysed within tight time constraints. Yet identifying relevant evidence is a time-consuming process. A major challenge facing law enforcement and national security agencies is analysing this growing volume of evidential data accurately and efficiently. The diversity of devices and file systems in use, from desktop and notebook computers to Personal Digital Assistants (PDAs) and mobile phones, should also be taken into account when determining which techniques to use.
Current methods of searching a hard drive for files of interest, and in particular malicious digital pictures, often involve time-consuming manual processing. An image of the hard drive is taken to replicate the original evidence source. A forensics tool is then used to recreate the logical structure of the underlying file system from the created image. A computer forensic analyst then views the files, both extant and deleted, and files of interest are reported with supporting evidence, such as the time of investigation, the analyst's name, the logical and actual location of the file, and so on. Because the investigation relies on the analyst viewing each file as part of the reconstructed file system, this process is laborious and prone to human error.
Attempts have been made to speed up the search. One proposed technique compares checksums of files on a hard disk drive under investigation with checksums of known malicious files recorded in previous investigations. One checksum that has been used in such comparisons is the MD5 checksum. Two problems remain with this practice. First, recent research has questioned the reliability of MD5 checksums for producing digital signatures, and this will have serious ramifications in the legal arena. Second, a suspect may avoid detection of a file of interest to law enforcement agencies by altering just one byte within the file, which changes the resulting MD5 checksum. Indeed, checksum-based techniques have been used to detect copyright infringements on the popular YouTube site, and users of that site actively defeat this type of detection by altering a single byte or a small number of bytes within a file.
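The checksum comparison described above, and its weakness against single-byte alterations, can be illustrated with a minimal sketch. It assumes Python's standard `hashlib` module, an exact-match lookup against a set of previously recorded MD5 digests, and hypothetical in-memory file contents standing in for files recovered from a disk image; the helper names `md5_hex` and `find_known_files` are illustrative only and are not part of any forensics tool.

```python
import hashlib
from typing import Dict, List, Set


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


def find_known_files(files: Dict[str, bytes], known_hashes: Set[str]) -> List[str]:
    """Return the names of files whose MD5 digest is in known_hashes (exact match only)."""
    return [name for name, data in files.items() if md5_hex(data) in known_hashes]


# Hypothetical file contents; a real tool would read files from a disk image.
original = b"example file contents"

# Alter a single byte by flipping the lowest bit of the first byte.
altered_buf = bytearray(original)
altered_buf[0] ^= 0x01
altered = bytes(altered_buf)

# Digest assumed to have been recorded in a previous investigation.
known = {md5_hex(original)}

files = {"copy.bin": original, "altered.bin": altered}

# Only the unmodified copy is matched; the one-byte change yields a different digest.
print(find_known_files(files, known))
```

Because the lookup requires an exact digest match, any change to the file contents, however small, produces an unrelated MD5 value and the altered file escapes detection, which is the second problem noted above.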