It is known, and desirable, in the field of digital forensics to identify images taken by a particular camera. This is of particular importance in fields such as criminal investigation, where an investigator wishes to prove that an individual image, or set of images, was taken using a specific camera. For example, if a photo is of an indecent nature, e.g. child pornography, proof that a given camera was used to take the photo may be used as evidence against the owner of the camera. In particular, if a link between an individual and a particular camera can be established, e.g. because the camera was recovered in a raid, being able to prove that a specific photo, or photos, originated from that camera allows investigators to establish a causal link between the owner and the images. The classification of images from one or more devices also has applications in commercial sectors, such as image classification, cataloguing and the management of image processing.
One method of identifying a camera is via information contained in the metadata of a digital photo. Such metadata often contains information such as the time and date a photo was taken, as well as a device identifier such as a camera name. However, criminals taking indecent photos will often remove such data in order to subvert the identification process. Some types of camera automatically embed a watermark or hash mark into the photos they take. However, not all cameras have this ability, and this identification method is therefore limited to images taken with particular makes and models. It is therefore desirable to be able to extract a signal that is present in images from all makes and models of device and that is not easily subverted.
In particular it is desirable, given a set of digital imaging devices such as cameras and scanners, to identify which of the devices was used in the acquisition of an image under investigation, or to return a negative report indicating that the image was taken by an unknown device.
It is known that each camera has a unique intrinsic sensor noise pattern (SNP), which results from inhomogeneities in the sensor of the camera. The inhomogeneities are specific to a particular sensor and therefore allow for the unique identification of a camera via its sensor, for example a CCD. The terms fingerprint and SNP will be used interchangeably in this specification. The SNP is present in every image taken by a device, though without processing of the image it is often indistinguishable from the detail of the image.
WO/2007/094856 discloses a method of extracting the SNP present in an image and comparing the extracted SNP with a set of reference SNPs. These reference SNPs are constructed from imaging devices that are accessible to the investigator. Each reference SNP is constructed by averaging the SNPs extracted from a number (of the order of several tens) of low-variation images (e.g., blue sky images) taken by the same device.
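By way of illustration only, the averaging and comparison steps described above may be sketched as follows. This is a minimal sketch under stated assumptions, not the method of WO/2007/094856: the denoising filter is not specified here, so a simple box blur is assumed, and the function names `noise_residual`, `reference_snp` and `correlation`, the odd window size `k` and the use of normalised correlation as the similarity measure are all hypothetical choices made for the example.

```python
import numpy as np

def noise_residual(image, k=3):
    """Estimate the noise in a greyscale image as the image minus a smoothed copy.

    ASSUMPTION: a k x k box blur (k odd) stands in for the denoising filter.
    """
    img = np.asarray(image, dtype=np.float64)
    pad = k // 2
    padded = np.pad(img, pad, mode="reflect")
    smooth = np.zeros_like(img)
    # Sum the k*k shifted copies of the padded image, then divide to average.
    for dy in range(k):
        for dx in range(k):
            smooth += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    smooth /= k * k
    return img - smooth

def reference_snp(images):
    """Average the residuals of several low-variation images from one device."""
    return np.mean([noise_residual(im) for im in images], axis=0)

def correlation(a, b):
    """Normalised correlation between two equally sized noise patterns."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0
```

In such a sketch, an image under investigation would be attributed to the device whose reference SNP gives the highest correlation with the image's own residual, or reported as coming from an unknown device if no correlation exceeds a chosen threshold.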
A disadvantage of WO/2007/094856 is that the SNPs extracted from images may be highly contaminated by details from the scene, and as a result the misclassification rate is high. To compensate for the influence of the scene details, the whole image has to be analysed in order to achieve an acceptable identification rate. This may place unacceptably high demands on computational resources. A further disadvantage is that the construction of the reference SNP requires several low-variation images, which may not be possible to obtain without possession of the originating device.
Additionally, during a digital forensic investigation the image set that needs to be analysed may contain several thousand images taken by an unknown number of unknown devices. The method of comparison in WO/2007/094856 is a pair-wise comparison method which becomes prohibitively expensive for large data sets.
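The growth in cost of an exhaustive pair-wise comparison can be made concrete with a short calculation. The sketch below simply counts the unordered pairs in a set of n images, n(n-1)/2; the function name `pairwise_comparisons` and the figure of 5000 images are illustrative assumptions.

```python
def pairwise_comparisons(n):
    """Number of unordered image pairs in a set of n images: n * (n - 1) / 2."""
    return n * (n - 1) // 2

# Illustrative figure only: a set of 5000 images.
print(pairwise_comparisons(5000))  # prints 12497500
```

Each of these comparisons would itself operate on a full-size noise pattern, so the total cost grows quadratically with the number of images.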
Typically, a forensic investigator will want to identify, or cluster, images that have been taken by the same device. Some of the many challenges in such a scenario are:
    the forensic investigator does not have the imaging devices that have taken the images, and so cannot generate clean reference device fingerprints (such as the reference SNP) for comparison;
    there is no prior knowledge about the number and types of the imaging devices;
    the similarity comparison is pair-wise, so with a large dataset exhaustive comparison is computationally prohibitive; and
    given the sheer number of images, analysing each image in its full size is computationally infeasible.
WO/2007/094856 may seem like a candidate method for the first and simpler task of fingerprint extraction. However, the influence of scene detail and the absence of the imaging devices prevent the investigator from acquiring a clean reference SNP. Therefore, unless the investigator has a number of “clean” images from which to extract an SNP (which, unless the investigator is in possession of the originating device, would be highly unlikely), the method of this document has limited application. Additionally, the method is unable to perform the clustering task of identifying images taken by the same, possibly unknown, device.
Furthermore, it is desirable to be able to determine whether two images from a data set have been taken with the same device, in particular where neither the originating camera nor its SNP is in the possession of the investigator. Without the originating camera or its SNP, such a determination is challenging.