The proliferation of digital devices that include a camera has led to an explosion in the volume of image data stored by a user, and a user can easily end up with many duplicate images in the user's image library.
This situation can be even worse in a home network environment, where several users can add images to an image library that may be physically distributed over several dispersed storage devices (for example, hard drives of different PCs, a NAS (Network Attached Storage), USB keys, etc.).
The reasons why an image library can end up containing many duplicate images are diverse. Unintentional duplicate images are produced through copy actions. For example, a user who organizes photos in different directories unintentionally copies the photos instead of moving them, which would have been appropriate; a user who wishes to send photos by e-mail adapts the photo resolution to include them in the e-mail but unintentionally keeps the low-resolution copies; a user who views images with a viewer application modifies them by rotation or by adjusting color and contrast, and unintentionally keeps the unmodified copy in addition to the modified copy. Other copy actions are intentional and occur because the user no longer has an overview of the stored data, a situation that worsens when the user has multiple storage devices and many images, and worsens further when multiple users add and copy data to the stored images. Knowing that he lacks a clear overview of the stored images, the user aggravates this situation by preferring to copy images rather than move or replace them, for fear of deleting them. As a result, the user no longer knows which images are disposable copies and which are not.
In all these scenarios, a duplicate detection tool can assist the user with cleaning up or managing the user's image library.
Prior-art detection of image duplicates identifies duplicates according to criteria such as checksum data, creation data, file name, file size, and image format. Duplicates that meet any of the selected criteria are detected, and each detection requires user intervention to decide whether or not to delete the detected duplicates from the image library. The final decision about what actions to carry out on the duplicate images is left to the end user because the perception of what constitutes a duplicate image is subjective. Depending on the user and the context, a duplicate image can be: an exact (bit-by-bit) copy of an image; a visually identical copy that has been encoded with a different compression algorithm; a visually identical copy that has undergone geometrical or colorimetric transformations; etc.
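As an illustration of the checksum criterion mentioned above, the following minimal sketch groups files whose contents are bit-by-bit identical. It is not taken from any particular prior-art tool: the function names `file_checksum` and `find_exact_duplicates`, the choice of SHA-256, and the chunk size are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_checksum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large images are never loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` whose contents are bit-by-bit identical.

    Hypothetical helper: returns a list of groups, each group being a
    list of two or more paths that share the same checksum.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_checksum(path)].append(path)
    return [paths for paths in groups.values() if len(paths) > 1]
```

Such a check only finds exact copies. A re-encoded, resized, or rotated copy has a different checksum, which is why the other kinds of duplicate listed above need a different technique.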
What is thus needed is a method capable of translating this subjective perception into parameters required for de-duplication so that the operation can be autonomous and adapted to the user's duplicate-image perception.
The European patent application no. EP11306284.8 filed Oct. 4, 2011, “Method of automatic management of a collection of images and corresponding device”, proposes a method to detect image duplicates that uses a set of tags to label the different copies of an image according to the kind of duplicate, i.e., duplicate or near-duplicate. In the near-duplicate case, the tag also indicates how the copy differs from the original image. In some particular cases, the end user uses the information supplied by these tags to decide which duplicate images to remove from the photo library. Here too, the final decision about what actions to carry out on the duplicate images is left to the end user because the perception of what constitutes a duplicate image is subjective. The system described in that European patent application is designed to identify the broadest range of transformations that an image can undergo in a residential photo library. The checksum technique is used to detect bit-by-bit exact duplicate images, and the fingerprint technique is used to detect near-duplicate images. The fingerprint technique is tuned to detect the most severe transformations that can be applied to an image in a personal photo library, because the technique has to be designed for worst-case conditions. It is worth mentioning that the computation time of the fingerprint of an image increases with the complexity of the transformation to be detected and is much higher (up to 500 times) than the computation time of the checksum of an image.
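The difference between the two techniques can be sketched on a toy example. The fingerprint below is a simple average hash over a small grayscale pixel grid; it is a stand-in chosen for brevity and is not the fingerprint technique of the cited application. The names `checksum`, `average_hash`, and `hamming_distance`, and the sample pixel values, are assumptions made for the illustration.

```python
import hashlib


def checksum(pixels):
    """SHA-256 over the raw pixel values (each value must be 0..255)."""
    raw = bytes(value for row in pixels for value in row)
    return hashlib.sha256(raw).hexdigest()


def average_hash(pixels):
    """Toy fingerprint: one bit per pixel, set when the pixel exceeds the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(a, b):
    """Number of positions at which two fingerprints differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 4x4 grayscale image with dark and bright columns.
original = [
    [10, 200, 30, 220],
    [15, 210, 25, 230],
    [12, 205, 28, 225],
    [18, 215, 22, 235],
]

# A colorimetric transformation: every pixel brightened by 20 (capped at 255).
brighter = [[min(value + 20, 255) for value in row] for row in original]

# The checksum sees two unrelated files; the fingerprint sees the same image.
same_checksum = checksum(original) == checksum(brighter)
fingerprint_gap = hamming_distance(average_hash(original), average_hash(brighter))
```

Here `same_checksum` is false while `fingerprint_gap` is zero, so only the fingerprint relates the brightened copy to the original. A fingerprint robust to more severe transformations would need considerably more computation than this toy version, which is the cost the paragraph above refers to.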
This is not optimal when the user considers as duplicates only the bit-by-bit exact copies of an image, or only visually identical images with different resolutions, because in these cases the checksum computation tool or a simpler and faster fingerprint computation tool could be used to identify the desired duplicate and near-duplicate images. It is therefore desirable that the automatic management also take into account the user's duplicate-image perception.
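The observation above, that the cheapest adequate tool depends on what the user regards as a duplicate, can be pictured as a simple lookup. The sketch below is purely hypothetical: the perception levels, the detector names, and the function `detectors_for` are invented for illustration and do not describe the parameters of any disclosed method.

```python
from enum import Enum


class DuplicatePerception(Enum):
    """Hypothetical levels of what a user regards as a duplicate."""
    EXACT_COPY = "exact_copy"              # bit-by-bit identical files only
    RESIZED_COPY = "resized_copy"          # also visually identical, other resolution
    TRANSFORMED_COPY = "transformed_copy"  # also rotated or color-adjusted copies


# Hypothetical detector names; each level runs only the tools it needs.
DETECTORS_FOR_PERCEPTION = {
    DuplicatePerception.EXACT_COPY: ["checksum"],
    DuplicatePerception.RESIZED_COPY: ["checksum", "fast_fingerprint"],
    DuplicatePerception.TRANSFORMED_COPY: ["checksum", "robust_fingerprint"],
}


def detectors_for(perception):
    """Return the detectors to run for a given user perception."""
    return list(DETECTORS_FOR_PERCEPTION[perception])
```

Under these assumptions, a user who only cares about exact copies never pays for a fingerprint computation, and a user who only cares about resized copies avoids the worst-case fingerprint.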
In this invention disclosure, we propose a method to capture the end user's subjective duplicate-image perception and to translate it into the objective parameters required for the automatic management of images in an image collection.