The present disclosure relates to image storage, and more specifically, to processing datasets of images to improve coherence and reduce memory usage.
As used herein, an image dataset refers to a collection of images stored in computer memory or storage. An image dataset may be stored, for example, using one or more databases. Image datasets can be utilized in any number of technologies and applications. For example, many image recognition algorithms depend on a voluminous dataset of exemplar images against which new images are compared. In particular, many facial recognition methods need a significant number of pictures of the target individual in order to ensure accurate and reliable identification. These datasets have continued to grow dramatically in size, resulting in increased usage of computing resources, including memory, storage, and processing power. In order to ensure that the images in the dataset are correct (i.e., that they show the correct target individual), current approaches require humans to verify each image. Otherwise, the image recognition system has no reliable dataset against which to compare received images. As these image datasets have expanded, it has become impossible for human curators to satisfy these requirements.
Additionally, the number and variety of images in these datasets should be optimized to reduce resource consumption. However, humans are incapable of the type of curation that facial recognition algorithms require. For example, two images may appear substantially identical to a human while being clearly distinct to an image recognition algorithm. Culling such images leaves the recognition system with less data than it requires for reliable use. Similarly, images may appear distinct to a human operator while image recognition algorithms cannot distinguish between them. Leaving such images in the dataset consumes additional storage, memory, and processing power, but does not improve the reliability of the system.
Further, there is a need to better organize the data in image databases so as to reduce resource requirements during use. For example, the data should be stored such that the most relevant or reliable images can be accessed easily and quickly. However, as above, humans are incapable of recognizing the aspects of an image on which image recognition algorithms depend. As such, there is a need for an automated and unsupervised system for curating image datasets.
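By way of illustration only, the redundancy problem described above can be sketched in code. The following is a minimal sketch and not the method of the present disclosure. It assumes that each image has already been reduced to a numeric, non-zero feature vector by some recognition algorithm, and that two images are treated as indistinguishable to that algorithm when the cosine similarity of their vectors meets or exceeds a threshold. The names `cosine_similarity`, `prune_redundant`, the `threshold` value, and the sample vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def prune_redundant(features, threshold=0.99):
    """Greedily keep an image only if it differs from every image kept so far.

    `features` maps an image id to its feature vector (assumed precomputed).
    An image whose similarity to any kept image is >= `threshold` is treated
    as redundant from the algorithm's point of view and is skipped.
    """
    kept = []
    for image_id, vector in features.items():
        if all(cosine_similarity(vector, features[k]) < threshold for k in kept):
            kept.append(image_id)
    return kept


# Hypothetical feature vectors: img_b is nearly parallel to img_a, so an
# algorithm relying on these features gains little from storing both.
features = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.999, 0.01, 0.0],
    "img_c": [0.0, 1.0, 0.0],
}

kept = prune_redundant(features)
print(kept)
```

In this sketch the decision to retain or cull an image depends only on the feature representation used by the algorithm, not on how similar the images look to a human operator, which is the distinction drawn above.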