To classify a digital photo-realistic image, a human can view the photo-realistic image and then manually tag the photo-realistic image with descriptive metadata. The types of information provided in such tags is virtually limitless. Common tags may indicate the names of people in the photo-realistic image, the objects in the photo-realistic image, and the location and/or event at which the photo-realistic image was captured. Manual tagging produces highly accurate tags, because human brains are highly skilled at interpreting the content of photo-realistic images. However, manually tagging photo-realistic images can consume an inordinate amount of time, particularly when the collection of photo-realistic images to be tagged is large.
To avoid the human effort required by manual tagging, techniques have been developed to automatically tag photo-realistic images with certain types of information. For example, digital cameras can automatically store some types of information with each photo-realistic image, such as time, date and GPS coordinates at the time at which the photo-realistic image is captured. However, automatically tagging photo-realistic images with some types of information is not so straightforward.
Various techniques have been developed to automatically identify complex features, such as human faces and objects, within photo-realistic images. Such techniques include, for example, using photo-realistic images that depict a particular type of object to train a machine learning engine to recognize that type object in other photo-realistic images. Once trained, the machine learning engine may predict the likelihood that any given photo-realistic image contains the type of object in question. Once analyzed, those photo-realistic images that are predicted to contain a type of object may be tagged with metadata that indicates the object they depict. For example, a machine learning engine may predict that the photo-realistic image of the front of a house depicts a door, and that photo-realistic image (or a set of pixels within the photo-realistic image) may be tagged with the metadata indicating that a door is depicted in the photo-realistic image.
Unfortunately, classifications made by machine learning engines can be indefinite and imprecise. To reflect the indefinite nature of such classifications, the classification automatically assigned to an object in an image may be a list of labels with corresponding “confidence scores”. For example, a trained machine learning engine may classify a particular object in a particular image as: 45% bottle, 25% vase, 25% wine glass, 5% test tube. Thus, there is a need to improve the accuracy of automated classifications of photo-realistic images.
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.