In computing environments, users often initiate searches for images based on objects that may be displayed in the images and/or based on general themes of images. For example, a user may want to see ideas for planting flowers and may initiate an image search for “colorful flower garden”, such as in a Web browser application. Likely, the images that are returned as a result of the image search will be images of colorful flowers arranged in a flower garden if the images are identified or labeled correctly. Typically, images are “tagged” with any number of identifiers that are intended to facilitate a search for an image based on the identifiers. For example, an image that is returned as part of the image search for “colorful flower garden” may have tags that include “flowers”, “blue sky”, “garden”, “cat”, “colorful”, “gnome”, and other tags that identify objects and/or themes shown in the particular image, and overall identify the image as a candidate image to return as an image search result.
With the proliferation of digital imaging, many thousands of images can be uploaded and made available for both private and public image searches. However, correctly identifying and tagging the images is a monumental and time-consuming task that is no longer viable using humans to review and label the ever-increasing stock of images. Further, conventional machine learning techniques used to analyze images and predict labels for unlabeled images also require vast amounts of human-generated training data for image predictions, which is time consuming and impractical for large data sets of unlabeled images. Additionally, the conventional machine learning techniques are prone to produce inaccurate image predictions due to the amount of required training data and/or due to the accuracy of the training data.
Increasingly, convolutional neural networks are being developed and trained for computer vision tasks, such as for the basic tasks of image classification, object detection, and scene recognition. Generally, a convolutional neural network is self-learning neural network of multiple layers that are progressively trained, for example, to initially recognize edges, lines, and densities of abstract features, and progresses to identifying object parts formed by the abstract features from the edges, lines, and densities. As the self-learning and training progresses through the many neural layers, the convolutional neural network can begin to detect objects and scenes, such for object and image classification. Additionally, once the convolutional neural network is trained to detect and recognize particular objects and classifications of the particular objects, multiple images can be processed through the convolutional neural network for object identification and image classification.