Localization refers to an ability to identify a location within an image that is associated with a semantic class, e.g., a location that includes a particular object, exhibits a feeling such as “happiness,” and so forth. Localization is used to support a variety of image functionality, such as semantic tagging, hole filling, artifact removal, image search, captioning, segmentation, object detection, and so forth. For example, localization may be used to disambiguate visual semantics in an image search, e.g., to differentiate a firetruck from a truck fire. Accordingly, accuracy of localization also promotes accuracy in the implementation of this image functionality.
Although localization may be readily performed by humans, it is a particularly challenging problem for computing devices to perform without at least some human assistance. Conventional techniques, however, are often inaccurate or require that assistance. Accordingly, an inability of computing devices to accurately perform localization also inhibits an ability of the computing devices to support the variety of image functionality described above.
Conventionally, machine learning techniques may be used to train a model to identify whether an object is or is not included in an image. However, in order to determine where the object is located in the image, conventional machine learning techniques require training data that includes bounding boxes describing the location. In order to generate these bounding boxes, conventional techniques rely on users to manually draw a boundary of each bounding box, which is expensive, inefficient, and oftentimes inaccurate. For example, manually drawn bounding boxes typically include portions of the image that do not include the object being localized and thus may introduce inaccuracies into training of a model. Further, thousands of training images are typically employed to train even a single model. These limitations restrict the availability of localization and thus the ability to support the other image functionality described above.
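The effect of a loosely drawn bounding box described above can be illustrated with a minimal sketch. The box format `(x0, y0, x1, y1)` and the example coordinates below are illustrative assumptions, not drawn from any particular annotation tool or dataset; intersection-over-union (IoU) is used here simply as one common measure of how much a manual annotation over-covers the true object extent.

```python
# Illustrative sketch: a manually drawn bounding box that includes extra
# background around the object, measured with intersection-over-union (IoU).
# Box format is (x0, y0, x1, y1); all coordinates are hypothetical.

def box_area(box):
    """Area of an axis-aligned box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return max(0, x1 - x0) * max(0, y1 - y0)

def iou(a, b):
    """Intersection-over-union of two boxes; 1.0 means a perfect match."""
    x0 = max(a[0], b[0])
    y0 = max(a[1], b[1])
    x1 = min(a[2], b[2])
    y1 = min(a[3], b[3])
    inter = box_area((x0, y0, x1, y1))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union else 0.0

# A manual annotation that over-covers the true object extent.
manual_box = (10, 10, 110, 110)   # 100 x 100 drawn box
true_extent = (25, 25, 95, 95)    # 70 x 70 actual object

print(iou(manual_box, true_extent))  # 0.49: over half the box is background
```

A model trained on such boxes learns from pixels that belong to the background rather than the object, which is one way the manual-annotation process introduces the training inaccuracies noted above.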