Object detectors detect objects in images, typically using an adaptive model (e.g., a machine learning model, a neural network, and the like) that has been trained with a dataset of images. Performance of an object detector is usually limited to detection of objects belonging to categories that are included in the dataset of images used to train the adaptive model, referred to as “seen” classes or categories. Hence, when the training dataset includes a dog category and not a tree category, an object detector may be able to detect a dog in an image, but be unable to detect a tree in the same image.
In some cases, object detectors bias their detection results towards seen classes, such as by assigning an object of a target category to the seen category of the training dataset that is closest to the target category. For instance, an object detector may detect a fox in an image as a dog when the training dataset includes a dog category and not a fox category.
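The bias towards seen classes described above can be illustrated with a minimal sketch. The sketch assumes a hypothetical closed-set classifier that scores a feature vector against one prototype per seen class and returns the best match. The prototype values, the feature for the fox, and the names `SEEN_PROTOTYPES`, `cosine`, and `classify_closed_set` are illustrative assumptions, not part of any particular object detector.

```python
import math

# Hypothetical 2-D feature prototypes for the seen classes.
# The values are made up for illustration only.
SEEN_PROTOTYPES = {
    "dog": (0.9, 0.8),
    "car": (-0.7, 0.1),
    "person": (0.1, -0.9),
}


def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def classify_closed_set(feature):
    """Return the seen class whose prototype is most similar to the feature.

    The output space contains only seen classes, so an object of an
    unseen class is necessarily mapped to one of them.
    """
    scores = {name: cosine(feature, proto) for name, proto in SEEN_PROTOTYPES.items()}
    return max(scores, key=scores.get)


# Made-up feature for an object of an unseen "fox" class; it lies close
# to the "dog" prototype, so the classifier reports "dog".
fox_feature = (0.8, 0.9)
print(classify_closed_set(fox_feature))  # prints "dog"
```

Because the classifier can only choose among the seen classes, no input can yield a “fox” label, regardless of how the feature is computed.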
Furthermore, some object detectors, such as zero-shot detectors, may try to transfer knowledge from object categories of a training dataset (e.g., seen classes) to object categories not included in the training dataset (e.g., unseen classes). However, these object detectors are limited to transferring knowledge from seen to unseen classes strictly for classification purposes, rather than for object detection and region proposal purposes. Hence, these object detectors often fail to detect the regions of images that contain objects of unseen classes.
To overcome these shortcomings of object detectors, a training dataset of images could be scaled to include additional categories (e.g., tens of thousands of seen classes). However, scaling a training dataset of images can be prohibitive in terms of cost and time. For instance, the images need to be annotated to include the additional categories, and the objects in the images that correspond to the additional categories need to be identified, which requires significant manual user effort. Moreover, ambiguities in labelling of certain categories exacerbate the difficulty of scaling a training dataset to include additional categories or labels, such as whether to assign a “banana” label to a bunch of bananas, or strictly to an individual banana. Consequently, performance of an object detector remains limited by the seen classes of the training dataset used to train the object detector.