When an image or scene is captured on a camera or provided on some other electronic device or computer as a digital image, it can be desirable to segment objects in the image from each other and from a background in the image. This may be performed to change the appearance of the image by augmenting the objects and/or background, whether for artistic purposes, entertainment, or practical reasons, such as to use color pop, which highlights a foreground object by displaying the foreground in color while showing the background in black and white. Segmentation also may be performed for operational reasons so that a user can select an object with a cursor to activate some action or move the object. Segmentation also may be performed where determining the shape of an object is important, as with medical imaging. Otherwise, segmentation may be used as a tool for a number of downstream image or vision processing tasks, such as improving encoding efficiency, where the background of an image is compressed with less detail than the foreground, or pattern recognition and/or clustering of data for further image processing tasks. This may include using the segmentation as a tool for object detection. Object detection is used to identify objects in the scene so that what the object is, for example a television or a chair, is understood. The object detection is performed so that some action can be performed depending on the identification of the object, such as with mixed or augmented reality electronic games, artificial intelligence (AI) systems, or any other application that operates the identified object or is used to modify the appearance of the identified object on the captured image depending on the identity of the object.
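By way of a non-limiting illustration only, the color pop effect mentioned above can be sketched in a few lines of Python once a foreground mask is available. The function name `color_pop`, the list-of-rows image layout, and the use of the common BT.601 luma weights for the black-and-white conversion are assumptions made for this sketch and are not taken from the description above.

```python
def color_pop(image, mask):
    """Keep foreground pixels in color and render the rest in gray.

    image: list of rows, each row a list of (r, g, b) tuples in 0..255.
    mask:  list of rows of booleans, True where a pixel is foreground.
    Both layouts are hypothetical choices made for this illustration.
    """
    output = []
    for pixel_row, mask_row in zip(image, mask):
        output_row = []
        for (r, g, b), is_foreground in zip(pixel_row, mask_row):
            if is_foreground:
                # Foreground stays in its original color.
                output_row.append((r, g, b))
            else:
                # Background becomes gray using BT.601 luma weights
                # (an assumed, commonly used conversion).
                luma = round(0.299 * r + 0.587 * g + 0.114 * b)
                output_row.append((luma, luma, luma))
        output.append(output_row)
    return output


# Usage: a 1x2 image whose left pixel is foreground.
popped = color_pop([[(255, 0, 0), (0, 255, 0)]], [[True, False]])
```

In this sketch the quality of the result depends entirely on the mask, which is why the accuracy of the segmentation matters for such an effect.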
The segmentation often includes finding regions on the image with similar colors or patterns, determined by examining color or luminance gradations of the pixel data. The object detection then attempts to classify these regions of interest (ROIs). For both segmentation and object identification, these operations can be relatively slow, causing delay, and often have insufficient accuracy, resulting in a less than desirable user experience. This is particularly true in real-time image applications such as mixed reality (MR), where virtual objects are placed into a view of the real world, or augmented reality (AR) applications, where objects in real-world views are modified or warped.
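As one simplified example of finding regions with similar luminance, neighboring pixels can be grouped whenever the luminance step between them is small. The sketch below uses a basic breadth-first region growing over a grayscale image; this is a generic textbook approach chosen for illustration, and the function name `label_regions` and the threshold parameter `max_step` are hypothetical.

```python
from collections import deque


def label_regions(gray, max_step=10):
    """Group 4-connected pixels whose neighbor-to-neighbor luminance
    difference is at most max_step. Returns a grid of region labels.

    gray: list of rows of luminance values (hypothetical layout).
    """
    height, width = len(gray), len(gray[0])
    labels = [[-1] * width for _ in range(height)]  # -1 means unlabeled
    next_label = 0
    for start_y in range(height):
        for start_x in range(width):
            if labels[start_y][start_x] != -1:
                continue
            # Start a new region and grow it breadth-first.
            labels[start_y][start_x] = next_label
            queue = deque([(start_y, start_x)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue
                    if labels[ny][nx] != -1:
                        continue
                    # A small luminance gradation keeps the neighbor
                    # in the same region.
                    if abs(gray[ny][nx] - gray[y][x]) <= max_step:
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels


# Usage: a dark area on the left and a bright column on the right.
regions = label_regions([[10, 12, 200], [11, 13, 205]])
```

Each resulting region could then serve as a candidate ROI for classification. Visiting every pixel this way also hints at why such operations can be slow on large images in real-time applications.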
In addition, object detection uses machine learning methods, and one example of such a method uses deep neural networks (DNNs) to classify the ROIs in images inputted to the DNN. This often involves the use of cumbersome training techniques that require manual identification of objects in the images by drawing a bounding box around each object, and in which the programmer must label the objects being identified (television, DVD player, etc.) by manually annotating the ROI image. Thus, such training cannot be performed during the run-time of the object detection system, so the object detection cannot adapt to newly encountered objects in an image during run-time. Object detection using a DNN also may be limited by the quality and variance of the available dataset. DNNs often fail for corner cases (multiple simultaneous extreme variable levels) that the network has not learned before.