Object cutout is an important and fundamental issue in computer vision. The typical mechanism for cutting out or isolating a visual object in an image is binary segmentation, in which every pixel in the image is assigned either a first value if it belongs to a foreground object or a second value if it belongs to the background. Depending on the particular process in operation, this binary labeling either originates from a segmentation boundary between visual foreground and background, or proceeds from a segmentation boundary that is known beforehand. There are numerous conventional techniques for determining an optimal segmentation boundary for cutting out foreground objects.
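As a minimal illustration of the binary labeling described above, the following Python sketch assigns each pixel one of two values and then uses the resulting mask to cut out the foreground. The names `FOREGROUND`, `BACKGROUND`, `binary_mask`, and `cut_out` are hypothetical, and the brightness threshold used to decide foreground membership is only a stand-in assumption for whatever segmentation technique is actually in use.

```python
FOREGROUND = 1  # the "first value" (encoding assumed for illustration)
BACKGROUND = 0  # the "second value" (encoding assumed for illustration)


def binary_mask(image, is_foreground):
    """Label every pixel FOREGROUND or BACKGROUND using a caller-supplied rule."""
    return [
        [FOREGROUND if is_foreground(pixel) else BACKGROUND for pixel in row]
        for row in image
    ]


def cut_out(image, mask, fill=0):
    """Keep pixels labeled FOREGROUND and replace all others with `fill`."""
    return [
        [pixel if label == FOREGROUND else fill for pixel, label in zip(row, mask_row)]
        for row, mask_row in zip(image, mask)
    ]


# A tiny grayscale image; bright pixels play the role of the foreground object.
image = [
    [10, 12, 200],
    [11, 210, 220],
    [9, 13, 14],
]

# Placeholder rule (an assumption): treat pixels brighter than 128 as foreground.
mask = binary_mask(image, lambda pixel: pixel > 128)
cutout = cut_out(image, mask)

print(mask)    # [[0, 0, 1], [0, 1, 1], [0, 0, 0]]
print(cutout)  # [[0, 0, 200], [0, 210, 220], [0, 0, 0]]
```

Separating the labeling step from the cutout step mirrors the text: the mask is the binary segmentation, and the cutout is simply the image restricted to pixels carrying the foreground value.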
In content-based image retrieval (CBIR), a query image is often used as an example to retrieve images with similar content. However, in most cases, conventional retrieval techniques can only compute low-level features from the entire query image to represent the content of that image. High-level semantic information in the query image is mostly lost. Thus, the performance of conventional retrieval systems is often poor. One way to more closely represent the semantic content of a query image is to try to cut out the foreground object(s) in both the query image and the database images. However, such object cutout is still a challenging problem.
Existing bottom-up approaches for segmenting a general collection of images can hardly achieve semantic segmentation, since they mainly aggregate pixels into segments according to low-level features such as uniformity of color, texture, or smoothness of bounding contours. User interaction can greatly improve segmentation results, but segmenting vast numbers of images through user interaction is prohibitively expensive for large databases, such as a CBIR image database. What is needed is an accurate and robust way to automatically use segmentation results from a query image or a few query images to infer segmentation results that can be propagated to segment a large collection of images. Then, through progressive propagation, a small number of user operations would be able to achieve segmentation of numerous images.
Conventional methods that try to propagate segmentation results from one image to many images have severe limitations. For example, some require numerous training images for each image category, which is usually not possible. Others require both the foreground and background of sample images and test images to be highly similar. When there is a slight change in the illumination of a face, or a change in shape or shadow, these conventional methods fail. Few natural images can satisfy the stringent similarity requirement of these conventional techniques. Other conventional methods are simply too slow, even when a slow process is expected, requiring intensive processing that is too complex to be practical for such applications as image retrieval or video cutout. Still other conventional methods require that two images have strikingly different backgrounds in order to propagate segmentation across images.