The exemplary embodiment relates to object localization and finds particular application in connection with a system and method which uses probability maps generated for similar images to localize a target object region for segmenting a selected image.
Object localization relates to determining the location of a target object in an image. Image segmentation approaches have been developed for locating specific types of objects in photographic images and are used in a number of business applications. For example, photographs of vehicles may be subjected to segmentation techniques to identify the region of the image which corresponds to a license plate. OCR or other text recognition techniques may then be applied to this region to identify the license number or to see if it matches another license plate.
Some existing segmentation techniques are based on heuristics which exploit the a priori known characteristics of the object to be segmented, such as characteristics of text. For example, some exploit the frequent presence of horizontal and vertical edges. See, for example, Wonder Alves, et al., “Text localization in scene images by morphological filters,” in SIBGRAPI, 2009, and Toan Dinh Nguyen, et al., “Tensor voting based text localization in natural scene images,” IEEE Signal Processing Letters, 17, July 2010. Others rely on high local contrast or constant stroke width. See, for example, Paolo Comelli, et al., “Optical recognition of motor vehicle license plates,” IEEE Trans. on VT, 44, November 1995; and Boris Epshtein, et al., “Detecting text in natural scenes with stroke width transform,” in CVPR, pages 2963-2970, 2010 (hereinafter, “Epshtein”). These techniques have rather narrow applicability, since prior knowledge about the images for the task of interest is built into the software. Such methods therefore do not generalize well to segmentation tasks other than those for which they were designed.
One approach for object localization is described in copending application Ser. No. 13/351,038, filed on Jan. 16, 2012, entitled IMAGE SEGMENTATION BASED ON APPROXIMATION OF SEGMENTATION SIMILARITY, by José Antonio Rodríguez Serrano (hereinafter, “the '038 application”). There, the object location for an image is determined by first computing the similarity between the image and each of a set of database images in which a similar object of interest has been localized, and then either transferring the location information of the object in the most similar image to the input image or combining the locations for the top-K most similar images. In the case of license plates, for example, where the target text region occupies only a small portion of the image, a two-stage method of object localization, which is particularly useful for such situations, can be used, as described in U.S. application Ser. No. 13/592,961, filed Aug. 23, 2012, entitled REGION REFOCUSING FOR DATA-DRIVEN OBJECT LOCALIZATION, by Jose A. Rodriguez-Serrano, et al. (hereinafter, “the '961 application”).
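By way of illustration only, the data-driven location transfer outlined above may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the '038 application: the function name transfer_location, the bounding-box convention (x_min, y_min, x_max, y_max), the use of a dot product as the similarity, and the unweighted averaging of the top-K boxes are all hypothetical choices made for the example.

```python
from typing import List, Sequence, Tuple

# Assumed box convention for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product, used here as the (assumed) image similarity."""
    return sum(a * b for a, b in zip(u, v))


def transfer_location(
    query_repr: Sequence[float],
    db_reprs: List[Sequence[float]],
    db_boxes: List[Box],
    k: int = 1,
) -> Box:
    """Estimate the object location for a query image.

    With k == 1, the box of the most similar database image is returned.
    With k > 1, the boxes of the k most similar images are combined by an
    unweighted mean (an assumption; other combinations are possible).
    """
    if not db_reprs or len(db_reprs) != len(db_boxes):
        raise ValueError("database representations and boxes must align")
    if k < 1:
        raise ValueError("k must be at least 1")

    # Similarity between the query and every annotated database image.
    sims = [dot(query_repr, r) for r in db_reprs]

    # Indices of the k most similar database images, best first.
    ranked = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    top = ranked[:k]

    # Average each box coordinate over the selected neighbours.
    selected = [db_boxes[i] for i in top]
    n = len(selected)
    return tuple(sum(coord) / n for coord in zip(*selected))


if __name__ == "__main__":
    # Toy two-dimensional representations; real ones would be much larger.
    query = [1.0, 0.0]
    db_reprs = [[0.9, 0.1], [0.0, 1.0], [0.7, 0.3]]
    db_boxes = [(10, 20, 50, 40), (0, 0, 5, 5), (20, 30, 60, 50)]
    print(transfer_location(query, db_reprs, db_boxes, k=1))
    print(transfer_location(query, db_reprs, db_boxes, k=2))
```

In this sketch, the quality of the transferred location depends entirely on how well the similarity ranks database images whose object locations resemble that of the query.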
Both of these applications use a data-driven approach for object localization which has advantages in terms of simplicity and speed. A representation of each image is computed, which allows the similarity between two images to be obtained as the dot product between their representations. The features used to compute the image representations are generic (i.e., task-independent) image features, such as Fisher vectors, which have been shown to perform accurately across a range of computer vision tasks such as image categorization or retrieval. However, the image representation itself is not particularly tailored for a localization task. An improvement in the performance of the method can be achieved using similarities optimized for a localization task (by learning a metric), referred to as “task-dependent similarities,” as described in the '961 application. However, the task-independent image features are retained and the metric learning can be computationally expensive.
There remains a need for a system and method for segmentation which can utilize features tailored for the task of localization.