Visual saliency is an important part of human vision: it is the mechanism that handles the overload of information in our visual field by filtering out redundant information. It can be considered a measure of the extent to which an image area will attract an eye fixation. Unfortunately, little is known about the mechanism that leads to the selection of the most interesting (salient) object in a scene, such as a landmark, an obstacle, prey, a predator, food, or a mate. It is believed that interesting objects in the visual field have specific visual properties that make them different from their surroundings. Therefore, in our definition of visual saliency, no prior knowledge or higher-level information about objects is taken into account. Because it includes a detailed visual processing front-end, saliency detection has wide applicability to computer vision problems, including automated target detection in natural scenes, smart image compression, fast guidance of object recognition systems, and even high-level scene analysis with application to the validation of advertising designs.
Prior art methods for identifying salient objects in a digital image generally require a computationally intensive search process. They also typically require that the salient objects be homogeneous regions.
In the articles “Computational modeling of visual attention” (Nature Reviews Neuroscience, Vol. 2, pp. 194-203, 2001) and “A model of saliency-based visual attention for rapid scene analysis” (IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, pp. 1254-1259, 1998), Itti et al. teach how to compute low-level computer vision features and then average all responses at different image scales. They compute low-level features of images such as color contrast, edges, and edge orientations at different scales using up and down sampling. They then compute center-surround responses at the different scales using differences of Gaussians, and take the local maximum response. Finally, they combine all of the computed responses and generate a saliency map. The saliency map does not provide concrete boundaries around salient regions.
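The center-surround step can be illustrated with a minimal single-scale sketch. It is not the full model of Itti et al.: it assumes a grayscale image stored as a 2-D NumPy array, omits the multi-scale pyramid and the feature channels, and the function names and sigma values are hypothetical choices made only for illustration.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Return a 1-D Gaussian kernel normalized to sum to 1."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def blur(image, sigma):
    """Separable Gaussian blur with edge replication at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    padded = np.pad(np.asarray(image, dtype=float), radius, mode="edge")
    # Filter along rows first, then along columns.
    rows = np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="valid"), 0, rows)

def center_surround(image, sigma_center=1.0, sigma_surround=4.0):
    """Absolute difference of Gaussians: narrow (center) minus wide (surround).

    The sigma values are illustrative assumptions, not taken from the articles.
    """
    return np.abs(blur(image, sigma_center) - blur(image, sigma_surround))

# Usage: a single bright pixel on a dark background
image = np.zeros((33, 33))
image[16, 16] = 1.0
response = center_surround(image)
peak = np.unravel_index(np.argmax(response), response.shape)
```

In this sketch the response peaks at the isolated bright pixel and vanishes on a uniform image. The output is a smooth response map rather than a region with explicit boundaries.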
U.S. Pat. No. 6,282,317 to Luo et al., entitled “Method for automatic determination of main subjects in photographic images,” discloses a method for automatic determination of main subjects in photographic images. The method provides a measure of belief for the location of main subjects within a digital image, and thereby provides an estimate of the relative importance of different subjects in an image. The output of the algorithm is in the form of a list of segmented regions ranked in descending order of their estimated importance. The method first segments an input image, and then groups regions into larger segments corresponding to physically coherent objects. A saliency score is then computed for each of the resulting regions, and the region that is most likely to contain the main subject is determined using probabilistic reasoning. However, one of the shortcomings of this approach is that image regions that constitute a main subject are not necessarily coherent with each other. For example, if the main subject is a person wearing a red shirt with black pants, region merging will generally not combine the two regions.
U.S. Patent Application Publication 2008/0304740 to Sun et al., entitled “Salient object detection,” discloses a method for detecting a salient object in an input image. With this approach, the salient object is identified using a set of local, regional, and global features including multi-scale contrast, center-surround histogram, and color spatial distribution. These features are optimally combined through conditional random field learning. The learned conditional random field is then used to locate the salient object in the image. Image segmentation can then be used to separate the salient object from the image background.
Hou et al., in an article entitled “Saliency detection: a spectral residual approach” (IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2007), describe a method to approximate the “innovation” part of an image by removing a statistically redundant component. The method involves performing center-surround (high pass) filtering of log spectral magnitudes. This approach tends to detect small salient regions well; however, it does not perform as well for large regions, since they generally carry redundant components inside the region boundaries.
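The spectral residual idea can be sketched as follows. This is a simplified illustration, not the published implementation: it assumes a grayscale image stored as a 2-D NumPy array, uses a small box filter with wrap-around as the local average of the log magnitude spectrum, and omits any resizing or final smoothing of the map. The function name and the avg_size parameter are hypothetical.

```python
import numpy as np

def spectral_residual_saliency(image, avg_size=3):
    """Saliency from the residual of the log magnitude spectrum."""
    spectrum = np.fft.fft2(np.asarray(image, dtype=float))
    log_magnitude = np.log(np.abs(spectrum) + 1e-8)  # epsilon avoids log(0)
    phase = np.angle(spectrum)

    # Local average of the log magnitude: a box filter with wrap-around,
    # built from shifted copies of the spectrum.
    radius = avg_size // 2
    smoothed = np.zeros_like(log_magnitude)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            smoothed += np.roll(log_magnitude, (dy, dx), axis=(0, 1))
    smoothed /= avg_size ** 2

    # Subtracting the local average acts as a high-pass (center-surround)
    # filter on the log magnitude; the phase is kept unchanged.
    residual = log_magnitude - smoothed
    restored = np.fft.ifft2(np.exp(residual + 1j * phase))
    return np.abs(restored) ** 2

# Usage on a synthetic image: noisy background with a small bright patch
rng = np.random.default_rng(0)
image = rng.normal(0.0, 0.05, size=(64, 64))
image[30:34, 30:34] += 1.0
saliency = spectral_residual_saliency(image)
```

The result is a non-negative map with the same dimensions as the input. Because the filtering operates on the whole spectrum, it responds to what is statistically unusual in the image rather than to the interior of large uniform regions.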
Achanta et al., in an article entitled “Frequency-tuned salient region detection” (IEEE Conference on Computer Vision and Pattern Recognition, pp. 1597-1604, 2009), describe a salient region detection method that produces full resolution saliency maps with well-defined boundaries of salient objects. The method involves computing a mean color for the entire image and then subtracting the mean color from each pixel value to produce a saliency map. The method segments the image, determines a saliency response for each mean-shift segmented region, and collects segmented regions that exceed an adaptive threshold. This approach is not capable of detecting a local salient region if the mean color of the local salient region is similar to that of the entire image.
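The mean-color subtraction step can be sketched in a few lines. This is a minimal illustration under stated assumptions: the image is an H x W x C NumPy array, saliency is taken as the Euclidean distance between each pixel and the global mean color, and the color space conversion, mean-shift segmentation, and adaptive thresholding are omitted. The function name is hypothetical.

```python
import numpy as np

def mean_color_saliency(image):
    """Per-pixel Euclidean distance from the mean color of the whole image."""
    pixels = np.asarray(image, dtype=float)
    mean_color = pixels.reshape(-1, pixels.shape[-1]).mean(axis=0)
    return np.linalg.norm(pixels - mean_color, axis=-1)

# Usage: a small red patch on a black background
image = np.zeros((8, 8, 3))
image[0:2, 0:2] = [1.0, 0.0, 0.0]
saliency = mean_color_saliency(image)
```

Here the red patch receives a much larger value than the background, because its color lies far from the global mean. A patch whose color is close to the global mean would receive a value near zero, which is the limitation noted above.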
Donoser et al., in an article entitled “Saliency driven total variation segmentation” (IEEE International Conference on Computer Vision, pp. 817-824, 2009), introduce an unsupervised color segmentation method. The underlying idea involves segmenting the input image several times, each time focusing on a different salient part of the image, and subsequently merging all obtained results into one composite segmentation. The method identifies salient parts of the image by applying affinity propagation clustering to efficiently calculated local color and texture models. Each salient region then serves as an independent initialization for a figure/ground segmentation. Segmentation is done by minimizing a convex energy function based on weighted total variation, leading to a globally optimal solution. Each salient region provides an accurate figure/ground segmentation highlighting different parts of the image. These highly redundant results are combined into one composite segmentation by analyzing local segmentation certainty.
Valenti et al., in an article entitled “Image saliency by isocentric curvedness and color” (IEEE International Conference on Computer Vision, pp. 2185-2192, 2009), propose a novel computational method to infer visual saliency in images. The method is based on the idea that salient objects should have local characteristics that are different from the rest of the scene, the local characteristics being edges, color, or shape. By using a novel operator, these characteristics are combined to infer global information. The resulting global information is used as a weighting for the output of a segmentation algorithm so that a salient object in the scene can be distinguished from the background.
There remains a need for a computationally efficient method to determine object saliency in a digital image that can work with both homogeneous and non-homogeneous image regions having a wide range of shapes and sizes.