Visual attention is an important mechanism that enables the human visual system (HVS) to identify a scene accurately and efficiently. Capturing a salient region within an image by a computational method is an important research subject in the field of computer vision, because it helps an image processing system allocate computational resources appropriately in subsequent processing steps. Saliency maps have been widely used in various computer vision applications such as object-of-interest image segmentation (see Chinese Patent Application Nos. 200910046276, 200910081069), object recognition, adaptive image compression, content-aware image resizing (see Chinese Patent Application No. 200910092756), and image retrieval (see Chinese Patent Application No. 200910081069).
Image visual saliency detection has attracted considerable attention from researchers. In theoretical research on visual attention, visual attention is classified into two types: fast, task-independent (pre-attentive), data-driven saliency detection; and slower, task-dependent, goal-driven saliency detection. The method according to one or more embodiments of the present invention relates to the former type. Physiological research shows that human visual cells respond preferentially to high-contrast stimuli in their receptive fields. Most existing research on data-driven visual saliency detection calculates visual saliency by computing, in various forms, the contrast between image contents and scenes. For ease of explanation, research on visual saliency detection is further classified into two sub-types: local contrast based methods and global contrast based methods.
The local contrast based method computes saliency from the rarity of an image region with respect to a relatively small local neighborhood. Itti et al. proposed “A model of saliency-based visual attention for rapid scene analysis” (IEEE TPAMI, 20(11): 1254-1259, 1998) in 1998. This method defines image saliency using center-surround differences across multi-scale image features. Further, Ma and Zhang proposed “Contrast-based image attention analysis by using fuzzy growing” (In ACM Multimedia, pages 374-381, 2003) in 2003. This method uses a local contrast analysis to generate saliency maps. Liu et al. proposed “Learning to detect a salient object” (IEEE TPAMI, 33(2): 353-367, 2011), a work first presented in 2007. This method learns an optimal combination of weights for saliency detection features such as color spatial distribution, multi-scale contrast, and center-surround histogram differences. Goferman et al. modeled low-level clues, global considerations, organization rules, and high-level features in their work “Context-aware saliency detection” (In CVPR, 2010) in 2010. These local contrast based methods generally produce higher saliency values near the edges of objects instead of uniformly highlighting entire visually salient objects.
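For illustration only, the tendency of local contrast to emphasize object edges can be sketched as follows. This is a minimal sketch and not the method of any of the works cited above: it assumes a single-scale grayscale image stored as a list of rows, and it takes the saliency of a pixel to be the absolute difference between the pixel value and the mean of a small square window around it. The function name `local_contrast_saliency` and the parameter `radius` are hypothetical.

```python
def local_contrast_saliency(gray, radius=1):
    """Toy single-scale local contrast (an assumed simplification).

    gray   : list of rows of numeric luminance values
    radius : half-width of the square neighborhood (hypothetical parameter)
    Returns a list of rows with |pixel - mean of its neighborhood|.
    """
    h, w = len(gray), len(gray[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp the window at the image borders.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            out[y][x] = abs(gray[y][x] - sum(window) / len(window))
    return out


# A uniform bright 5x5 square on a dark 7x7 background.
image = [[100 if 1 <= y <= 5 and 1 <= x <= 5 else 0 for x in range(7)]
         for y in range(7)]
saliency = local_contrast_saliency(image)
```

In this example the pixel at the center of the square receives a saliency of zero, because its whole neighborhood has the same value, whereas pixels on the border of the square receive positive values. The interior of a uniform object is therefore not highlighted, which is the limitation noted above.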
In contrast, the global contrast based method evaluates the saliency of an image region by measuring the difference between that region and the entire image. Zhai and Shah proposed “Visual attention detection in video sequences using spatiotemporal cues” (In ACM Multimedia, pages 815-824, 2006) in 2006. This method calculates the saliency value of a pixel using the luminance difference between the pixel and all the other pixels. However, for the sake of efficiency, this method uses only the luminance information of an image and thus ignores distinctiveness clues in the other color channels. Achanta et al. proposed “Frequency-tuned salient region detection” (In CVPR, pages 1597-1604, 2009) in 2009. This method obtains the saliency of each pixel using the pixel's color difference from the average image color. This simple approach, however, is insufficient to effectively analyze complex and varied natural images.
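The two global contrast ideas described above can be sketched as follows. This is a simplified illustration under stated assumptions, not a reproduction of either cited method: the first function assumes integer luminance values in the range 0 to 255 and uses a histogram so that each luminance level is compared against the image only once; the second function measures the Euclidean distance of each pixel's color from the mean image color and omits any color space conversion or smoothing. The function names are hypothetical.

```python
import math


def global_luminance_saliency(gray):
    """Saliency of a pixel = sum of |its luminance - luminance of every pixel|.

    gray: list of rows of integers in 0..255 (assumed input format).
    The result is normalized to the range [0, 1].
    """
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    present = [(u, c) for u, c in enumerate(hist) if c]
    # Pixels with equal luminance share one value, so compute it per level.
    level = {v: sum(c * abs(v - u) for u, c in present) for v, _ in present}
    peak = max(level.values())
    if peak == 0:  # perfectly uniform image
        return [[0.0 for _ in row] for row in gray]
    return [[level[v] / peak for v in row] for row in gray]


def mean_color_saliency(image):
    """Saliency of a pixel = distance of its color from the mean image color.

    image: list of rows of 3-component color tuples (assumed input format).
    """
    pixels = [p for row in image for p in row]
    n = len(pixels)
    mean = tuple(sum(p[k] for p in pixels) / n for k in range(3))
    return [[math.dist(p, mean) for p in row] for row in image]


gray = [[10, 10, 10],
        [10, 200, 10],
        [10, 10, 10]]
lum_sal = global_luminance_saliency(gray)

color = [[(0, 0, 0), (0, 0, 0)],
         [(0, 0, 0), (40, 0, 0)]]
col_sal = mean_color_saliency(color)
```

In the first example the single bright pixel differs from all eight other pixels and receives the highest saliency, while each background pixel differs from only one pixel. The second example shows why comparison against a single average color is a coarse measure: every pixel is scored against the same mean, regardless of how colors are distributed in the image.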
One existing Chinese Patent Application in this art is “A learning-based automatic detection method for a sequence of salient objects in videos” (Chinese Patent Application No. 200810150324). This method generally takes several seconds to process an image, which makes it difficult to satisfy the needs of many real-time processing applications.