The human vision system can rapidly and accurately identify important regions in its visual field. To replicate this capability in computer vision, various saliency detection methods have been developed to find pixels or regions in an input image that are of the highest visual interest or importance. Often the “important” pixels/regions carry some semantic meaning, such as being part of an object (e.g., a person, animal, or structure) in the foreground of the image that stands out from the background of the image. Object-level saliency detection can be used for various computer vision tasks, such as image summarization and retargeting, image thumbnail generation, image cropping, object segmentation for image editing, object matching and retrieval, and object detection and recognition.
Although the general concept of computing saliency of an input image seems logical and straightforward, saliency detection is actually quite difficult in the field of computer vision due to the inherent subjectivity of the term “saliency.” That is, the answer to the question of what makes a pixel/region of an image more or less salient can be highly subjective, poorly defined, and application-dependent.
Current techniques for detecting saliency in an image have tried to tackle the problem by using various “bottom-up” computational models that predominantly rely on assumptions (or priors) of the image relating to the contrast between pixels/regions of the image. That is, current saliency detection algorithms rely on the assumption that appearance contrast between objects in the foreground and the background of the image will be relatively high. Thus, a salient image pixel/patch will present high contrast within a certain context (e.g., in a local neighborhood of the pixel/patch, globally, etc.). This known assumption is sometimes referred to herein as the “contrast prior.”
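For illustration only, the contrast prior can be sketched in a few lines of Python. The sketch below is a hypothetical, minimal example and does not reproduce any particular technique referenced herein: the function name `global_contrast_saliency`, the use of a grayscale image stored as nested lists, and the choice of mean absolute intensity difference over the whole image as the “global” context are all assumptions made for the example.

```python
def global_contrast_saliency(image):
    """Score each pixel by its mean absolute intensity difference from
    every pixel in the image (a global context), scaled to [0, 1].

    `image` is assumed to be a non-empty grayscale image given as a
    list of rows of numeric intensities.
    """
    pixels = [value for row in image for value in row]
    count = len(pixels)

    # Raw contrast: average absolute difference to all pixels in the image.
    raw = [
        [sum(abs(value - other) for other in pixels) / count for value in row]
        for row in image
    ]

    # Normalize so the most contrasting pixel scores 1.0.
    peak = max(max(row) for row in raw)
    if peak == 0:
        # A perfectly uniform image has no contrast anywhere.
        return [[0.0 for _ in row] for row in raw]
    return [[score / peak for score in row] for row in raw]


# Hypothetical 4x4 input: a bright 2x2 "object" on a dark background.
toy_image = [
    [10, 10, 10, 10],
    [10, 200, 200, 10],
    [10, 200, 200, 10],
    [10, 10, 10, 10],
]
saliency = global_contrast_saliency(toy_image)
```

In this toy input, the bright pixels differ from most of the image and therefore receive the highest scores, while the background pixels receive lower scores. Because the score in this sketch depends only on how unusual an intensity is within the image, an object that covers most of the image would score lower than its background.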
However, the contrast prior alone is insufficient for accurate saliency detection because the resulting saliency maps tend to be very different and inconsistent among the various implementations that rely on it. In some cases, the interiors of objects are attenuated or not highlighted uniformly. A common definition of “what saliency is” is still lacking in the field of computer vision, and using the contrast prior alone is unlikely to generate accurate saliency maps of images. FIG. 1 illustrates four example object-level saliency detection techniques as compared across three input images 100 and their corresponding ground truth salient object masks 102. As can be seen in FIG. 1, the techniques 104-110 produce saliency maps that vary significantly from one another, even for a simple input image such as the image of the tomato shown at the top of column 100 in FIG. 1. The results of the techniques shown in FIG. 1 demonstrate that using the contrast prior alone is insufficient for achieving suitable saliency maps of input images.