Field of the Invention
The present invention relates to a technique for automatically detecting a salient region in an image.
Description of the Related Art
Conventionally, in the field of image processing, techniques are known for detecting in (extracting from) an image a region that a person is predicted to focus on, or in other words, a region that is to be given attention (referred to below as a “salient region”). Such salient region detection techniques are also used to calculate a saliency measure for each pixel in the image and thereby create a saliency map showing the saliency measures of the pixels in the image.
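As a minimal illustrative sketch of what a saliency map is, the following assumes a grayscale image given as a list of rows and uses a simple global-contrast measure (absolute difference from the mean intensity). This measure, the function names `saliency_map` and `salient_region`, and the threshold value are assumptions chosen for illustration only; they are not taken from any of the techniques discussed here.

```python
def saliency_map(image):
    """Return per-pixel saliency measures in [0, 1] for a grayscale image.

    Assumption for illustration: saliency is the absolute difference
    between a pixel's intensity and the mean intensity of the image.
    """
    pixels = [value for row in image for value in row]
    mean = sum(pixels) / len(pixels)
    contrast = [[abs(value - mean) for value in row] for row in image]
    peak = max(max(row) for row in contrast)
    if peak == 0:
        # A uniform image has no pixel that stands out.
        return [[0.0 for _ in row] for row in image]
    return [[c / peak for c in row] for row in contrast]


def salient_region(saliency, threshold=0.5):
    """Binarize a saliency map: 1 marks pixels in the salient region."""
    return [[1 if s >= threshold else 0 for s in row] for row in saliency]


image = [
    [10, 10, 10],
    [10, 200, 10],
    [10, 10, 10],
]
mask = salient_region(saliency_map(image))
```

Here the saliency map holds one measure per pixel, and the salient region is obtained by thresholding that map; in this example only the bright center pixel is marked.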
Salient region detection is expected to be applied in a wide range of fields, such as monitoring cameras, robot vision, and machine vision, as a fundamental technique for detecting a main object in an image or detecting an abnormality in an image.
Algorithms for salient region detection are broadly divided into model-based methods and learning-based methods. A model-based method detects a salient region using a model of a person's eye or brain, or a hypothesis, that is expressed as a formula. The model-based method is superior in versatility, but no model has yet been constructed that perfectly reproduces the determinations made by a person's eye or brain, and further improvements in accuracy are desired in order to address various real-world problems.
On the other hand, a learning-based method uses a large amount of exemplary (sample) data or supervised signals (ground-truth) to learn image features of a salient region (see JP 2001-236508A). It is advantageous in that it does not require a model or a hypothesis, and a high-accuracy detector can be constructed more easily. However, in many cases it is difficult to prepare exemplary data (a learning DB) that encompasses all patterns that are to be detected as salient regions, or it is difficult to define the supervised signal (ground-truth). These difficulties are bottlenecks in the practical application of the learning-based method.
As a conventional method that takes the foregoing points into account, JP 2010-258914A proposes a technique in which a salient region is detected using information between the frames that constitute a video, without requiring prior knowledge. However, although the technique disclosed in JP 2010-258914A does not require prior knowledge, it is applicable only to moving images constituted by multiple frames, and it cannot be applied to evaluating the saliency measure of a still image.
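The limitation can be illustrated with a minimal sketch that assumes the simplest possible inter-frame cue, namely the absolute difference between two consecutive grayscale frames. This is not the method of JP 2010-258914A, whose details are not given here; the function name `inter_frame_saliency` and the choice of frame differencing are hypothetical and serve only to show why a cue of this kind is unavailable for a still image.

```python
def inter_frame_saliency(frames):
    """Return a per-pixel saliency measure from the last two frames.

    Assumption for illustration: saliency is the absolute intensity
    change between consecutive frames (plain frame differencing).
    """
    if len(frames) < 2:
        # A still image provides no inter-frame information at all.
        raise ValueError("inter-frame cues need at least two frames")
    previous, current = frames[-2], frames[-1]
    return [
        [abs(c - p) for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(previous, current)
    ]


frame_a = [[0, 0], [0, 0]]
frame_b = [[0, 50], [0, 0]]
motion = inter_frame_saliency([frame_a, frame_b])
```

With two frames the changed pixel receives a nonzero measure, whereas passing a single frame raises an error because there is nothing to compare against.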
JP 2001-236508A and JP 2010-258914A are examples of background art.