1. Field of the Invention
The present invention relates to image processing for detecting an object region of a principal subject.
2. Description of the Related Art
A technique for searching for similar objects in images using local feature amounts of the images is known. In such a technique, feature points (to be referred to as “local feature points” hereinafter) are extracted from the images. Then, based on the local feature points and image information in the vicinity of these points, local feature amounts corresponding to the local feature points are calculated. A similar object search in images is conducted by matching the local feature amounts.
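The matching of local feature amounts described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of any particular prior-art system: each local feature amount is assumed to be a plain list of numbers, matching is assumed to be nearest-neighbor search by Euclidean distance, and the names `match_features`, `similarity`, and the threshold `max_distance` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_features(query, candidates, max_distance=0.5):
    """Pair each query feature with its nearest candidate feature.

    Returns a list of (query_index, candidate_index) pairs. A pair is kept
    only if the nearest candidate lies within max_distance (an assumed,
    illustrative threshold).
    """
    matches = []
    for qi, q in enumerate(query):
        best_index, best_dist = None, float("inf")
        for ci, c in enumerate(candidates):
            d = euclidean(q, c)
            if d < best_dist:
                best_index, best_dist = ci, d
        if best_index is not None and best_dist <= max_distance:
            matches.append((qi, best_index))
    return matches


def similarity(query, candidates, max_distance=0.5):
    """Fraction of query features that found a match (0.0 for no features)."""
    if not query:
        return 0.0
    return len(match_features(query, candidates, max_distance)) / len(query)
```

In this sketch, two images would be judged to contain a similar object when `similarity` exceeds some chosen level; the threshold tolerates the slight variations that local feature amounts exhibit in actual digital images.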
A local feature amount is generally defined as information including a plurality of elements that are invariant to rotation and scaling. Therefore, a search using local feature amounts can be conducted even for a rotated or enlarged/reduced image. In general, a local feature amount is expressed as a vector. However, the rotation invariance and scaling invariance of a local feature amount hold only theoretically. In actual digital images, local feature amounts vary slightly before and after rotation or enlargement/reduction processing of the images.
In order to extract rotation-invariant local feature amounts, a method of calculating a major direction from a pixel pattern of a local region around local feature points, and rotating the local region with reference to the major direction to normalize a direction upon calculation of local feature amounts has been proposed. Also, in order to calculate scaling-invariant local feature amounts, a method of generating images of different scales inside an apparatus, and performing extraction of local feature points and calculation of local feature amounts from the images of the respective scales has been proposed. A set of a series of images having different scales, which are generated inside the apparatus, is generally called “scale space”.
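The two ideas above, a scale space and a major direction, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the proposed methods themselves: images are assumed to be lists of rows of grayscale values, each coarser scale is produced by plain 2x2 averaging (practical systems typically smooth with a Gaussian filter), and the major direction is taken as the peak of a magnitude-weighted gradient orientation histogram. The names `build_scale_space` and `major_direction` and the bin count are hypothetical.

```python
import math


def downsample(image):
    """Halve the image size by averaging each 2x2 block of pixels."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def build_scale_space(image, levels):
    """Return a list of images, each half the size of the previous one."""
    pyramid = [image]
    for _ in range(levels - 1):
        last = pyramid[-1]
        if len(last) < 2 or len(last[0]) < 2:
            break  # too small to reduce further
        pyramid.append(downsample(last))
    return pyramid


def major_direction(patch, bins=36):
    """Estimate the dominant gradient direction of a local region.

    Returns the lower edge, in degrees, of the histogram bin that collects
    the largest total gradient magnitude (0.0 if the patch is flat).
    """
    bin_width = 360.0 / bins
    hist = [0.0] * bins
    for y in range(1, len(patch) - 1):
        for x in range(1, len(patch[0]) - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue
            angle = math.degrees(math.atan2(dy, dx)) % 360.0
            hist[int(angle / bin_width) % bins] += magnitude
    peak = max(range(bins), key=lambda i: hist[i])
    return peak * bin_width
```

Rotating the local region by the negative of the value returned by `major_direction` before calculating the local feature amount normalizes its direction, and running extraction on every image returned by `build_scale_space` yields features at multiple scales.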
Since detecting a principal subject in an image is important when browsing images and when searching with metadata of the principal subject, a function of detecting or discriminating a face in an image has been included in products and software in recent years. For an object that is expected as an image recognition target, such as a “face” in this case, the subject is recognized from the image to obtain its metadata. Also, an importance level of a subject can be determined from the size that the subject occupies in the image, the appearance of the subject, and the like.
Also, in recent years, an environment that allows easy integration of quantitatively and qualitatively rich data, called big data, has begun to emerge, and the demand for data mining processing using such data is increasing. For image data as well, mining processing can be performed on the aforementioned objects that allow image recognition.
However, although a technique for detecting a principal subject from an image is effective for an object expected as an image recognition target, it is very difficult to detect an object which is not an image recognition target, and it is also difficult to measure an importance level of such an object. That is, mining based on big data is applicable to an object which allows image recognition, but is hardly applicable to an object which is not an image recognition target.
A technique for clustering images on a multi-dimensional feature space based on features of the entire image is known. This clustering technique may be applied to mining. However, the clustering technique is based on the similarities of entire images, and is not based on image contents, in particular, a subject. That is, clustering is performed based on features of entire images, and one image belongs to one cluster. Normally, an image includes a plurality of objects, and the single cluster assigned to one image by the clustering technique cannot be adapted to each of the plurality of objects.
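The limitation described above can be illustrated with a small sketch. It assumes, purely for illustration, that each image is summarized by one whole-image feature vector and that clustering reduces to assigning each image to the nearest of a set of given cluster centers; the name `assign_clusters` and the example values are hypothetical.

```python
def assign_clusters(image_features, centroids):
    """Assign each image to the single nearest cluster center.

    image_features maps an image name to one whole-image feature vector.
    Returns a dict mapping each image name to exactly one cluster index.
    """
    assignment = {}
    for name, feature in image_features.items():
        distances = [
            sum((f - c) ** 2 for f, c in zip(feature, centroid))
            for centroid in centroids
        ]
        assignment[name] = distances.index(min(distances))
    return assignment


# Hypothetical whole-image features: "mixed" shows objects of both kinds,
# yet it still receives only one cluster index.
features = {
    "mostly_a": [0.9, 0.1],
    "mostly_b": [0.1, 0.9],
    "mixed": [0.5, 0.6],
}
clusters = assign_clusters(features, [[1.0, 0.0], [0.0, 1.0]])
```

Because the result holds one index per image, an image containing several objects cannot be associated with each of those objects separately.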
Considering a plurality of photo images shot by one person or by many people, it can readily be inferred that subjects which photographers consider important are shot more frequently. For example, when taking pictures of scenic and historic places while traveling, photographers normally take a plurality of shots while changing field angles and shooting positions. When pictures of a scenic and historic place at a certain spot are taken by a plurality of persons, it can readily be inferred that the shooting frequencies of the respective objects are related to their popularity and topicality. For such inference, a subject commonly shot in a plurality of images needs to be detected, and a detection technique for this purpose is important for mining of big data, especially images.
Also, as a technique for detecting an identical object from a plurality of images or videos, a technique for tracing an object in a moving image is known. Also, disparity image processing, panorama stitch processing, and the like for three-dimensional data are known.
The technique for tracing an object presupposes that designated images (frame images) include a common object, and is not a technique for determining whether or not designated images include a common object. For example, since object tracing in a moving image uses image blocks in the vicinity of an object as a feature search target, when a discrete still image sequence or a still image sequence shot from different angles is processed, the object fails to be traced, and a common subject cannot be found.
Also, as for the disparity image processing, since two images having a disparity inevitably include a common subject, processing for calculating corresponding points need only be executed within the range of the region generated by the disparity. As for the panorama stitch processing, a common subject is required to be detected for each combination of images if there is no restriction on processing. However, restrictions for simplifying processing and improving the processing precision (for example, a defined shooting order) are normally set.