1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a storage medium.
2. Description of the Related Art
One known image recognition method segments a captured image into a plurality of regions and identifies, for each segmented region, a class relating to classification of an object. In this method, the class of each region is identified based on a feature quantity extracted from the region. By properly segmenting the image into regions, various kinds of image processing, such as recognition of an object or a scene and correction of image quality according to the object, can be easily executed.
According to a technique discussed in R. Socher, “Parsing Natural Scenes and Natural Language with Recursive Neural Networks”, International Conference on Machine Learning 2011, an input image is segmented into small regions known as super pixels based on color information and texture information. In the technique, classes of the segmented small regions are identified by using classifiers known as recursive neural networks (RNNs).
According to a technique discussed in P. Krahenbuhl, “Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials”, Neural Information Processing Systems 2011, region segmentation and class identification are executed simultaneously by using a conditional random field (CRF). In the technique, the class of each pixel is identified not only based on the features extracted from that pixel but also by taking co-occurrence of classes in adjacent pixels into consideration. A pixel having unspecific features, which is difficult to recognize independently, is thus identified by taking the relationship between that pixel and its neighboring pixels into consideration. More specifically, each pixel is taken as a node, energy of each node (unary potential) and energy between nodes (pairwise potential) are defined, and the sum of these energies over the entire image is minimized. The class of each pixel is then identified as the class that minimizes the energy.
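The energy formulation described above can be illustrated with a minimal sketch. The function names, the Potts-style pairwise term weighted by a Gaussian of pixel distance, and the parameters `weight` and `sigma` are assumptions chosen for illustration and are not taken from the cited reference. The exhaustive search is practical only for a handful of pixels; the cited technique instead relies on efficient approximate inference.

```python
import math
from itertools import product


def crf_energy(labels, unary, positions, weight=1.0, sigma=1.0):
    """Sum of unary energies plus pairwise energies over all pixel pairs.

    unary[i][c] is the (hypothetical) cost of assigning class c to pixel i.
    The pairwise term penalizes differing labels, more strongly for nearby pixels.
    """
    energy = sum(unary[i][labels[i]] for i in range(len(labels)))
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] != labels[j]:
                d2 = sum((a - b) ** 2 for a, b in zip(positions[i], positions[j]))
                energy += weight * math.exp(-d2 / (2.0 * sigma ** 2))
    return energy


def minimize_brute_force(unary, positions, num_classes, **kwargs):
    """Try every labeling and return the one with minimum energy (tiny inputs only)."""
    best = min(
        product(range(num_classes), repeat=len(unary)),
        key=lambda labels: crf_energy(labels, unary, positions, **kwargs),
    )
    return list(best)


# Three pixels in a row; the middle pixel has ambiguous unary costs.
unary = [[0.1, 2.0], [0.6, 0.5], [0.1, 2.0]]
positions = [(0, 0), (1, 0), (2, 0)]

independent = minimize_brute_force(unary, positions, 2, weight=0.0)
with_neighbors = minimize_brute_force(unary, positions, 2, weight=1.0)
```

In this toy input, the middle pixel is assigned class 1 when each pixel is labeled independently (`weight=0.0`), and class 0 when the pairwise term lets its neighbors influence the result.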
In the above-described two techniques, the information used for region segmentation and class identification (the feature quantity) is acquired from the image alone. However, there is another technique in which the region segmentation is executed by using, in addition to the information that can be acquired from the image, information other than the image that can be acquired at the time of capturing the image.
According to a technique discussed in U.S. Pat. No. 7,860,320, an estimation score for a class of an object in a region (super pixel (SP)) of an image is changed according to the area in which the image was captured, by using positional information acquired through a global positioning system (GPS). For example, if the area is an equatorial area, the positional information may indicate “NO SNOW”. Further, in the technique, the time and direction at which the image was captured are also used as information, and a co-occurrence table of a spatial arrangement relationship between the classes of the object is changed according to the positional information.
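The idea of changing a class estimation score according to positional information can be sketched as follows. This is an illustrative assumption only: the function name, the latitude band `band_deg`, and the rule of setting a suppressed class score to zero and renormalizing are hypothetical and are not taken from the cited patent.

```python
def adjust_scores(scores, latitude_deg, suppressed_near_equator=("snow",), band_deg=15.0):
    """Return class scores adjusted by a simple, hypothetical location rule.

    scores maps class name -> estimation score for one region.
    Classes listed in suppressed_near_equator are zeroed when the capture
    latitude lies within band_deg of the equator; scores are then renormalized.
    """
    adjusted = dict(scores)
    if abs(latitude_deg) <= band_deg:
        for name in suppressed_near_equator:
            if name in adjusted:
                adjusted[name] = 0.0
    total = sum(adjusted.values())
    if total > 0:
        adjusted = {name: value / total for name, value in adjusted.items()}
    return adjusted


# A bright region scored from image features alone.
region_scores = {"snow": 0.5, "sand": 0.3, "cloud": 0.2}

near_equator = adjust_scores(region_scores, latitude_deg=2.0)
far_north = adjust_scores(region_scores, latitude_deg=60.0)
```

With these example values, the region is most likely "sand" near the equator, while "snow" remains the top class at a high latitude.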
According to a technique discussed in Japanese Patent Application Laid-Open No. 2012-4716, object distance information is acquired when an image is captured, and the image is segmented into regions based on the object distance information. In the technique, scene determination is executed for each segmented region, and image processing is executed for each segmented region based on the determination result.
In a technique discussed in Japanese Patent Application Laid-Open No. 2011-253354, an image is segmented into a foreground region and a background region based on an optional object extraction result acquired from image segmentation and distance information that can be acquired when the image is captured. In this technique, for example, the image is segmented by taking pixels whose distance is shorter than the distance of a predetermined object as the foreground region and taking all other pixels as the background region.
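The distance-based example above can be sketched as a per-pixel threshold. The function name and the representation of the distance information as a nested list of per-pixel distances are assumptions made for illustration.

```python
def split_by_distance(depth_map, object_distance):
    """Return a mask that is True for foreground pixels and False for background.

    A pixel is treated as foreground when its distance is shorter than the
    distance of the predetermined object; every other pixel is background.
    """
    return [[distance < object_distance for distance in row] for row in depth_map]


# Hypothetical 2x2 distance map in meters, with the reference object at 2.0 m.
depth_map = [[1.0, 2.5], [3.0, 0.5]]
foreground_mask = split_by_distance(depth_map, object_distance=2.0)
```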
In the above-described conventional techniques, a class of each region in the image is identified by a previously trained classifier that takes a feature quantity extracted from the region as an input. However, with the above-described techniques, the class of each region cannot be precisely identified.