Field of the Invention
The present invention relates to an image processing apparatus, an image processing method and a program for achieving the image processing method.
Description of the Related Art
As one image recognition method, there is a method of dividing a captured image (hereinafter called a shot image) into a plurality of areas and identifying, for each of the divided areas, a class concerning the classification of an object. In this method, the class of each area is identified based on the feature amount extracted from the image of that area. Appropriately dividing the shot image into areas facilitates many kinds of image processes, such as a process of recognizing what kind of object or shooting condition (scene) was shot, a process of correcting image quality according to an object, and the like.
Here, as a method of dividing an image into areas, R. Socher, “Parsing Natural Scenes and Natural Language with Recursive Neural Networks”, International Conference on Machine Learning 2011 (Non-Patent Literature 1) discloses the technique of dividing an input image into small areas called superpixels (SPs) based on color information and texture information. In the technique disclosed in the Non-Patent Literature 1, the class of each small area is identified by using an identifier called RNNs (Recursive Neural Networks).
Moreover, P. Krahenbuhl, “Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials”, Neural Information Processing Systems 2011 (Non-Patent Literature 2) discloses the technique of simultaneously performing area division and class identification by using a conditional random field (CRF). In the technique disclosed in the Non-Patent Literature 2, the class of each pixel is identified not only based on the feature extracted from each pixel but also in consideration of class co-occurrence between adjacent pixels. Namely, in the relevant technique, a pixel that is difficult to recognize on its own because its feature is ambiguous is identified in consideration of its relation with peripheral pixels. More specifically, in the relevant technique, each pixel is set as a node, the energy of each node (unary potential) and the energy between nodes (pairwise potential) are defined, and the total sum of the defined energies over the whole image is minimized. Then, the class label of each pixel by which the energy is minimized is given as an identification result.
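The energy minimization described above can be illustrated with a minimal sketch. It is not the inference method of the Non-Patent Literature 2: it assumes a toy chain of three pixels, a constant penalty for neighboring pixels with different labels (a Potts-style pairwise term), and an exhaustive search that is practical only for very small inputs. The function names, the energy values and the penalty weight are hypothetical.

```python
from itertools import product


def total_energy(labels, unary, edges, pairwise_weight):
    """Sum the unary energy of every node and add a fixed penalty
    for every edge whose two end nodes carry different labels."""
    energy = sum(unary[i][label] for i, label in enumerate(labels))
    energy += sum(pairwise_weight for i, j in edges if labels[i] != labels[j])
    return energy


def minimize_energy(unary, edges, pairwise_weight, num_classes):
    """Exhaustively search all labelings (toy inputs only) and
    return the labeling with the smallest total energy."""
    num_nodes = len(unary)
    return min(
        product(range(num_classes), repeat=num_nodes),
        key=lambda labels: total_energy(labels, unary, edges, pairwise_weight),
    )


# Hypothetical data: three pixels in a row, two classes (0 and 1).
# unary[i][c] is the energy of giving class c to pixel i (lower is better).
# The middle pixel is ambiguous and slightly prefers class 1 on its own.
unary = [[0.1, 2.0], [1.0, 0.9], [0.2, 2.0]]
edges = [(0, 1), (1, 2)]

independent = minimize_energy(unary, edges, pairwise_weight=0.0, num_classes=2)
smoothed = minimize_energy(unary, edges, pairwise_weight=0.5, num_classes=2)

print(independent)  # (0, 1, 0): each pixel is labeled on its own
print(smoothed)     # (0, 0, 0): the ambiguous pixel follows its neighbors
```

With the pairwise weight set to zero, the ambiguous middle pixel receives the label that its own feature favors. Once the pairwise term is enabled, the labeling that agrees with the neighboring pixels has the lower total energy.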
On another front, a method called bagging has been used as an effective method for image recognition tasks in the field of machine learning. Here, bagging is a method of generating an identifier having higher discrimination accuracy by combining identifiers generated by repeated bootstrap sampling. Besides, a method of achieving high discrimination accuracy by generating a large number of identifiers (ensemble identifiers) and integrating the generated identifiers has been studied. Moreover, the technique disclosed in Japanese Patent No. 4623387 achieves high-accuracy identification by generating a large number of ensemble identifiers, selecting the high-accuracy identifiers from among the generated ensemble identifiers, and integrating the selected identifiers.
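As a rough illustration of bagging, the following sketch trains several simple identifiers on bootstrap samples and integrates them by majority vote. It is a minimal sketch under stated assumptions, not the technique of Japanese Patent No. 4623387: the base identifier (a nearest-centroid rule on a one-dimensional feature), the class names, the data values and all function names are hypothetical.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_centroid_identifier(sample):
    """Toy base identifier: store the mean feature value of each class."""
    sums, counts = Counter(), Counter()
    for feature, label in sample:
        sums[label] += feature
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in counts}


def identify(centroids, feature):
    """Return the class whose stored mean is closest to the feature."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))


def train_bagging(data, num_identifiers, seed=0):
    """Train one base identifier per bootstrap sample."""
    rng = random.Random(seed)
    return [
        train_centroid_identifier(bootstrap_sample(data, rng))
        for _ in range(num_identifiers)
    ]


def identify_bagging(ensemble, feature):
    """Integrate the base identifiers by majority vote."""
    votes = Counter(identify(centroids, feature) for centroids in ensemble)
    return votes.most_common(1)[0][0]


# Hypothetical learning data: (feature amount, class) pairs.
data = [(0.2 * i, "sky") for i in range(10)]
data += [(8.0 + 0.2 * i, "wall") for i in range(10)]

ensemble = train_bagging(data, num_identifiers=25)
print(identify_bagging(ensemble, 1.0))
print(identify_bagging(ensemble, 9.0))
```

Each base identifier sees a slightly different resampling of the learning data, and the vote averages out the variation among them.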
In addition, each of P. Felzenszwalb, “Efficient Graph-Based Image Segmentation”, International Journal of Computer Vision 2004 (Non-Patent Literature 3) and S. Lazebnik, C. Schmid and J. Ponce, “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”, CVPR 2006 (Non-Patent Literature 4) discloses the technique of dividing a shot image into small areas called superpixels (SPs). The Non-Patent Literature 4 also discloses the technique of recognizing a shot scene by using a feature amount called “Spatial Pyramid Matching Kernel”. Besides, A. Oliva and A. Torralba, “Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope”, International Journal of Computer Vision, 2001 (Non-Patent Literature 5) discloses the technique of recognizing a shot scene by using a feature amount called the GIST feature amount. Besides, H. Bay, “SURF: Speeded Up Robust Features”, Computer Vision and Image Understanding, 2008 (Non-Patent Literature 6) discloses the technique of using a local feature amount obtained from a learning image.
Here, in the above conventional techniques, one identifier previously generated using the learning image identifies the class of each area in the shot image. That is, the feature amount extracted from each area is input to the one identifier generated by the learning, and the relevant identifier identifies the class of each area by using the input feature amount.
However, under some shooting conditions (scenes), areas that should be identified as different classes may be identified as the same class, and conversely, areas that should be identified as the same class may be identified as different classes. For example, a case where the object is a cloud is assumed. In this case, the cloud shot in the daytime is white, whereas the cloud shot in the afterglow of the sunset is orange because it reflects the sunlight. In such situations, the orange cloud image shot in the evening sunlight and, e.g., an orange, heavily textured wall image shot in the daytime are similar to each other in an image feature space. Consequently, when an area discriminator (identifier) is generated by using various learning images, if, for example, the evening-sunlight image and the orange wall image are learned together, these images may be erroneously learned as similar images. In this case, the identifier erroneously identifies the evening-sunlight image and the orange wall image as similar images. Then, it becomes difficult to separate and correctly identify the images that have been identified as similar.
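The cloud example can be made concrete with a small numeric sketch. It assumes that the feature amount of an area is simply its mean RGB color and that the single identifier is a nearest-neighbor rule; both are simplifications chosen for illustration, and the color values and names are hypothetical.

```python
import math


def feature_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(feature, learned):
    """Single identifier: return the class of the closest learned feature."""
    return min(learned, key=lambda label: feature_distance(feature, learned[label]))


# Hypothetical mean RGB colors learned from daytime images.
learned = {
    "cloud": (240, 240, 245),  # white cloud shot in the daytime
    "wall": (230, 140, 70),    # orange wall shot in the daytime
}

# Hypothetical mean RGB color of a cloud shot in the evening sunlight.
sunset_cloud = (240, 150, 80)

print(feature_distance(sunset_cloud, learned["wall"]))   # small distance
print(feature_distance(sunset_cloud, learned["cloud"]))  # large distance
print(identify(sunset_cloud, learned))                   # wall
```

In this feature space the evening cloud lies far closer to the orange wall than to the daytime cloud, so the single identifier assigns it to the wrong class.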
Therefore, the present invention aims to provide an image processing apparatus, an image processing method and a program which can accurately identify, even for various images of which the shooting conditions (scenes) are different, a class concerning an object classification for each area of the images.