1. Field of the Invention
The present invention relates to an image processing apparatus, an image processing method, and a program, and particularly to those suitably used for classifying an image into a plurality of classes.
2. Description of the Related Art
In the past, research has been conducted on segmentation of an image into a plurality of meaningful regions (for example, see “The PASCAL Visual Object Classes (VOC) Challenge” by M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, International Journal of Computer Vision, vol. 88 (2), 2010). Such processing typically first divides an image into small regions called superpixels, each including a plurality of adjacent pixels, and extracts feature amounts from the respective regions. Then, the regions are integrated according to the extracted feature amounts, and the integrated regions are classified into respective categories. For example, according to a method discussed in “Parsing Natural Scenes and Natural Language with Recursive Neural Networks” by Richard Socher, Cliff Lin, Andrew Y. Ng, and Christopher D. Manning, ICML 2011, each region is classified into a class such as sky, trees, or roads by a neural network trained in advance.
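The flow described above (segment, extract feature amounts, integrate, classify) can be illustrated by the following minimal sketch. It is not the method of any cited reference: square grid blocks stand in for superpixels, mean intensity stands in for the feature amount, a fixed tolerance `tol` drives the integration, and a hypothetical threshold rule `classify` stands in for a classifier trained in advance. All function and parameter names are assumptions made for illustration.

```python
# Toy sketch of the segment -> describe -> integrate -> classify flow.
# Grid blocks, mean intensity, and a fixed threshold are illustrative
# assumptions, not the techniques of the cited references.

def grid_superpixels(height, width, block):
    """Return a label map assigning each pixel to a square block."""
    cols = (width + block - 1) // block
    return [[(y // block) * cols + (x // block) for x in range(width)]
            for y in range(height)]

def mean_features(image, labels):
    """Mean intensity of each labeled region, keyed by label."""
    sums, counts = {}, {}
    for row_px, row_lb in zip(image, labels):
        for value, label in zip(row_px, row_lb):
            sums[label] = sums.get(label, 0.0) + value
            counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def merge_similar(labels, features, tol):
    """Union adjacent superpixels whose features differ by at most tol."""
    parent = {label: label for label in features}

    def find(a):
        # Follow parent links to the representative, compressing the path.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    h, w = len(labels), len(labels[0])
    for y in range(h):
        for x in range(w):
            # Compare each pixel with its lower and right neighbors.
            for ny, nx in ((y + 1, x), (y, x + 1)):
                if ny < h and nx < w:
                    a, b = labels[y][x], labels[ny][nx]
                    if a != b and abs(features[a] - features[b]) <= tol:
                        ra, rb = find(a), find(b)
                        if ra != rb:
                            parent[rb] = ra
    return {label: find(label) for label in features}

def classify(mean_value, threshold=128):
    """Hypothetical stand-in for a classifier trained in advance."""
    return "sky" if mean_value >= threshold else "road"

def parse_image(image, block=2, tol=10):
    """Return a per-pixel class map for a grayscale image (list of rows)."""
    labels = grid_superpixels(len(image), len(image[0]), block)
    features = mean_features(image, labels)
    region_of = merge_similar(labels, features, tol)
    region_labels = [[region_of[label] for label in row] for row in labels]
    region_features = mean_features(image, region_labels)
    return [[classify(region_features[r]) for r in row]
            for row in region_labels]
```

For example, calling `parse_image` on a 4×4 image whose upper half is bright and lower half is dark integrates the four blocks into two regions and labels them separately. A practical system would replace each stage with a content-adaptive superpixel method, richer feature amounts, and a trained classifier.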
Meanwhile, clustering and graph-based representations are used as methods for segmenting an image into superpixels as pre-processing (for example, see “SLIC Superpixels” by Radhakrishna Achanta, Appu Shaji, Kevin Smith, Aurelien Lucchi, Pascal Fua, and Sabine Susstrunk, EPFL Technical Report 149300, June 2010, and “Efficient graph-based image segmentation” by Felzenszwalb, P., Huttenlocher, D., International Journal of Computer Vision, 2004).
However, a problem remains to be addressed when an image is segmented into superpixels by a method discussed in “SLIC Superpixels” or “Efficient graph-based image segmentation” mentioned above, and the resulting superpixels are then used to classify the image into meaningful regions by a method discussed in “Parsing Natural Scenes and Natural Language with Recursive Neural Networks”. Specifically, if an image is coarsely segmented into superpixels, a boundary between objects cannot be accurately extracted. On the other hand, if an image is finely segmented into superpixels to achieve high boundary accuracy, the amount of subsequent processing increases because the number of superpixels is large.
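The trade-off described above can be made concrete with a rough numerical sketch. It assumes a uniform square grid as a simplified stand-in for superpixels (real methods such as those cited adapt region shapes to image content), a 640×480 image, and a vertical object boundary at a hypothetical position x = 301. The function names are illustrative assumptions.

```python
import math

def superpixel_count(height, width, cell):
    """Number of square grid cells of side `cell` needed to cover the image."""
    return math.ceil(height / cell) * math.ceil(width / cell)

def boundary_error(true_boundary_x, cell):
    """Distance in pixels from a vertical boundary to the nearest cell edge."""
    offset = true_boundary_x % cell
    return min(offset, cell - offset)

# Coarse to fine: the boundary error shrinks while the region count grows.
for cell in (32, 8, 2):
    print(cell, superpixel_count(480, 640, cell), boundary_error(301, cell))
# prints:
# 32 300 13
# 8 4800 3
# 2 76800 1
```

Under these assumptions, reducing the cell side from 32 to 2 pixels lowers the boundary error from 13 pixels to 1 pixel, but raises the number of regions to be processed by the later stages from 300 to 76,800.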