The present technology relates to an image processing apparatus, an image processing method, a program, and a recording medium; and more particularly, to an image processing apparatus, an image processing method, a program, and a recording medium configured to execute learning image processing.
For example, pattern recognition technologies used for face recognition, object recognition, and the like have been suggested as currently commercially available techniques regarding learning image processing.
In the learning image processing according to the related art, when a model of the recognition target is learned to form a recognizer, learning is executed after a label of the recognition target is granted to an abundance of image data serving as learning data.
For example, when an image is learned for face recognition, it is necessary to grant, as a label, information or the like used for specifying the name of a person, the orientation of his or her face, and a region where the facial image of the person is displayed. Further, when an image is learned for object recognition, it is necessary to grant, as a label, information or the like used for specifying the name of an object, the orientation of the object, and a region where the object is displayed.
Since the granting of the label is executed manually, it is difficult to prepare an abundance of learning data.
Accordingly, for example, there have been suggested learning image processing techniques of automatically learning the target model from a plurality of images including a moving image without granting a label as described above.
A method of automatically learning a foreground and background learning model has been suggested as an example of the learning image processing in which the target model is automatically learned from a plurality of images including a moving image (for example, see “Unsupervised Learning of Multiple Aspects of Moving Objects from Video” by Michalis K. Titsias and Christopher K. I. Williams in Panhellenic Conference on Informatics 2005: 746-756).
Further, a method of automatically learning a multi-view target model corresponding to a foreground has been suggested as another example of the learning image processing in which the target model is automatically learned from a plurality of images including a moving image (for example, see “Learning a dense multi-view representation for detection, viewpoint classification and synthesis of object categories” by H. Su, M. Sun, L. Fei-Fei and S. Savarese in International Conference on Computer Vision (ICCV), 2009). A geometric relation between a plurality of views is modeled according to this method.