As processing performance has improved, image recognition systems have come to be applied to a wide range of fields, from the conventional field of factory automation (FA) to monitoring of people indoors or outdoors, recognition of faces by a digital camera or the like, and recognition of the external world by a vehicle camera.
In particular, systems that perform not only detection and tracking of an object but also discrimination of the type of object have become common in recent years (for example, discrimination of normal behavior from abnormal behavior in monitoring people, and discrimination of sex in recognition of faces).
Image discrimination applications (hereinafter called discrimination applications) generally employ a classifier, such as a neural network or a support vector machine (SVM), because the object to be discriminated is not rigid and deforms, or has a diverse appearance.
When a classifier is used to perform image discrimination, the numerous learning images (teaching images) that the classifier needs in order to learn have to be acquired. Conventionally, the work of acquiring learning images has to be performed manually, requiring numerous man-hours.
For example, for discrimination of an image having 10×10 pixels (this resolution is needed to visually judge the texture or shape of an object), when each pixel is regarded as a discrimination feature, the number of feature dimensions is 100. In general, it is said that stable discrimination using a classifier requires a number of learning data items at least ten times the number of feature dimensions. In this case, at least 1000 images per class are required as learning data (and the total number of necessary images increases as the number of classes to be discriminated increases).
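The arithmetic above can be sketched as follows. This is only an illustration of the rule of thumb stated in the text; the function name, its parameters, and the choice of three classes in the usage line are assumptions made for the example and do not appear in the source.

```python
def min_learning_images(width, height, num_classes=1, samples_per_dimension=10):
    """Estimate the number of learning images from the rule of thumb above.

    Hypothetical helper: each pixel is treated as one discrimination
    feature, and at least `samples_per_dimension` images per feature
    dimension are assumed to be needed for each class.
    """
    feature_dimensions = width * height              # one feature per pixel
    images_per_class = feature_dimensions * samples_per_dimension
    total_images = images_per_class * num_classes    # grows with class count
    return feature_dimensions, images_per_class, total_images


# 10x10-pixel images and, as an assumed example, three classes.
dims, per_class, total = min_learning_images(10, 10, num_classes=3)
print(dims, per_class, total)  # 100 1000 3000
```

Because the estimate scales with both the pixel count and the number of classes, even a modest increase in resolution or class count quickly multiplies the manual acquisition work.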
Incidentally, a class signifies a “correct value” or an “incorrect value” given to the classifier during learning. For example, in the case of discriminating the sex of a person, classification information such as “male” for a male image or “female” for a female image corresponds to the class. Further, depending on the type of classifier, both correct images and incorrect images have to be included in the learning images. For example, in the case of discriminating the sex of a person, aside from the male images and female images, background images have to be intentionally learned as a class “others.” In this case, the male images and female images are “correct images” and noise images including the background images are “incorrect images.”
In the case of motion picture processing that handles discrimination of a moving object, a learning image has to be clipped from each frame (or at intervals of a processing cycle). Therefore, in addition to the problem of man-hours, a problem arises in that the learning algorithm does not converge or discrimination performance is not stabilized, because satisfactory quality of the clipping work cannot be maintained, that is, the learning image area deviates from the desired area.
In order to cope with the problems concerning acquisition of learning images, Japanese Unexamined Patent Application Publication No. 7-21367 discloses a system that increases the number of quasi learning images by applying image processing (for example, rotating the image or superposing noise) to initial images acquired in advance. Japanese Unexamined Patent Application Publication No. 2006-293528 discloses a method of mapping a group of learning images acquired in advance onto the feature space employed in discrimination, thereby assisting the decision as to whether the group of images is acceptable as learning images.
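The general idea of producing quasi learning images by rotating an image and superposing noise can be sketched as below. This is a minimal illustration only and is not the method of the cited publication: the function names, the 90-degree rotation steps, the uniform integer noise, and the 8-bit pixel range are all assumptions made for the example.

```python
import random


def rotate_90(image):
    """Rotate a 2-D image (a list of pixel rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def add_noise(image, amplitude, rng):
    """Superpose uniform integer noise, clamped to the assumed 8-bit range."""
    return [
        [min(255, max(0, pixel + rng.randint(-amplitude, amplitude)))
         for pixel in row]
        for row in image
    ]


def make_quasi_images(initial, amplitude=10, seed=0):
    """Produce four noisy variants of one initial image, one per rotation."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    quasi_images = []
    rotated = initial
    for _ in range(4):
        quasi_images.append(add_noise(rotated, amplitude, rng))
        rotated = rotate_90(rotated)
    return quasi_images


initial_image = [[0, 128], [255, 64]]  # toy 2x2 grayscale image
quasi = make_quasi_images(initial_image)
print(len(quasi))  # 4
```

Note that every quasi image is derived from an initial image that must already exist, so a sketch of this kind multiplies the available data without replacing the work of acquiring it.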
However, these conventional methods do not reduce the man-hours required for the preceding work of acquiring numerous images. For example, the technology disclosed in Japanese Unexamined Patent Application Publication No. 7-21367 can increase the amount of quasi data as long as unmanipulated initial images are available, but the work of acquiring the initial images is still needed separately. In addition, if a manipulation pattern (noise or the like) employed in producing the quasi images is inconsistent with the pattern of change that occurs during actual imaging, discrimination performance may be adversely affected.
Further, for example, according to Japanese Unexamined Patent Application Publication No. 2006-293528, the visual selection work for verifying whether acquired images are suitable for learning can be performed efficiently, but the work of acquiring the images to be selected from is not made more efficient. In addition, the method includes a mapping to the feature space. Therefore, an effect can be expected in additional learning, for which the type of discrimination feature is already determined. However, in a stage of algorithm development that precedes determination of the feature type, no effect can be expected at the time of initial learning, since the mapping destination space is not yet fixed.
In particular, when a non-rigid body such as a person is discriminated, or when a large image distortion is produced at some positions in an image because a wide-angle lens camera is used, it is necessary to acquire quite diverse and numerous images as initial learning images. Reducing the man-hours for this work is a significant problem.
Accordingly, an object of the present invention is to provide a classifier learning image production program, a classifier learning image production method, and a classifier learning image production system which are capable of efficiently performing work of acquiring learning images to be employed in development of a discrimination application, or more particularly, efficiently performing work of acquiring initial learning images to be employed in an early stage of development of a discrimination algorithm.