People sometimes want to know the name of a flower seen in the hills, in a field, or by the roadside. To this end, a technique has been proposed in which many partial features of the subject flowers and leaves are extracted, by using a clustering method, from digital images of the flowers and leaves obtained with a digital camera or the like, and information expressing the extracted feature group as a histogram is used as a characteristic quantity. In this technique, one or more characteristic quantities are calculated, and the calculated characteristic quantities are analyzed against the characteristic quantities of various plants pre-registered in a database by using a statistical method to identify the kind of a wild plant (refer to Unexamined Japanese Patent Application Kokai Publication No. 2002-203242, for example).
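The histogram-style characteristic quantity described above can be illustrated with a minimal sketch. It assumes that cluster centers have already been obtained by some clustering method, and it substitutes a simple histogram intersection for the unspecified statistical method; the names `feature_histogram`, `histogram_intersection`, and `identify`, as well as all data values, are hypothetical and are not taken from the cited publication.

```python
from math import dist

def feature_histogram(features, centers):
    """Assign each partial feature to its nearest cluster center and
    return the normalized count per center (the histogram-like quantity)."""
    counts = [0] * len(centers)
    for f in features:
        nearest = min(range(len(centers)), key=lambda k: dist(f, centers[k]))
        counts[nearest] += 1
    total = sum(counts) or 1  # avoid division by zero for an empty input
    return [c / total for c in counts]

def histogram_intersection(h1, h2):
    """Similarity of two normalized histograms (1.0 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))

def identify(query_hist, database):
    """Return the registered plant whose histogram is most similar.
    Assumption: a stand-in for the statistical analysis in the text."""
    return max(database,
               key=lambda name: histogram_intersection(query_hist, database[name]))

# Hypothetical cluster centers and partial features in a 2-D feature space.
centers = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
features = [(0.1, 0.0), (0.9, 1.1), (1.0, 0.9), (0.1, 0.9)]

# Hypothetical pre-registered characteristic quantities.
database = {
    "plant_a": [0.25, 0.5, 0.25],
    "plant_b": [0.8, 0.1, 0.1],
}

query = feature_histogram(features, centers)
best = identify(query, database)
print(query, best)
```

In this toy data the query histogram coincides with the entry registered for `plant_a`, so that entry is selected.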
A technique is also known in which an image including a major photographic subject such as a flower is divided, by the Graph Cuts method, into a flower region, which is the major photographic subject, and a background region (refer to Unexamined Japanese Patent Application Kokai Publication No. 2011-035636, for example).
When an input image, such as an image of a flower, is classified by machine learning, it is easy to realize a two-class identifier, which classifies the image into one of two patterns: the image is of a given kind, or it is of some other kind. On the other hand, when multi-class image classification is performed, which identifies the kind of an image from among a plurality of kinds, a multi-class identifier is generally constituted by combining two-class identifiers. For example, when images of flowers are classified into six kinds, six two-class identifiers are generated. Each identifier is generated so as to output the greatest identification score value when an image of the kind assigned to that identifier is input. Then, when an image is input into each identifier, the kind corresponding to the identifier which outputs the highest identification score value is used as the identification result. Related art literature on the clustering method includes Unexamined Japanese Patent Application Kokai Publication No. 2003-016448, Unexamined Japanese Patent Application Kokai Publication No. 2007-026098, and Unexamined Japanese Patent Application Kokai Publication No. 2005-267607, for example.
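The combination of six two-class identifiers into one multi-class identifier, as described above, can be sketched as follows. This is only an illustration: each identifier is assumed to be a simple linear scoring function, and the kind names, weights, and helper names `make_identifier` and `classify` are hypothetical.

```python
def make_identifier(weights, bias):
    """Build one two-class identifier that returns an identification
    score for its assigned kind (assumed here to be a linear score)."""
    def score(features):
        return sum(w * x for w, x in zip(weights, features)) + bias
    return score

# Six two-class identifiers, one per kind of flower (hypothetical weights).
identifiers = {
    "kind_1": make_identifier([1.0, 0.0], 0.0),
    "kind_2": make_identifier([0.0, 1.0], 0.0),
    "kind_3": make_identifier([-1.0, 0.0], 0.0),
    "kind_4": make_identifier([0.0, -1.0], 0.0),
    "kind_5": make_identifier([0.5, 0.5], 0.0),
    "kind_6": make_identifier([-0.5, -0.5], 0.0),
}

def classify(features, identifiers):
    """Input the same features into every identifier and return the kind
    whose identifier outputs the highest identification score."""
    scores = {kind: score(features) for kind, score in identifiers.items()}
    best_kind = max(scores, key=scores.get)
    return best_kind, scores

result, scores = classify([0.2, 0.9], identifiers)
print(result)
```

For the input `[0.2, 0.9]`, the identifier assigned to `kind_2` outputs the highest score, so `kind_2` becomes the identification result.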
The technique described in Unexamined Japanese Patent Application Kokai Publication No. 2003-016448 aims to provide a system for segmenting images into coarse regions such as foreground and background, and deriving a measure of the total similarity from the similarity between the foreground and background regions. An event clustering method according to this technique uses foreground and background segmentation to cluster the images of a group into similar events. Initially, each image is divided into a plurality of blocks, thereby providing block-based images. By a block-by-block comparison, each block-based image is segmented into a plurality of regions including at least a foreground and a background. One or more luminosity, color, position, or size features are extracted from the regions, and the extracted features are used to estimate and compare the similarity of the regions, including the foreground and background, in successive images in the group. Then, a measure of the total similarity between successive images is computed, thereby providing an image distance between successive images, and event clusters are delimited from the image distance.
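The final steps of this event clustering method, computing an image distance from the foreground and background similarities and delimiting event clusters from it, can be sketched as below. The weighted average and the fixed threshold are assumptions made for illustration; the cited publication is not stated here to use either, and `total_distance` and `delimit_events` are hypothetical names.

```python
def total_distance(fg_similarity, bg_similarity, fg_weight=0.5):
    """Combine foreground and background similarity (each in [0, 1]) into
    a total similarity and convert it to an image distance.
    Assumption: a weighted average is used as the combination rule."""
    similarity = fg_weight * fg_similarity + (1 - fg_weight) * bg_similarity
    return 1.0 - similarity

def delimit_events(distances, threshold):
    """distances[i] is the distance between image i and image i + 1.
    Start a new event cluster wherever the distance exceeds the threshold
    (assumption: a fixed threshold delimits events)."""
    clusters = [[0]]
    for i, d in enumerate(distances, start=1):
        if d > threshold:
            clusters.append([i])      # large jump: image i opens a new event
        else:
            clusters[-1].append(i)    # similar: image i joins the current event
    return clusters

# Hypothetical (foreground, background) similarities for 4 successive images.
pairs = [(0.9, 0.8), (0.2, 0.1), (0.85, 0.9)]
distances = [total_distance(fg, bg) for fg, bg in pairs]
events = delimit_events(distances, threshold=0.5)
print(events)
```

Here the second and third images are dissimilar in both regions, so the four images are delimited into two events of two images each.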
Unexamined Japanese Patent Application Kokai Publication No. 2007-026098 discloses a method of determining additional information for a recognition technique that determines, from among a plurality of categories, the category to which each pattern belongs, based on the pattern recognition result of the pattern and the additional information associated with the pattern. The additional information determination method causes a computer to execute the following processes: a process of acquiring a confusion matrix whose elements are the probabilities that each pattern, when pattern-recognized, is decided as belonging to each category, including its true category; a process of receiving a target recognition performance; a code definition process of determining, by referring to the confusion matrix, a code definition for the code of the additional information to be added to each pattern for its true category so that the input target performance is satisfied; and a process of outputting the code definition as additional information.
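The confusion matrix used in this method holds, for each true category, the probability of a pattern being decided as each category. A minimal sketch of estimating such a matrix from recognition results is given below; it illustrates only the matrix itself, not the code definition process, and the function name and sample data are hypothetical.

```python
from collections import Counter

def confusion_matrix(pairs, categories):
    """pairs is a list of (true_category, decided_category) results.
    Returns matrix[true][decided] = estimated probability that a pattern
    of the true category is decided as the decided category."""
    counts = Counter(pairs)
    matrix = {}
    for true in categories:
        row_total = sum(counts[(true, decided)] for decided in categories)
        matrix[true] = {
            decided: (counts[(true, decided)] / row_total if row_total else 0.0)
            for decided in categories
        }
    return matrix

# Hypothetical recognition results for two categories.
pairs = [("A", "A"), ("A", "A"), ("A", "B"), ("A", "A"),
         ("B", "B"), ("B", "A")]
matrix = confusion_matrix(pairs, ["A", "B"])
print(matrix)
```

Each row sums to 1, and the diagonal element of a row is the probability that a pattern is decided as its true category.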
The technique described in Unexamined Japanese Patent Application Kokai Publication No. 2005-267607 provides a digital picture book system which searches for descriptions of photographic subjects imaged by imaging means and provides the descriptions to a user. The system includes: imaging means which picks up an image; major photographic subject selecting means which selects major photographic subjects from the image; feature extracting means which extracts features of the major photographic subjects; image database selecting means which selects, from a plurality of image databases that store descriptions of photographic subjects in association with a plurality of mutually different kinds of features of the photographic subjects, an image database that stores features of the extracted kinds; and description searching means which searches the selected image database for descriptions of the major photographic subjects.
For example, when a multi-class identifier is constituted for classifying kinds of flowers, if the images include an image of a given kind of flower and an image of another flower which closely resembles it, it is difficult for a machine learning device to distinguish between them. On the other hand, when images of flowers are of the same kind but their learning data differ slightly from one another, there is a problem that overfitting occurs in conventional machine learning, and these images cannot be identified.
FIG. 13 illustrates an example of overfitting. The drawing shows an identification border 1303 for distinguishing a Kanto dandelion (Taraxacum platycarpum Dahlst; hereinafter referred to as Kanto dandelion) from a common dandelion (Taraxacum officinale; hereinafter referred to as common dandelion) and, for simplicity of description, illustrates identification performed on a two-dimensional characteristic information space formed by characteristic information 1 and characteristic information 2.
A Kanto dandelion and a common dandelion closely resemble each other; it is difficult to distinguish the kinds only from the shape or direction of the overall flower, but it is possible to distinguish them when the detail of the calyx of the flower is observed. In such a situation, suppose that learning data including two or more kinds is used, the data is classified into a positive data group 1301, which is correct answer data, and a negative data group 1302, which is incorrect answer data, and machine learning is performed.
In this case, a conventional machine learning device gives priority only to separating the positive data group 1301 from the negative data group 1302. Consequently, there are many cases where the machine learning device forcibly searches for a difference by paying attention to differences between the images which are not essential to classifying the kinds of the flowers.
For example, suppose that a flower in the positive data group 1301 of the learning images just happens to face to the right, as illustrated by 1305, and a flower in the negative data group 1302 just happens to face to the left, as illustrated by 1306. Then a part 1304 of the identification border 1303 regarding the characteristic information 1 and the characteristic information 2, which is set in an identifier, is set on the basis of the direction of the flower, that is, whether it faces to the right or to the left.
As a result, there is a problem that, in the boundary part 1304 of the identification border 1303, a Kanto dandelion and a common dandelion are no longer distinguished based on the correct identification criterion (the difference in the calyx of the flowers), and identification performance decreases. This problem has not been solved by the techniques described in the related art literature mentioned above.
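The overfitting mechanism described for FIG. 13 can be reproduced with a toy example. The sketch below is not the data of FIG. 13: it assumes two hypothetical features (a calyx measurement and the facing direction of the flower) and uses a one-feature threshold rule as a stand-in for the identifier. Because the direction happens to separate the learning data perfectly, the learner selects it, and the resulting rule fails on images in which the direction is reversed.

```python
def stump_accuracy(samples, feature, threshold):
    """Accuracy of the rule 'positive if x[feature] > threshold'."""
    correct = sum((x[feature] > threshold) == bool(label) for x, label in samples)
    return correct / len(samples)

def fit_stump(samples, n_features):
    """Pick the single feature and threshold with the best accuracy on the
    learning data (a deliberately naive learner, used for illustration)."""
    best = None
    for feature in range(n_features):
        values = sorted({x[feature] for x, _ in samples})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            acc = stump_accuracy(samples, feature, threshold)
            if best is None or acc > best[2]:
                best = (feature, threshold, acc)
    return best

# Each sample: ((calyx_measure, direction), label).
# label 1 = positive data group, label 0 = negative data group.
# direction +1.0 = facing right, -1.0 = facing left (hypothetical encoding).
train = [
    ((0.90, 1.0), 1), ((0.80, 1.0), 1), ((0.45, 1.0), 1),
    ((0.10, -1.0), 0), ((0.20, -1.0), 0), ((0.55, -1.0), 0),
]
# New images in which the facing direction is reversed.
test = [
    ((0.90, -1.0), 1), ((0.85, -1.0), 1),
    ((0.10, 1.0), 0), ((0.15, 1.0), 0),
]

feature, threshold, train_acc = fit_stump(train, n_features=2)
test_acc = stump_accuracy(test, feature, threshold)
calyx_test_acc = stump_accuracy(test, 0, 0.5)  # rule based on the calyx instead
print(feature, train_acc, test_acc, calyx_test_acc)
```

The learner selects feature 1 (direction) because it classifies every learning sample correctly, whereas the best calyx threshold misclassifies one borderline sample. On the reversed images the direction rule is wrong every time, while a rule based on the calyx measurement remains correct.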