1. Field of the Invention
The present invention relates to an information processing apparatus and an information processing method.
2. Description of the Related Art
Conventionally, as a machine learning technique for pattern recognition, there is an approach called ensemble learning, which achieves high-precision identification by performing identification with a plurality of discriminators and integrating their results (see, for example, U.S. Pat. No. 6,009,199). The principle of identification in ensemble learning is that, even when each discriminator has a large estimation variance (a weak discriminator), collecting a plurality of such weak discriminators and performing identification by majority decision reduces the variance of the estimated values. Examples of ensemble learning techniques include bagging, boosting, and random forests.
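The effect of majority decision can be illustrated with a small calculation. The following is only a sketch under a simplifying assumption that is not stated above: the weak discriminators are treated as statistically independent, each correct with the same probability p. The function name majority_vote_accuracy is hypothetical and chosen for illustration.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n voters is correct.

    Assumption (illustrative only): the n weak discriminators are
    independent and each is correct with probability p. n must be odd
    so that a tie cannot occur.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    k_min = n // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


if __name__ == "__main__":
    # Accuracy of the ensemble for a weak discriminator with p = 0.6.
    for n in (1, 3, 11, 101):
        print(n, majority_vote_accuracy(0.6, n))
```

Under this assumption, the ensemble accuracy rises with the number of discriminators whenever p is greater than 0.5. Real weak discriminators are usually correlated, so the improvement obtained in practice is typically smaller than this idealized figure.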
In image recognition based on ensemble learning, the input to a weak discriminator is often produced by a basis function that performs a certain type of feature amount conversion, such as the Haar basis function (see P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Proc. of CVPR, 2001: Non-Patent Document 1 below) or matching with a local patch (see A. Torralba, K. Murphy, W. Freeman, “Sharing visual features for multiclass and multiview object detection,” IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 29, no. 5, pp. 854-869, 2007: Non-Patent Document 2). In many ensemble learning techniques, each weak discriminator selects a different basis function and learns to discriminate the learning data in the space defined by that basis function. In many cases, a sparse basis function that refers to only a portion of an image, or a so-called integral image (Non-Patent Document 1), is used. Because using an integral image reduces the number of data references, high-speed discrimination can be realized.
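The reduction in data references can be sketched as follows. This is a minimal illustration of the integral image idea from Non-Patent Document 1, not an implementation taken from that document; the function names and the choice of a two-rectangle, left-minus-right feature are assumptions made for the example.

```python
def integral_image(img):
    """Build a summed-area table with one extra row and column of zeros.

    ii[y][x] holds the sum of img over rows [0, y) and columns [0, x).
    """
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows [top, bottom) and columns [left, right).

    Only four table lookups are needed, whatever the rectangle size.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def haar_two_rect(ii, top, left, height, width):
    """Hypothetical Haar-like feature: left half minus right half.

    width is assumed to be even.
    """
    half = width // 2
    left_sum = rect_sum(ii, top, left, top + height, left + half)
    right_sum = rect_sum(ii, top, left + half, top + height, left + width)
    return left_sum - right_sum


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    table = integral_image(image)
    print(rect_sum(table, 1, 1, 3, 3))
    print(haar_two_rect(table, 0, 0, 3, 4))
```

Once the table is built, each rectangle sum costs four lookups, so a two-rectangle feature costs eight regardless of how many pixels the rectangles cover.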
Note that if the feature amount conversion functions form a small finite set, the optimal basis function to be given to a weak discriminator can be found by an exhaustive (round robin) search. In general, however, the set of candidate conversion functions is uncountably infinite, so an exhaustive search cannot normally be attempted. In such a case, conversion functions are randomly selected within an appropriate range in order to determine a basis function. Specifically, an approximation is used in which, for instance, a pool of bases is created by randomly selecting a sufficient number of candidate conversion functions, and the optimal basis function is then selected from that pool.
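The pool-based approximation can be sketched as follows. Everything specific here is an assumption made for illustration: a candidate conversion function is taken to be a random projection direction, the weak discriminator is a sign test on the projected value, and the selection criterion is training accuracy. None of these choices is prescribed by the description above.

```python
import random


def sample_basis(dim, rng):
    """Draw one candidate conversion function: a random direction."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def project(basis, x):
    """Feature amount conversion: inner product of basis and sample."""
    return sum(b * v for b, v in zip(basis, x))


def stump_accuracy(basis, samples, labels):
    """Training accuracy of a sign test on the projection.

    Either polarity is allowed, so the result is never below 0.5.
    """
    correct = sum(
        (project(basis, x) > 0) == (y == 1) for x, y in zip(samples, labels)
    )
    acc = correct / len(samples)
    return max(acc, 1.0 - acc)


def select_from_pool(samples, labels, pool_size, seed=0):
    """Create a random pool of bases and return the best one with its score."""
    rng = random.Random(seed)
    dim = len(samples[0])
    pool = [sample_basis(dim, rng) for _ in range(pool_size)]
    best = max(pool, key=lambda b: stump_accuracy(b, samples, labels))
    return best, stump_accuracy(best, samples, labels)


if __name__ == "__main__":
    data = [(1.0, 0.2), (0.8, -0.1), (-1.0, 0.1), (-0.7, -0.3)]
    target = [1, 1, 0, 0]
    for size in (1, 10, 100):
        _, score = select_from_pool(data, target, size)
        print(size, score)
```

With a fixed seed, a larger pool contains every candidate of a smaller one, so the best score cannot decrease as the pool grows; the cost is that every candidate must be evaluated against the learning data.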
Problems that arise in pattern recognition applications include the task of identifying a slight difference in the orientation or shape of a target object. For example, when the assembly of industrial products is automated, components may need to be identified based on whether or not a small notch is provided. Further, when a manipulator picks components, differences in orientation may need to be distinguished even among identical components. Here, the task of identifying patterns that differ only slightly in space, as described above, is called a problem of identifying classes that are “similar but not the same” (the space referred to here includes not only the two-dimensional space of an image, but also a certain feature space).
Moreover, even in a recognition problem that is not explicitly posed as one of identifying “similar but not the same” classes, pursuing higher precision makes it necessary, to some extent, to appropriately identify targets belonging to such classes.
If “similar but not the same” class identification is performed by ensemble learning, a method that selects the optimal basis function in a round robin manner may overfit to noise rather than to the actual difference, so that a truly effective basis is not selected. Further, with a method that selects bases randomly, an effective basis may be overlooked, or only an insufficient number of bases may be selected. Owing to the characteristics of the ensemble learning method, the generalization error is not reduced unless there is sufficient variation among effective weak discriminators (weak discriminators whose expected discrimination accuracy is greater than 0.5). Accordingly, it is desirable to use a pool that includes as many candidate bases as possible, and to use as many weak discriminators as possible. In that case, however, the amount of calculation and the amount of data required during learning and detection increase.