1. Field of the Invention
The present invention relates to a pattern recognition technique.
2. Description of the Related Art
Conventionally, a method using a set (ensemble) of classification trees has been proposed. In this technique, L classification trees (L is a constant equal to or larger than 2) are generated, and all of them are used together to attain higher recognition performance.
Non-patent literature (Mustafa Ozuysal, Pascal Fua, Vincent Lepetit, “Fast Keypoint Recognition in Ten Lines of Code,” CVPR, pp. 1-8, 2007 IEEE Conference on Computer Vision and Pattern Recognition, 2007) discloses an example in which the method using an ensemble of classification trees is applied to computer vision. The disclosed method is as follows.
First, one registered image undergoes slight deformations and noise addition to generate variation images of the registered image. That is, a plurality of variation images are generated for one registered image type.
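The variation-image step can be illustrated with a minimal sketch. It assumes grayscale images stored as lists of rows with luminance values 0 to 255, approximates "slight deformation" by a small random integer translation (the literature may use richer deformations such as affine warps), and models noise addition as Gaussian noise. The function name `make_variations` and its parameters are hypothetical.

```python
import random

def make_variations(image, count, max_shift=1, noise_sigma=4.0, seed=0):
    """Return `count` perturbed copies of a grayscale image (list of rows).

    Hypothetical helper: each copy is shifted by a random integer offset in
    [-max_shift, max_shift] (a simple stand-in for slight deformation) and then
    receives Gaussian noise, with values clamped to the 0..255 luminance range.
    """
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    variations = []
    for _ in range(count):
        dy = rng.randint(-max_shift, max_shift)
        dx = rng.randint(-max_shift, max_shift)
        out = []
        for y in range(h):
            row = []
            for x in range(w):
                # Sample the shifted position, clamping at the image border.
                sy = min(max(y + dy, 0), h - 1)
                sx = min(max(x + dx, 0), w - 1)
                v = image[sy][sx] + rng.gauss(0.0, noise_sigma)
                row.append(min(max(round(v), 0), 255))
            out.append(row)
        variations.append(out)
    return variations

# One registered image yields several variation images of the same type.
registered = [[(x * 16 + y * 8) % 256 for x in range(16)] for y in range(16)]
variants = make_variations(registered, count=5)
print(len(variants), len(variants[0]), len(variants[0][0]))  # 5 16 16
```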
Next, N reference point pairs, each specifying two point positions on an image, are generated (hereinafter referred to as a reference point pair sequence). For each pair, the luminance values at the two points are compared, and the sequence of comparison results is expressed as a bit sequence of 0s and 1s, thereby calculating an N-bit binary code from one image and one reference point pair sequence.
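The binary-code calculation can be sketched as follows. This is an illustration under stated assumptions: points are `(y, x)` tuples, pair positions are drawn uniformly at random, and a bit is 1 when the first point is darker than the second (the comparison direction is an assumption). The names `make_pair_sequence` and `binary_code` are hypothetical.

```python
import random

def make_pair_sequence(n, height, width, seed=0):
    """Hypothetical helper: N reference point pairs ((y1, x1), (y2, x2)) at random positions."""
    rng = random.Random(seed)

    def point():
        return (rng.randrange(height), rng.randrange(width))

    return [(point(), point()) for _ in range(n)]

def binary_code(image, pairs):
    """Compare luminance at each pair and pack the 0/1 results into an N-bit integer.

    Assumed convention: bit = 1 if image[y1][x1] < image[y2][x2], else 0.
    The first pair becomes the most significant bit.
    """
    code = 0
    for (y1, x1), (y2, x2) in pairs:
        bit = 1 if image[y1][x1] < image[y2][x2] else 0
        code = (code << 1) | bit
    return code

# Worked example with three fixed pairs on a 2x2 image.
image = [[10, 200], [50, 30]]
pairs = [((0, 0), (0, 1)),   # 10 < 200  -> 1
         ((1, 0), (1, 1)),   # 50 < 30   -> 0
         ((0, 1), (1, 0))]   # 200 < 50  -> 0
print(binary_code(image, pairs))  # 4 (binary 100)
```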
Then, an N-bit binary code is calculated for each of the variation images as described above, and the probabilities relating each binary code to the registered image types are learned. This corresponds to one classification tree. This learning using N-bit binary codes is repeated L times, each time with a different reference point pair sequence. That is, L classification trees are generated.
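A minimal sketch of the learning step is shown below. It assumes each tree stores, for every possible code value, an estimate of P(code | image type) obtained by counting codes over the variation images; the additive smoothing constant `prior` (so that no probability is zero) is an assumption of this sketch. `learn_trees` is a hypothetical name, and `binary_code` repeats the assumed comparison convention (bit = 1 if the first point is darker).

```python
def binary_code(image, pairs):
    # Assumed convention: bit = 1 if the first point is darker than the second.
    code = 0
    for (y1, x1), (y2, x2) in pairs:
        code = (code << 1) | (1 if image[y1][x1] < image[y2][x2] else 0)
    return code

def learn_trees(training, pair_sequences, prior=1.0):
    """training: {image_type: [variation images]}; pair_sequences: L lists of N pairs.

    Hypothetical helper. Returns one table per tree, where table[code][image_type]
    estimates P(code | image_type), smoothed by adding `prior` to every count.
    """
    trees = []
    for pairs in pair_sequences:
        n_codes = 2 ** len(pairs)
        counts = {t: [prior] * n_codes for t in training}
        for image_type, images in training.items():
            for img in images:
                counts[image_type][binary_code(img, pairs)] += 1
        # Normalize the counts of each image type into a distribution over codes.
        table = [dict() for _ in range(n_codes)]
        for image_type, c in counts.items():
            total = sum(c)
            for code in range(n_codes):
                table[code][image_type] = c[code] / total
        trees.append(table)
    return trees

# Two registered types, three variation images each, one tree with one pair (N = 1).
training = {"A": [[[10, 200], [0, 0]]] * 3, "B": [[[200, 10], [0, 0]]] * 3}
trees = learn_trees(training, [[((0, 0), (0, 1))]])
print(round(trees[0][1]["A"], 3), round(trees[0][0]["B"], 3))  # 0.8 0.8
```

With `prior=1.0`, type A sees code 1 three times, giving counts [1, 4] and P(code 1 | A) = 4/5.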
In the detection mode, an N-bit binary code is calculated from an input image according to the positions of the N reference point pairs set in the learning mode. This calculation is executed for each of the L different reference point pair sequences set in the learning mode. For each registered image type, the product of the probabilities associated with the obtained L binary codes is calculated, and the registered image type with the highest probability is selected as the detection result.
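The detection step can be sketched as below, assuming trees in the form table[code][image_type] = P(code | image_type) with no zero entries. The hypothetical `detect` function sums logarithms instead of multiplying probabilities directly. Because the logarithm is monotonic, the ranking is the same as for the product, and floating-point underflow is avoided when L is large. The small hand-built trees are illustrative values, not learned data.

```python
import math

def binary_code(image, pairs):
    # Assumed convention: bit = 1 if the first point is darker than the second.
    code = 0
    for (y1, x1), (y2, x2) in pairs:
        code = (code << 1) | (1 if image[y1][x1] < image[y2][x2] else 0)
    return code

def detect(image, pair_sequences, trees):
    """Hypothetical helper: return the image type maximizing prod_l P(code_l | type)."""
    # One binary code per tree, then one table lookup per tree and type.
    codes = [binary_code(image, pairs) for pairs in pair_sequences]
    types = trees[0][0].keys()

    def log_score(t):
        return sum(math.log(tree[code][t]) for tree, code in zip(trees, codes))

    return max(types, key=log_score)

# L = 2 trees with one pair each (N = 1); tables indexed by code value.
pair_sequences = [[((0, 0), (0, 1))], [((1, 0), (1, 1))]]
trees = [
    [{"A": 0.2, "B": 0.8}, {"A": 0.8, "B": 0.2}],
    [{"A": 0.3, "B": 0.6}, {"A": 0.7, "B": 0.4}],
]
print(detect([[10, 200], [90, 20]], pair_sequences, trees))  # A (0.8*0.3 > 0.2*0.6)
print(detect([[200, 10], [20, 90]], pair_sequences, trees))  # B (0.8*0.4 > 0.2*0.7)
```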
According to this method, the processing in the detection mode can be implemented with fast operations: conversion into binary codes by comparing the pixel values of the reference point pairs in an input image, and dictionary table lookup using the binary codes. For this reason, the processing is much faster than classic recognition processing using classification trees. The literature also reports that the recognition accuracy is sufficiently high.
The conventional technique, however, poses the following problems. First, a plurality of variation images must be generated for each registered image for learning, which requires a complicated operation sequence and a heavy processing load in the learning mode. In addition, as the number of variations increases, the size of the learning result information (dictionary) increases and places a heavy load on memory. Since the dictionary size grows with the product of the number of registered image types and the number of variations, this problem is especially serious when the number of registered image types is large.
Conversely, when the number of variations is limited to keep the dictionary small, another problem arises: detection performance degrades because learning must be performed with fewer learning images. For example, the feature amount obtained by comparing the pixels at the two points of a reference point pair readily changes with illumination fluctuations, orientation fluctuations, and noise. Therefore, to attain robust recognition, a sufficiently large number of variation images must be learned for each pattern type, which in turn increases the dictionary size.
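The sensitivity of a single comparison bit can be seen in a small numeric example. It assumes the same comparison convention as a typical pixel-pair test (bit = 1 if the first luminance is lower), and `comparison_bit` is a hypothetical name. When the two luminances are close, a small amount of noise, or an illumination change that brightens one point more than the other, flips the bit. The result is a different binary code for what is the same pattern.

```python
def comparison_bit(a, b):
    """Hypothetical helper: 1 if luminance a < luminance b, else 0 (assumed direction)."""
    return 1 if a < b else 0

a, b = 100, 102                     # two nearly equal luminance values
print(comparison_bit(a, b))         # 1: original pair
print(comparison_bit(a + 3, b))     # 0: +3 noise at one point flips the bit
print(comparison_bit(a * 1.2, b))   # 0: local brightening of one point flips it
```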