1. Field of the Invention
The present invention relates to a recognition device and method, and more particularly to a recognition device and method that perform a computation on a target such as an image or voice to recognize whether the target matches a predetermined reference.
2. Description of the Related Art
A recognition target such as an object or the face of a person can, in principle, be recognized from its image by calculating the similarity between the input image and a previously stored template image of a reference.
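As an illustration of the similarity calculation mentioned above, the following is a minimal sketch that assumes normalized correlation as the similarity measure and a fixed acceptance threshold. Neither choice is specified in the text; the function names and the threshold value of 0.9 are hypothetical.

```python
import math


def normalized_correlation(image, template):
    """Return the normalized correlation (in [-1, 1]) of two equal-size
    grayscale images given as lists of rows of numbers."""
    a = [p for row in image for p in row]
    b = [p for row in template for p in row]
    if len(a) != len(b):
        raise ValueError("image and template must have the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # a flat image carries no pattern to compare
    return sum(x * y for x, y in zip(da, db)) / denom


def matches_reference(image, template, threshold=0.9):
    """Assumed decision rule: accept when the similarity reaches the threshold."""
    return normalized_correlation(image, template) >= threshold


template = [[0, 10], [10, 0]]
brighter = [[5, 25], [25, 5]]   # same pattern with different gain and offset
inverted = [[10, 0], [0, 10]]   # opposite pattern
print(matches_reference(brighter, template))
print(matches_reference(inverted, template))
```

A measure of this kind tolerates uniform brightness changes but not changes in position, tilt or size, which is why one template per image variation would otherwise be needed.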
However, the image of an actual recognition target varies greatly with environmental conditions such as the orientation in which the target is placed, its distance, and the lighting. Therefore, an enormous number of templates corresponding to the image variations must be prepared, and the amount of computation required to calculate the similarity between the input image and the templates also becomes enormous.
Therefore, a method that normalizes the input image to a predetermined position, inclination, size and the like by geometrical transformation or the like is effective. Normalization reduces the number of template images to be compared and allows the recognition processing to be completed in a practical computing time.
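The geometrical transformation described above can be sketched as follows, assuming that the correcting rotation angle and magnification are already known and that nearest-neighbor sampling about the image center is acceptable. The function name, its parameters and the sampling scheme are illustrative assumptions, not details taken from the text.

```python
import math


def normalize_image(image, angle_deg, zoom=1.0, fill=0):
    """Rotate the image content by angle_deg (counterclockwise as displayed,
    rows running downward) and magnify it by zoom, about the image center.
    Each output pixel is filled by nearest-neighbor lookup in the input."""
    h, w = len(image), len(image[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Inverse mapping: find the input location that lands on (x, y).
            dx, dy = (x - cx) / zoom, (y - cy) / zoom
            sx = cx + cos_t * dx - sin_t * dy
            sy = cy + sin_t * dx + cos_t * dy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
for row in normalize_image(image, 90):
    print(row)
```

Once every input has been brought to the same position, inclination and size in this way, a single template per reference can be compared instead of one template per variation.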
As a normalization method, there is a known method that extracts feature points from the input image and fits the extracted feature points to a shape model of a prescribed normalized image so as to normalize the input image. As a typical feature point extraction method, a method using an edge operator is known, but a clear edge may not be obtained when an object has a smooth surface shape, such as a face, and edges are highly susceptible to lighting conditions.
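To make the fitting of feature points to a shape model concrete, the sketch below assumes the simplest case: two extracted feature points (for example, the two eyes) and a similarity transform consisting of rotation, uniform scale and translation. The two-point model and all function names are hypothetical and are not prescribed by the known method.

```python
import cmath
import math


def similarity_from_two_points(p1, p2, m1, m2):
    """Return complex (a, b) such that z -> a*z + b maps p1 to m1 and p2 to m2.
    Points are (x, y) tuples; a encodes rotation and scale, b the translation."""
    zp1, zp2 = complex(*p1), complex(*p2)
    zm1, zm2 = complex(*m1), complex(*m2)
    if zp1 == zp2:
        raise ValueError("feature points must be distinct")
    a = (zm2 - zm1) / (zp2 - zp1)
    b = zm1 - a * zp1
    return a, b


def apply_transform(a, b, point):
    """Map an (x, y) point of the input image into model coordinates."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


def rotation_and_scale(a):
    """Return the rotation in degrees and the scale factor encoded in a."""
    return math.degrees(cmath.phase(a)), abs(a)


# Extracted points lie on a vertical line 2 units apart; the assumed model
# places them on a horizontal line 1 unit apart.
a, b = similarity_from_two_points((0, 0), (0, 2), (0, 0), (1, 0))
print(rotation_and_scale(a))
print(apply_transform(a, b, (0, 2)))
```

The quality of such a fit depends entirely on how reliably the feature points are extracted, which is where the edge-based approach runs into the difficulties noted above.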
Meanwhile, a scholarly treatise “Rotation Invariant Neural Network-Based Face Detection” (H. A. Rowley, S. Baluja and T. Kanade, Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, 1998, pp. 38-44) discloses a technique that detects a deviation from a normalized image directly from the light and dark pattern of an input image and uses the detected value to normalize the input image. According to the treatise, the tilt angle of a face is detected from a tilted face image by a neural network, the detected tilt angle is used to make the face image upright, and it is then determined whether the input image is a face image. By virtue of the generalization ability of the neural network, this method can estimate the angle robustly against changes in the input image, so that a normalized image can be obtained stably.
However, the technique described in the above treatise needs to estimate the tilt angle accurately over all angles. Therefore, learning samples covering all angles must be learned, which has the disadvantages that a large number of learning samples must be prepared and that learning takes a long time.
In addition, the above treatise covers only rotation within the image plane as a degree of freedom of deformation of the input image, whereas an actual image has many more degrees of freedom, such as rotation in the depth direction, size, position, lighting and the like, which makes the problem more serious.
In other words, in order to accurately estimate many degrees of freedom at the same time, learning samples in which each degree of freedom is varied independently are required, and the required number of learning samples becomes enormous because it is the product of the numbers of samples required for the respective degrees of freedom. Accordingly, it is impossible to complete learning in a realistic time.
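The growth in the number of learning samples can be illustrated with a short calculation. The per-factor sample counts below are purely hypothetical values chosen for illustration; the text gives no concrete figures.

```python
import math

# Hypothetical number of sample values needed for each factor when that
# factor is considered on its own (illustrative values only).
samples_per_factor = {
    "in_plane_rotation": 36,
    "depth_rotation": 10,
    "size": 5,
    "position": 100,
    "lighting": 5,
}


def joint_sample_count(counts):
    """Number of samples needed when every combination of values must be
    covered, that is, the product of the per-factor counts."""
    return math.prod(counts.values())


total = joint_sample_count(samples_per_factor)
print(total)
```

Even with these modest assumed counts the product reaches 900,000 samples, and adding any further factor multiplies the total again.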
Under the circumstances described above, the present invention provides a recognition device and method that can normalize a target such as an image or voice even if the target varies greatly, and that can learn with ease.