1. Field of the Invention
This invention relates to a method for recognizing the presence or absence of a predetermined object image in an image. This invention particularly relates to a method for recognizing an object image wherein, during image information processing, a judgment is made as to whether a candidate for a predetermined object image, which candidate has been extracted from an image, is or is not the predetermined object image. This invention also relates to a learning method for a neural network, wherein a target object image, for which the learning operations are to be carried out, is extracted from an image, and the learning operations of a neural network for carrying out recognition of a predetermined object image are carried out with respect to the extracted target object image. This invention further relates to a method for discriminating an image wherein, during image information processing, a judgment as to whether a given image is or is not a predetermined image is made accurately without being adversely affected by a change in the angle of the image, rotation of the image and a background of the image.
2. Description of the Prior Art
A human being views an image and recognizes what the object embedded in the image is. It is known that this action can be divided into two steps. The first step is to carry out "discovery and extraction" by moving the viewpoint, setting a target of recognition at the center point of the visual field, and at the same time finding the size of the object. The second step is to make a judgment, from the memory and knowledge of the human being, as to what the object present at the viewpoint is. Ordinarily, human beings iterate the two steps and thereby acquire information about the outer world.
On the other hand, in conventional techniques for recognizing a pattern by carrying out image processing, typically in pattern matching techniques, importance is attached only to the second step. Therefore, various limitations are imposed on the first step for "discovery and extraction." For example, it is necessary for a human being to intervene in order to cut out a target and normalize the size of the target. Also, as in the case of automatic reading machines for postal code numbers, it is necessary for a target object to be placed at a predetermined position. As pattern recognizing techniques unaffected by a change in size and position of a target, various techniques have been proposed wherein a judgment is made from an invariant quantity. For example, a method utilizing a central moment, a method utilizing a Fourier descriptor, and a method utilizing a mean square error have been proposed. With such methods, for the purposes of recognition, it is necessary to carry out complicated integrating operations or coordinate transformations. Therefore, extremely large amounts of calculation are necessary in cases where it is unknown where a target object is located or in cases where a large image is processed. Also, with these methods, in cases where a plurality of object images are embedded in an image, there is the risk that their coexistence gives rise to noise and causes errors to occur in recognizing the object images. Thus these methods are not satisfactory in practice.
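By way of illustration only, the following minimal sketch shows what is meant by a judgment made from an invariant quantity, using the standard textbook definition of a central moment. The function names `central_moment` and `place` and the small test pattern are assumptions introduced for this example and are not taken from any of the cited methods.

```python
def central_moment(image, p, q):
    """Central moment mu_pq of a 2-D grayscale image given as a list of rows."""
    total = sum(sum(row) for row in image)
    if total == 0:
        raise ValueError("image has no mass")
    # Centroid (x_bar, y_bar), weighted by pixel intensity.
    x_bar = sum(x * v for row in image for x, v in enumerate(row)) / total
    y_bar = sum(y * v for y, row in enumerate(image) for v in row) / total
    # Moments are taken about the centroid, so a shift of the pattern cancels out.
    return sum(
        ((x - x_bar) ** p) * ((y - y_bar) ** q) * v
        for y, row in enumerate(image)
        for x, v in enumerate(row)
    )


def place(pattern, height, width, top, left):
    """Paste a small pattern into an empty canvas at (top, left)."""
    canvas = [[0] * width for _ in range(height)]
    for dy, row in enumerate(pattern):
        for dx, v in enumerate(row):
            canvas[top + dy][left + dx] = v
    return canvas


pattern = [[1, 1, 0],
           [1, 1, 1],
           [0, 1, 0]]
a = place(pattern, 8, 8, 0, 0)   # pattern in the top-left corner
b = place(pattern, 8, 8, 4, 3)   # same pattern, shifted
mu20_a = central_moment(a, 2, 0)
mu20_b = central_moment(b, 2, 0)
```

In this sketch the two placements of the pattern give the same value of the moment (up to rounding), but every pixel of the image is visited for each moment computed, which is in line with the calculation cost noted above for large images.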
As a model which is unaffected by the size of a target object or by a shift in position of a target object and which can accurately recognize the target object, a model utilizing the neocognitron, which is one of the techniques for neural networks, has been proposed. The neocognitron is described by Fukushima in "Neocognitron: A Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position," Collected Papers of The Institute of Electronics and Communication Engineers of Japan, A, J62-A(10), pp. 658-665, October 1979. Neural networks constitute one of the research techniques for neural information processing. This technique is referred to as the constructive method. It aims at clarifying the information processing principle of the brain by constructing an appropriate neural circuitry model with full consideration given to known physiological facts and results of research, investigating the actions and performance of the model, and comparing them with those of the actual human brain. Research has been conducted to develop various models, such as visual models, learning models, and associative memory models. In particular, the neocognitron model is tolerant of a shift in position of an object image embedded in an image. The neocognitron carries out pattern matching and self-organizing learning operations on a small part of a target object image, assimilates a shift in position at several stages with a layered architecture, and thereby tolerates the shift in position.
In the neocognitron, the operation for tolerating a shift in position of a feature little by little at several stages plays an important role in eliminating adverse effects of a shift in position of an input pattern and carrying out pattern recognition tolerant of a deformation of the input pattern. Specifically, adverse effects of shifts in position between local features of an input pattern, which shifts are due to various deformations, such as enlargement and reduction, of the input pattern, are assimilated little by little during the process for putting the features together. Ultimately, an output can be obtained which is free of adverse effects of comparatively large deformation of the input pattern.
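The idea of tolerating a shift in position little by little at several stages can be sketched as follows. This is a deliberately simplified, one-dimensional illustration and not the neocognitron itself: the feature-matching and learning parts are omitted, and only a position-tolerating step is modeled. The names `pool`, `respond`, and `one_hot` are hypothetical.

```python
def pool(cells, width=2):
    """One tolerance stage: an output cell fires if any cell in its window fires."""
    return [max(cells[i:i + width]) for i in range(0, len(cells), width)]


def respond(cells, stages):
    """Apply the tolerance stage repeatedly, as in a layered architecture."""
    for _ in range(stages):
        cells = pool(cells)
    return cells


def one_hot(position, length=8):
    """A single local feature detected at the given position."""
    return [1 if i == position else 0 for i in range(length)]


# After one stage only a shift within a window of 2 is absorbed;
# each further stage doubles the range of positions that look alike.
same_after_1 = respond(one_hot(2), 1) == respond(one_hot(3), 1)
same_after_2 = respond(one_hot(0), 2) == respond(one_hot(3), 2)
same_after_3 = respond(one_hot(2), 3) == respond(one_hot(5), 3)
```

Under these assumptions, a shift that still changes the response after one or two stages is fully absorbed after the third stage, at the price of losing the information about where the feature was located.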
As described above, the neocognitron is based on the principle that pattern matching is carried out on a small part of a target object, and a shift in its position is assimilated at several stages through a layered architecture. However, with such a principle, a limitation is naturally imposed on achieving both accurate recognition and assimilation of the shift in position. It has been reported, for example, by Nagano in "Neural Net for Extracting Size Invariant Features," Computrol, No. 29, pp. 26-31, that the neocognitron can ordinarily tolerate a fluctuation in size of only up to approximately four times. As for the shift in position, the neocognitron can tolerate a shift of only approximately two or three times the size of a target object. The tolerance capacity remains the same in a recently proposed neocognitron model which is provided with a selective attention mechanism.
How the visual function of a human being carries out the first step has not yet been clarified. On the other hand, how the viewpoint moves has been clarified to some extent, as described, for example, by Okewatari in "Visual and Auditory Information Processing in Living Body System," Information Processing, Vol. 23, No. 5, pp. 451-459, 1982, or by Sotoyama in "Structure and Function of Visual System," Information Processing, Vol. 26, No. 2, pp. 108-116, 1985. It is known that eyeball movements include a saccadic movement, a follow-up movement, and an involuntary movement. Several models that simulate these eye movements have been proposed. For example, a model in which the viewpoint is moved to the side of a larger differential value of an image is proposed by Nakano in "Pattern Recognition Learning System," Image Information (I), 1987/1, pp. 31-37, or by Shiratori, et al. in "Simulation of Saccadic Movement by Pseudo-Retina Mask," ITEJ Tec. Rep. (Technical Report of The Institute of Television Engineers of Japan), Vol. 14, No. 36, pp. 25-30, ICS' 90-54, AIPS' 90-46, June 1990. Also, a model in which the viewpoint is moved to the side of a higher lightness is proposed by Hirahara, et al. in "Neural Net for Specifying a Viewpoint," ITEJ Tec. Rep., Vol. 14, No. 33, pp. 25-30, VAI' 90-28, June 1990. Additionally, a model in which the viewpoint is moved to a point of a contour having a large curvature is proposed by Inui, et al. in Japanese Unexamined Patent Publication No. 2(1990)-138677. However, these proposed models are rather simple and do not simulate the human visual function well.
Also, for the purposes of finding a target of recognition and extracting a region including the whole target, instead of adhering only to local features of the target object, it is necessary that the movement of the viewpoint becomes stable (stationary) at the center point of the whole target. However, with the aforesaid conventional models, such an operation for stabilizing the viewpoint cannot be carried out. For example, with the model proposed by Shiratori, et al. wherein the pseudo-retina mask is utilized, the viewpoint moves forward and backward around the contour line of an object and does not become stable. Also, with the model proposed by Inui, et al., the viewpoint can ultimately catch only a feature point at a certain limited part of an object. Additionally, most of the aforesaid conventional models require, as a tacit precondition, that the background of an object is simple. Thus most of the aforesaid conventional models cannot be applied to natural images, such as ordinary photographic images.
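The limitation described above can be illustrated with a toy model in which the viewpoint is moved to the side of a larger differential value. The sketch below is an assumption made for illustration and does not reproduce the models of Nakano or Shiratori, et al.; the greedy update rule, the function names, and the 9 x 9 test image are hypothetical.

```python
def gradient_magnitude(image, x, y):
    """Sum of absolute central differences at (x, y); borders are clamped."""
    h, w = len(image), len(image[0])
    left, right = image[y][max(x - 1, 0)], image[y][min(x + 1, w - 1)]
    up, down = image[max(y - 1, 0)][x], image[min(y + 1, h - 1)][x]
    return abs(right - left) + abs(down - up)


def move_viewpoint(image, start, max_steps=50):
    """Greedy rule: step to the 4-neighbour with the largest differential value."""
    h, w = len(image), len(image[0])
    x, y = start
    for _ in range(max_steps):
        best = (gradient_magnitude(image, x, y), x, y)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < w and 0 <= ny < h:
                g = gradient_magnitude(image, nx, ny)
                if g > best[0]:
                    best = (g, nx, ny)
        if (best[1], best[2]) == (x, y):
            break  # no neighbour is strictly better: the viewpoint stops here
        x, y = best[1], best[2]
    return x, y


# A bright 5 x 5 square (columns and rows 2..6) on a dark 9 x 9 background.
image = [[1 if 2 <= x <= 6 and 2 <= y <= 6 else 0 for x in range(9)]
         for y in range(9)]
final = move_viewpoint(image, (0, 4))
```

In this toy case the viewpoint stops beside the left edge of the square at (1, 4) and never reaches the center of the square at (4, 4), because the differential value is zero throughout the interior of the object.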
As described above, various techniques have been proposed which enable satisfactory recognition of a target in cases where a human being intervenes in order to assimilate a shift in position of the target or a change in the size of the target, or in cases where the position and the size of the target are normalized in advance. However, no excellent technique has yet been proposed with which the whole target object image can be extracted from an image for the purposes of recognizing the object image.
Further, research to develop models for carrying out search and recognition of objects has heretofore been regarded as important for image information processing and has been carried out in various manners. In particular, extensive attempts have heretofore been made to recognize face patterns, which serve as patterns of objects and are embedded in images.
Typical models utilizing faces as target objects include the following methods:
(1) A method wherein an eigenface obtained by carrying out principal component analysis on a sample of face images is utilized. The method is described by Matthew Turk and Alex Pentland in "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, 71-86.
(2) A method wherein a square region, which has been cut out of a face image, is mosaicked, and thereafter a learning operation on the face image is carried out with a back-propagation (BP) method, which is one of the neural network techniques, the face image being thereby recognized. This method is described by Shin Kosugi (NTT Human Interface Laboratory) in "A Study of Face Image Recognition Using A Neural Network," ITEJ Tec. Rep., Vol. 14, No. 50, 1990.9, 7-12.
(3) A method wherein color information and the KL (Karhunen-Loeve) expansion are utilized. This method is described by Tsutomu Sasaki (NTT Human Interface Laboratory), Shigeru Akamatsu, et al., in "Study of An Automatic Recognition Method for A Frontal Face Image," Shingiho, IE91-50, 1-8.
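As a rough illustration of method (1) above, the following sketch computes a single leading eigenface from a set of flattened face images by power iteration. It is a minimal sketch under stated assumptions and not the procedure of the cited paper: the two-pixel "faces," the function names, and the choice of power iteration in place of a full eigendecomposition are all introduced here for illustration.

```python
import math


def first_eigenface(faces, iterations=100):
    """Return (mean_face, leading principal component) of flattened face images."""
    n, d = len(faces), len(faces[0])
    mean = [sum(face[i] for face in faces) / n for i in range(d)]
    centered = [[face[i] - mean[i] for i in range(d)] for face in faces]
    # Power iteration on the unnormalised covariance C = sum(c c^T),
    # applied implicitly as C v = sum((c . v) c) so that C is never formed.
    v = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [0.0] * d
        for c in centered:
            dot = sum(ci * vi for ci, vi in zip(c, v))
            for i in range(d):
                w[i] += dot * c[i]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("faces are identical or start vector is degenerate")
        v = [x / norm for x in w]
    return mean, v


def coefficient(face, mean, eigenface):
    """Projection of a face onto the eigenface, usable as a compact feature."""
    return sum((f - m) * e for f, m, e in zip(face, mean, eigenface))


# Hypothetical two-pixel "faces" that vary along the direction (3, 4).
faces = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
mean_face, eigenface = first_eigenface(faces)
weight = coefficient(faces[2], mean_face, eigenface)
```

A recognition step would then compare such projection coefficients between a stored face and an input face. Because the coefficients are computed pixel by pixel against a fixed mean face, this kind of sketch presupposes that the input is already aligned with the samples.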
Also, the methods described below have heretofore been proposed.
(4) A method wherein a multiple pyramid (from a coarsely mosaicked image to a finely mosaicked image) is utilized. This method is described by Shin Kosugi (NTT Human Interface Laboratory) in "Search and Recognition of A Face Image in A Scene," Computer Vision, 76-7, 1992.1.23, 49-56.
(5), (6) Methods capable of coping with a change in the angle of a face. Such methods are described by Kohonen, T., Lehtio, P., Oja, E., Kortekangas, A., & Makisara, K. in "Demonstration of Pattern Processing Properties of the Optimal Associative Mappings," Proceedings of the International Conference on Cybernetics and Society, Washington, D.C., Sep. 19-21, 1977, 581-585; and by J. Buhmann, J. Lange, & C. von der Malsburg in "Distortion Invariant Object Recognition by Matching Hierarchically Labeled Graphs," IJCNN 1989, Vol. 1, June 1989, 155-159.
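The mosaicking referred to in methods (2) and (4) above can be sketched as block averaging at several block sizes, from coarse to fine. The details of the cited methods are not given here, so the following is only an assumed, minimal form of such a pyramid; the function names and block sizes are hypothetical.

```python
def mosaic(image, block):
    """Replace each block x block tile by its mean intensity (one value per tile)."""
    h, w = len(image), len(image[0])
    if h % block or w % block:
        raise ValueError("image size must be a multiple of the block size")
    out = []
    for top in range(0, h, block):
        row = []
        for left in range(0, w, block):
            tile = [image[y][x]
                    for y in range(top, top + block)
                    for x in range(left, left + block)]
            row.append(sum(tile) / len(tile))
        out.append(row)
    return out


def mosaic_pyramid(image, blocks=(4, 2, 1)):
    """Coarse-to-fine list of mosaics of the same image."""
    return [mosaic(image, b) for b in blocks]


image = [[0, 0, 4, 4],
         [0, 0, 4, 4],
         [8, 8, 2, 2],
         [8, 8, 2, 2]]
pyramid = mosaic_pyramid(image)
```

A search could examine the coarsest level first and refine only promising regions at finer levels, although whether the cited method proceeds in this way is not stated in the text above.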
As described above, human beings extract an appropriate size of a target of recognition from an image of the outer world and thereafter efficiently carry out recognition processing. On the other hand, with the conventional methods, an attempt is made to recognize a target in an image of the outer world only with a single processing system. Therefore, problems occur in that very complicated procedures and a very long time are required. Also, problems occur in that it is necessary for a human being to intervene in the extraction of the target, or it is necessary for the background of the image to be simple. Accordingly, the conventional methods are not satisfactory in practice. These problems occur because no efficient method has heretofore been available for extracting a target object, which is to be recognized, from an image of the outer world, and the structure of the recognition system is such that a heavy burden is imposed on a judgment means of the recognition system.
Also, in cases where a technique is used which is unaffected by a shift in position and which accurately recognizes an object image, appropriate self-organizing learning operations must be carried out on the neural network, such as the neocognitron, and a neural network suitable for the recognition of the object image must thereby be built up.
However, if substantially identical object images differ in size from one another or include an object image, for which the learning operations of the neural network need not be carried out, a disturbance will be caused in the classification into categories during the learning operations, i.e., during the creation of synaptic connections in the neural network. As a result, appropriate learning operations cannot be carried out. Therefore, when the learning operations of the neural network, such as the neocognitron, are carried out, it is necessary for a human being to intervene in order to extract a target object image, for which the learning operations are to be carried out, to normalize the extracted target object image into an appropriate size, and to feed only the necessary information to the neural network. Considerable time and labor are required to carry out such intervening operations.
Further, the aforesaid methods (1), (2), and (3) for carrying out search and recognition of an object were designed without the conditions of the rotation of a face, a change in the angle of the face, effects of a background, and the like, being contemplated in advance. Therefore, the aforesaid methods (1), (2), and (3) cannot sufficiently cope with such conditions. The aforesaid method (4) was designed by considering the effects of a background, which were not contemplated in the aforesaid methods (1), (2), and (3). However, only the front-directed face images are used in the aforesaid method (4). Therefore, the aforesaid method (4) cannot cope with rotation of a face and a change in the angle of a face. Further, the aforesaid method (4) cannot sufficiently cope with effects of a background. The aforesaid methods (5) and (6) can cope with a change in the angle of a face. However, the aforesaid methods (5) and (6) are designed on the assumption that no background is embedded in the image. Therefore, the aforesaid methods (5) and (6) cannot cope with effects of a background.