1. Field of the Invention
This invention relates to a method for recognizing the presence or absence of a predetermined object image in an image. This invention particularly relates to a method for recognizing an object image wherein, during image information processing, a judgment is made as to whether a candidate for a predetermined object image, which candidate has been extracted from an image, is or is not the predetermined object image. This invention also relates to a learning method for a neural network, wherein a target object image, for which the learning operations are to be carried out, is extracted from an image, and the learning operations of a neural network for carrying out recognition of a predetermined object image are carried out with respect to the extracted target object image. This invention further relates to a method for discriminating an image wherein, during image information processing, a judgment as to whether a given image is or is not a predetermined image is made accurately without being adversely affected by a change in the angle of the image, rotation of the image and a background of the image.
2. Description of the Prior Art
A human being views an image and recognizes what the thing embedded in the image is. It is known that this action can be divided into two steps. A first step is to carry out “discovery and extraction” by moving the viewpoint, setting a target of recognition at the center point of the visual field, and at the same time finding the size of the object. A second step is to make a judgment from a memory and a knowledge of the human being as to what the object present at the viewpoint is. Ordinarily, human beings iterate the two steps and thereby acquire information about the outer world.
On the other hand, in conventional techniques for recognizing a pattern by carrying out image processing, typically in pattern matching techniques, importance is attached only to the second step. Therefore, various limitations are imposed on the first step for “discovery and extraction.” For example, it is necessary for a human being to intervene in order to cut out a target and normalize the size of the target. Also, as in the cases of automatic reading machines for postal code numbers, it is necessary for a target object to be placed at a predetermined position. As pattern recognizing techniques unaffected by a change in size and position of a target, various techniques have been proposed wherein a judgment is made from an invariable quantity. For example, a method utilizing a central moment, a method utilizing a Fourier description element, and a method utilizing a mean square error have been proposed. With such methods, for the purposes of recognition, it is necessary to carry out complicated integrating operations or coordinate transformation. Therefore, extremely large amounts of calculations are necessary in cases where it is unknown where a target object is located or in cases where a large image is processed. Also, with these methods, in cases where a plurality of object images are embedded in an image, there is the risk that their coexistence causes noise to occur and causes errors to occur in recognizing the object images. Thus these methods are not satisfactory in practice.
As a model, which is unaffected by the size of a target object or by a shift in position of a target object and which can accurately recognize the target object, a model utilizing a neocognitron, which is one of the techniques for neural networks, has been proposed. The neocognitron is described by Fukushima in “Neocognitron: A Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position,” Collected Papers of The Institute of Electronics and Communication Engineers of Japan, A, J62-A(10), pp. 658-665, October 1979. Neural networks constitute one of the research techniques for neural information processing, which is referred to as the constructive method and which aims at clarifying the information processing principle of a brain by constructing an appropriate neural circuitry model with full consideration given to the facts known physiologically and results of research, investigating the actions and performance of the model, and comparing the actions and performance of the model with those of the actual human brain. Research has been conducted to develop various models, such as visual models, learning models, and associative memory models. In particular, the neocognitron model is tolerant of a shift in position of an object image embedded in an image. The neocognitron carries out pattern matching and self-organizing learning operations on a small part of a target object image, assimilates a shift in position at several stages with a layered architecture, and thereby tolerates the shift in position.
In the neocognitron, the operation for tolerating a shift in position of a feature little by little at several stages plays an important role in eliminating adverse effects of a shift in position of an input pattern and carrying out pattern recognition tolerant of a deformation of the input pattern. Specifically, adverse effects of shifts in position between local features of an input pattern, which shifts are due to various deformations, such as enlargement and reduction, of the input pattern, are assimilated little by little during the process for putting the features together. Ultimately, an output can be obtained which is free of adverse effects of comparatively large deformation of the input pattern.
As described above, the neocognitron is based on the principle that the pattern matching is carried out on a small part of a target object, and a shift in its position is assimilated at several stages through a layered architecture. However, with such a principle, a limitation is naturally imposed on achievement of both the accurate recognition and the assimilation of the shift in position. It has been reported, for example, by Nagano in “Neural Net for Extracting Size Invariant Features,” Computrol, No. 29, pp. 26-31, that the neocognitron can ordinarily tolerate only an approximately fourfold fluctuation in size. As for the shift in position, the neocognitron can tolerate a shift of only approximately two or three times the size of a target object. The tolerance capacity remains the same also in a recently proposed neocognitron model which is provided with a selective attention mechanism.
How the visual function of a human being carries out the first step has not yet been clarified. On the other hand, how the viewpoint moves has been clarified to some extent as described, for example, by Okewatari in “Visual and Auditory Information Processing in Living Body System,” Information Processing, Vol. 23, No. 5, pp. 451-459, 1982, or by Sotoyama in “Structure and Function of Visual System,” Information Processing, Vol. 26, No. 2, pp. 108-116, 1985. It is known that eyeball movements include a saccadic movement, a follow-up movement, and an involuntary movement. Several models that simulate these eye movements have been proposed. For example, a model in which the viewpoint is moved to the side of a larger differential value of an image is proposed by Nakano in “Pattern Recognition Learning System,” Image Information (I), 1987/1, pp. 31-37, and by Shiratori, et al. in “Simulation of Saccadic Movement by Pseudo-Retina Mask,” ITEJ Tec. Rep. (Technical Report of The Institute of Television Engineers of Japan), Vol. 14, No. 36, pp. 25-30, ICS′ 90-54, AIPS′ 90-46, June 1990. Also, a model in which the viewpoint is moved to the side of a higher lightness is proposed, for example, by Hirahara, et al. in “Neural Net for Specifying a Viewpoint,” ITEJ Tec. Rep., Vol. 14, No. 33, pp. 25-30, VAI′ 90-28, June 1990. Additionally, a model in which the viewpoint is moved to a point of a contour having a large curvature is proposed, for example, by Inui, et al. in Japanese Unexamined Patent Publication No. 2(1990)-138677. However, these proposed models are rather simple and do not well simulate the human visual function.
Also, for the purposes of finding a target of recognition and extracting a region including the whole target, instead of adhering only to local features of the target object, it is necessary that the movement of the viewpoint becomes stable (stationary) at the center point of the whole target. However, with the aforesaid conventional models, such an operation for stabilizing the viewpoint cannot be carried out. For example, with the model proposed by Shiratori, et al. wherein the pseudo-retina mask is utilized, the viewpoint moves forward and backward around the contour line of an object and does not become stable. Also, with the model proposed by Inui, et al., the viewpoint can ultimately catch only a feature point at a certain limited part of an object. Additionally, most of the aforesaid conventional models require, as a tacit precondition, that the background of an object is simple. Thus most of the aforesaid conventional models cannot be applied to natural images, such as ordinary photographic images.
As described above, various techniques have been proposed which enable satisfactory recognition of a target in cases where a human being intervenes in order to assimilate a shift in position of the target or a change in the size of the target or in cases where the position and the size of the target are normalized in advance. However, no excellent technique has yet been proposed, with which the whole target object image can be extracted from an image for the purposes of recognizing the object image.
Further, research to develop models for carrying out search and recognition of objects has heretofore been considered as one of the important techniques for image information processing and has been carried out in various manners. In particular, attempts have heretofore been made extensively to recognize face patterns, which serve as patterns of objects and are embedded in images.
Typical models utilizing faces as target objects include the following methods:
(1) A method wherein an eigenface obtained by analyzing the principal components of a sample of face images is utilized. The method is described by Matthew, T., Alex, P. in “Eigenfaces for Recognition,” Journal of Cognitive Neuroscience, Vol. 3, No. 1, 1991, 71-86.
(2) A method wherein a square region, which has been cut out of a face image, is mosaicked, and thereafter a learning operation on the face image is carried out with a BP (back propagation) method, which is one of the neural network techniques, the face image being thereby recognized. This method is described by Shin Kosugi (NTT Human Interface Laboratory) in “A Study of Face Image Recognition Using A Neural Network,” ITEJ Tec. Rep., Vol. 14, No. 50, 1990.9, 7-12.
(3) A method wherein color information and a KL (Karhunen-Loève) expansion are utilized. This method is described by Tsutomu Sasaki (NTT Human Interface Laboratory), Shigeru Akamatsu, et al., in “Study of An Automatic Recognition Method for A Frontal Face Image,” Shingiho, IE91-50, 1-8.
Also, the methods described below have heretofore been proposed.
(4) A method wherein a multiple pyramid (from a coarsely mosaicked image to a finely mosaicked image) is utilized. This method is described by Shin Kosugi (NTT Human Interface Laboratory) in “Search and Recognition of A Face Image in A Scene,” Computer Vision, 76-7, 1992.1.23, 49-56.
(5), (6) Methods capable of coping with a change in the angle of a face. Such methods are described by Kohonen, T., Lehtio, P., Oja, E., Kortekangas, A., and Makisara, K. in “Demonstration of Pattern Processing Properties of the Optimal Associative Mappings,” Proceedings of the International Conference on Cybernetics and Society, Washington, D.C., Sep. 19-21, 1977, 581-585. (b); and by J. Buhmann, J. Lange, and C. von der Malsburg in “Distortion Invariant Object Recognition by Matching Hierarchically Labeled Graphs,” IJCNN 1989, Vol. Jun. 1, 1989, 155-159.
As described above, human beings extract an appropriate size of a target of recognition from an image of the outer world and thereafter efficiently carry out recognition processing. On the other hand, with the conventional methods, an attempt is made to recognize a target in an image of the outer world only with a single processing system. Therefore, problems occur in that very complicated procedures and a very long time are required. Also, problems occur in that it is necessary for a human being to intervene in the extraction of the target, or it is necessary for the background of the image to be simple. Accordingly, the conventional methods are not satisfactory in practice. These problems occur because no efficient method has heretofore been available for extracting a target object, which is to be recognized, from an image of the outer world, and the structure of the recognition system is such that a heavy burden is imposed on a judgment means of the recognition system.
Also, in cases where the technique is used which is unaffected by a shift in position and which accurately recognizes an object image, appropriate self-organizing learning operations must be carried out on the neural network, such as the neocognitron, and a neural network suitable for the recognition of the object image must thereby be built up.
However, if substantially identical object images differ in size from one another or include an object image, for which the learning operations of the neural network need not be carried out, a disturbance will be caused in the classification into categories during the learning operations, i.e., during the creation of synaptic connections in the neural network. As a result, appropriate learning operations cannot be carried out. Therefore, when the learning operations of the neural network, such as the neocognitron, are carried out, it is necessary for a human being to intervene in order to extract a target object image, for which the learning operations are to be carried out, to normalize the extracted target object image into an appropriate size, and to feed only the necessary information to the neural network. Considerable time and labor are required to carry out such intervening operations.
Further, the aforesaid methods (1), (2), and (3) for carrying out search and recognition of an object were designed without the conditions of the rotation of a face, a change in the angle of the face, effects of a background, and the like, being contemplated in advance. Therefore, the aforesaid methods (1), (2), and (3) cannot sufficiently cope with such conditions. The aforesaid method (4) was designed by considering the effects of a background, which were not contemplated in the aforesaid methods (1), (2), and (3). However, only the front-directed face images are used in the aforesaid method (4). Therefore, the aforesaid method (4) cannot cope with rotation of a face and a change in the angle of a face. Further, the aforesaid method (4) cannot sufficiently cope with effects of a background. The aforesaid methods (5) and (6) can cope with a change in the angle of a face. However, the aforesaid methods (5) and (6) are designed on the assumption that no background is embedded in the image. Therefore, the aforesaid methods (5) and (6) cannot cope with effects of a background.
The primary object of the present invention is to provide a method for recognizing an object image wherein, during pattern recognition, a candidate for a predetermined object image is extracted appropriately, an appropriate judgment is made as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image, and the time required for operations is kept short.
Another object of the present invention is to provide a method for recognizing an object image, wherein a predetermined object image is extracted appropriately and accurately from a given image.
A further object of the present invention is to provide a method for recognizing an object image, wherein an accurate judgment is made as to whether a given candidate for a predetermined object image is or is not the predetermined object image.
A still further object of the present invention is to provide a method for recognizing an object image wherein, even if the background of a candidate for a predetermined object image in an image is complicated, the candidate for the predetermined object image is extracted appropriately.
Another object of the present invention is to provide a method for recognizing an object image, wherein the judgment performance of a system, which makes a judgment as to whether a candidate for a predetermined object image is or is not the predetermined object image, is kept high.
A further object of the present invention is to provide a method for recognizing an object image, wherein judgments as to whether feature parts of a predetermined object image are or are not included in feature parts of a candidate for the predetermined object image are made appropriately regardless of a change in the angle of the object image and a difference among object images.
A still further object of the present invention is to provide a learning method for a neural network, wherein a target object image, for which the learning operations of a neural network are to be carried out, is automatically normalized and extracted, and the learning operations of the neural network are carried out efficiently.
Another object of the present invention is to provide a learning method for a neural network, wherein a target object image is extracted automatically from an image, the extracted target object image is classified in an arranged form, and learning operations are thereby carried out.
A further object of the present invention is to provide a method for recognizing an object image and a learning method for a neural network, which enable the operation scale to be kept small.
A still further object of the present invention is to provide a method for recognizing an object image and a learning method for a neural network, in which extraction and judgment processes are carried out simultaneously and which enable processing to be carried out very quickly with special hardware functions.
Another object of the present invention is to provide a method for recognizing an object image and a learning method for a neural network, wherein a view window of an input device is caused to travel to an object image, which shows a movement different from the movement of the background of the object image.
A further object of the present invention is to provide a method for discriminating an image, wherein image discrimination is carried out accurately without being adversely affected by rotation of a predetermined image, such as a face image, a change in the angle of the image, and a background of the image.
The present invention provides a method for recognizing an object image, which comprises the steps of:
i) extracting a candidate for a predetermined object image from an image, and
ii) making a judgment as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image.
In a first method for the aforesaid extraction, the extraction of the candidate for the predetermined object image is carried out by:
a) causing the center point of a view window, which has a predetermined size, to travel to the position of the candidate for the predetermined object image, and
b) determining an extraction area in accordance with the size and/or the shape of the candidate for the predetermined object image, the center point of the view window being taken as a reference during the determination of the extraction area.
In a second method for the aforesaid extraction, the extraction of the candidate for the predetermined object image is carried out by:
a) cutting out an image, which falls in the region inside of a view window having a predetermined size, from the image,
b) finding azimuths and intensities of components, such as a color and contour lines, of the candidate for the predetermined object image with respect to the center point of the view window, the azimuths and the intensities being found as azimuth vectors from a movement of the whole cut-out image or of an entire complex-log mapped image, which is obtained from transformation of the cut-out image with complex-log mapping, the color of the candidate for the predetermined object image included in the cut-out image, and/or tilts of the contour lines of the candidate for the predetermined object image included in the cut-out image,
c) composing a vector from the azimuth vectors, a vector for the travel of the view window being thereby determined,
d) causing the center point of the view window to travel in accordance with the vector for the travel of the view window, and
e) determining an extraction area in accordance with the size and/or the shape of the candidate for the predetermined object image, the center point of the view window being taken as a reference during the determination of the extraction area.
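Steps c) and d) of the second method can be illustrated with a minimal sketch. It assumes, purely for illustration, that each azimuth vector is available as an (angle in radians, intensity) pair measured from the center point of the view window. The function names `compose_travel_vector` and `move_window`, the optional `weights`, and the `gain` factor are hypothetical and are not taken from the specification.

```python
import math

def compose_travel_vector(azimuth_vectors, weights=None):
    """Compose one (dx, dy) travel vector from (angle, intensity) pairs.

    Assumption: each azimuth vector is given as an angle in radians and a
    non-negative intensity; optional weights scale each contribution.
    """
    if weights is None:
        weights = [1.0] * len(azimuth_vectors)
    dx = sum(w * r * math.cos(a) for (a, r), w in zip(azimuth_vectors, weights))
    dy = sum(w * r * math.sin(a) for (a, r), w in zip(azimuth_vectors, weights))
    return dx, dy

def move_window(center, travel, gain=1.0):
    """Shift the view-window center by the travel vector (scaled by gain)."""
    return center[0] + gain * travel[0], center[1] + gain * travel[1]

# Two components: one to the right (intensity 2), one at 90 degrees (intensity 1).
travel = compose_travel_vector([(0.0, 2.0), (math.pi / 2, 1.0)])
new_center = move_window((10.0, 10.0), travel)
```

In this sketch, components of equal intensity in opposite azimuths cancel, so the composed vector vanishes when the components are balanced around the center point, which is the condition under which the view window stops traveling.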
In a third method for the aforesaid extraction, the extraction of the candidate for the predetermined object image is carried out by:
a) taking the vectors for the travel of the view window, which are determined with the aforesaid second method for the extraction, as gradient vectors of a potential field, recording the gradient vectors of the potential field on the whole image, and thereby creating a map of the potential field, and
b) determining an extraction area in accordance with the size and/or the shape of the candidate for the predetermined object image, a minimum point of the potential in the map being taken as a reference during the determination of the extraction area.
What the term “potential field” as used herein means will be described hereinbelow. When a human being views an image, he will look around the image and will move his viewpoint to a predetermined object image embedded in the image (e.g., to a face image in cases where an image constituted of a human face image and a background representing the sky is viewed). Thereafter, he will recognize that the thing present at the viewpoint is the face image. When the viewpoint is currently located at a position spaced apart from the predetermined object image, it is necessary for the viewpoint to be moved a long distance towards the predetermined object image on the image. When the viewpoint is currently located at a position near the predetermined object image, the viewpoint needs to travel only a short distance in order to reach the predetermined object image. At the position of the predetermined object image, the viewpoint becomes stable. Specifically, if the direction and the amount in which the viewpoint is to travel are expressed as a vector for the travel of the viewpoint, the vector for the travel of the viewpoint will represent the direction of the viewpoint stabilizing point and the amount of travel thereto, which are taken from the current position of the viewpoint. At the viewpoint stabilizing point, i.e., at the center point of the predetermined object image, the vector for the travel of the viewpoint is zero. If it is considered that the image has a “field” of stability of the viewpoint, the “field” is flat at the viewpoint stabilizing point and is inclined at a point, at which the viewpoint is unstable and from which the viewpoint is required to travel in order to become stable.
As described above, it can be regarded that the vector for the travel of the viewpoint represents the gradient of the “field.” Also, it can be regarded that the travel of the viewpoint is equivalent to the movement to the side of a lower potential in the “field.” The “field” of stability of the viewpoint is herein referred to as the “potential field.”
As described above, the map of the potential field over the whole image is created from gradient vectors of the potential field, which are calculated at respective positions of the whole image. Specifically, the vectors for the travel of the human viewpoint are taken as the gradient vectors of the potential field, and it is regarded that the potential field is inclined to the direction, to which each gradient vector of the potential field is directed. The gradients of the field are recorded on the whole image such that the gradient vector of the potential field may be zero, i.e., the potential field may be minimum, at the center point of the candidate for the predetermined object image. From the map created in this manner, it can be understood easily which path the viewpoint at a current position on the image will follow in order to fall into the minimum point of the potential field. The extraction area is determined in accordance with the size and/or the shape of the candidate for the predetermined object image by taking the minimum point of the potential as a reference.
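The notion of a potential field whose minimum lies at the center point of the candidate can be illustrated with a toy sketch. It assumes, for illustration only, that the travel vector at every position points straight at a single known center point, so that a quadratic bowl is a potential whose downhill direction agrees with the travel vector. The functions `potential_map`, `minimum_point`, and `descend` are hypothetical names; in the method itself the gradient vectors are obtained from the image, not from a known center.

```python
def potential_map(width, height, center):
    """Toy potential: 0.5 * squared distance to `center` (an assumption).

    Its downhill direction at (x, y) is (center - (x, y)), i.e. the
    direction in which the viewpoint is assumed to travel.
    """
    cx, cy = center
    return [[0.5 * ((x - cx) ** 2 + (y - cy) ** 2) for x in range(width)]
            for y in range(height)]

def minimum_point(pmap):
    """Return (x, y) of the lowest potential in the map."""
    _, x, y = min((value, x, y)
                  for y, row in enumerate(pmap)
                  for x, value in enumerate(row))
    return x, y

def descend(pmap, start):
    """Follow the map downhill, one 8-neighbour step at a time."""
    x, y = start
    height, width = len(pmap), len(pmap[0])
    while True:
        value, nx, ny = min((pmap[j][i], i, j)
                            for j in range(max(0, y - 1), min(height, y + 2))
                            for i in range(max(0, x - 1), min(width, x + 2)))
        if value >= pmap[y][x]:
            return x, y  # no lower neighbour: the viewpoint is stable here
        x, y = nx, ny

pmap = potential_map(9, 7, center=(6, 2))
stable = descend(pmap, start=(0, 6))
```

Here `descend` plays the role of the viewpoint falling into the minimum point of the potential, and `minimum_point` gives the reference position from which an extraction area would then be determined.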
When the vector for the travel of the view window is composed from the azimuth vectors, if necessary, phase shifts or weights may be applied to the azimuth vectors. Also, a neural network may be employed in order to determine the vector for the travel of the view window or the gradient vector of the potential field and to extract the candidate for the predetermined object image.
The judgment as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image may be made by:
a) causing a learning means to learn a plurality of feature patterns with respect to each of a plurality of feature parts of the predetermined object image,
b) making judgments as to whether feature parts of the candidate for the predetermined object image are or are not included in the plurality of the feature patterns with respect to each of the plurality of the feature parts of the predetermined object image, which feature patterns the learning means has learned, and
c) making a judgment as to whether the relationship between the positions of the feature parts of the candidate for the predetermined object image coincides or does not coincide with the relationship between the positions of the feature parts of the predetermined object image, thereby judging whether the candidate for the predetermined object image is or is not the predetermined object image.
In the method for recognizing an object image in accordance with the present invention, the extraction of the candidate for the predetermined object image and the judgment as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image may be embodied as defined in claims 2 through 35 and claim 65, and reference should herein be made thereto.
The present invention also provides a learning method for a neural network, which comprises the steps of:
i) extracting a target object image, for which learning operations are to be carried out, from an image,
ii) feeding a signal, which represents the extracted target object image, into a neural network, and
iii) carrying out the learning operations of the neural network in accordance with the input target object image.
In order to extract the target object image, the same methods as those for the extraction of the candidate for the predetermined object image in the method for recognizing an object image in accordance with the present invention may be employed.
As in the aforesaid method for recognizing an object image in accordance with the present invention, the extraction of the target object image in the learning method for a neural network in accordance with the present invention may be embodied as defined in claims 36 through 64, and reference should herein be made thereto.
The present invention further provides a first method for discriminating an image, wherein a judgment is made as to whether a given image is or is not a predetermined image, the method comprising the steps of:
i) extracting a reference point, which is unaffected by a change in the angle of the given image and/or by rotation of the given image, from the given image,
ii) detecting an axis of symmetry and/or feature parts of the given image in accordance with the reference point, and
iii) making a judgment as to whether the given image is or is not a predetermined image, the judgment being made in accordance with the axis of symmetry and/or the feature parts of the given image.
The present invention still further provides a second method for discriminating an image, wherein the first method for discriminating an image in accordance with the present invention is modified such that the detection of the axis of symmetry and/or the feature parts of the given image may be carried out by developing the given image in a coordinates space in accordance with the reference point.
The present invention also provides a third method for discriminating an image, wherein the second method for discriminating an image in accordance with the present invention is modified such that the coordinates space may be a polar coordinates space having its pole at the reference point.
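The benefit of developing the given image in a polar coordinates space having its pole at the reference point can be sketched as follows. The sketch assumes a small gray-level image stored as a list of rows and uses nearest-neighbor sampling; the function name `to_polar` and its parameters are hypothetical. Under these assumptions, a rotation of the image about the pole appears as a cyclic shift along the angle axis of the developed image.

```python
import math

def to_polar(image, pole, n_radii, n_angles):
    """Develop `image` into a (radius x angle) grid around `pole`.

    Nearest-neighbour sampling; samples falling outside the image are 0.
    Row r holds radius r (in pixels), column k holds angle 2*pi*k/n_angles.
    """
    px, py = pole
    height, width = len(image), len(image[0])
    polar = [[0] * n_angles for _ in range(n_radii)]
    for r in range(n_radii):
        for k in range(n_angles):
            theta = 2.0 * math.pi * k / n_angles
            x = int(round(px + r * math.cos(theta)))
            y = int(round(py + r * math.sin(theta)))
            if 0 <= x < width and 0 <= y < height:
                polar[r][k] = image[y][x]
    return polar

# A single bright pixel three pixels to the right of the pole ...
image = [[0] * 9 for _ in range(9)]
image[4][7] = 1
# ... and the same pixel after a quarter-turn about the pole.
rotated = [[0] * 9 for _ in range(9)]
rotated[7][4] = 1

polar_a = to_polar(image, (4, 4), n_radii=5, n_angles=8)
polar_b = to_polar(rotated, (4, 4), n_radii=5, n_angles=8)
```

With eight angle samples, the quarter-turn moves the bright sample by two columns within the same radius row, so feature parts can be compared after a simple shift instead of a full rotation search.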
The present invention further provides a fourth method for discriminating an image, wherein the first, second, or third method for discriminating an image in accordance with the present invention is modified such that the predetermined image is a face image, and a judgment is made as to whether the given image is or is not a face image.
The present invention still further provides a fifth method for discriminating an image, wherein the fourth method for discriminating an image in accordance with the present invention is modified such that the method may comprise the steps of:
1) extracting a center point between candidates for eye patterns as the reference point, which is unaffected by a change in the angle of the given image and/or by rotation of the given image, from the given image,
2) detecting an axis of symmetry, which passes through the center point between the candidates for eye patterns, in accordance with the extracted center point between the candidates for eye patterns,
3) detecting the feature parts of the given image in accordance with the axis of symmetry, and
4) making a judgment as to whether the given image is or is not a face image, the judgment being made in accordance with information about the center point between the candidates for eye patterns, the axis of symmetry, and/or the feature parts of the given image.
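Steps 1) and 2) can be illustrated with a small geometric sketch. It assumes that two eye-pattern candidates have already been located as (x, y) points, and it takes the axis of symmetry to be the line through their center point perpendicular to the line joining them. That perpendicularity is an assumption made here for illustration and is not stated in the text above; the function names are likewise hypothetical.

```python
import math

def eye_midpoint(left_eye, right_eye):
    """Center point between two eye-pattern candidates (the reference point)."""
    return ((left_eye[0] + right_eye[0]) / 2.0,
            (left_eye[1] + right_eye[1]) / 2.0)

def symmetry_axis_direction(left_eye, right_eye):
    """Unit vector along the assumed axis of symmetry.

    Assumption: the axis is perpendicular to the line joining the eyes
    and passes through their center point.
    """
    ex = right_eye[0] - left_eye[0]
    ey = right_eye[1] - left_eye[1]
    norm = math.hypot(ex, ey)
    if norm == 0:
        raise ValueError("eye candidates coincide")
    return (-ey / norm, ex / norm)

reference = eye_midpoint((2.0, 5.0), (6.0, 5.0))
axis = symmetry_axis_direction((2.0, 5.0), (6.0, 5.0))
```

Because the center point is defined relative to the eye candidates themselves, it turns with the face when the face is rotated in the image plane, and the assumed axis direction turns by the same angle.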
In cases where the predetermined image is a face image, the method for discriminating an image in accordance with the present invention may be embodied as defined in claims 71 through 160, and reference should herein be made thereto.
With the method for recognizing an object image in accordance with the present invention, the candidate for the predetermined object image is extracted from an image, and thereafter a judgment is made as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image. Therefore, a judgment as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image can be made accurately and easily without a heavy burden being imposed on a judgment means of a system, in which the method for recognizing an object image in accordance with the present invention is carried out. Also, the extraction process and the judgment process can be carried out simultaneously with each other, and very quick operations can be achieved with special hardware functions for carrying out the recognition of the object image. Additionally, processing need be carried out only for a limited part of the image, in which the candidate for the predetermined object image is embedded, and therefore the operation time can be kept short.
The extraction of the candidate for the predetermined object image may be carried out by causing the center point of the view window, which has a predetermined size, to travel to the position of the candidate for the predetermined object image, and determining the extraction area in accordance with the size and/or the shape of the candidate for the predetermined object image. During the determination of the extraction area, the center point of the view window is taken as a reference.
Alternatively, the extraction of the candidate for the predetermined object image may be carried out by cutting out an image, which falls in the region inside of the view window having a predetermined size, from the image, and detecting a contour line of the candidate for the predetermined object image from the cut-out image. Thereafter, contour line components, which are tilted at a predetermined angle with respect to circumferential directions of concentric circles surrounding the center point of the view window, are extracted from the contour line of the candidate for the predetermined object image. Azimuth vectors are detected from these contour line components. A vector is then composed from the azimuth vectors, and a vector for the travel of the view window is thereby determined. In this manner, the direction, to which the center point of the view window should travel, is determined. The extraction area is then determined in accordance with the size and/or the shape of the candidate for the predetermined object image, the center point of the view window being taken as a reference during the determination of the extraction area. In cases where the cut-out image is transformed with the complex-log mapping, the candidate for the predetermined object image can be extracted in the same manner as that when the extraction of the candidate for the predetermined object image is carried out in the Cartesian plane.
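The composition of the azimuth vectors into a single vector for the travel of the view window may be illustrated with the following sketch. It assumes that each azimuth vector is supplied as a hypothetical (angle in radians, magnitude) pair measured from the center point of the view window; the detection of the tilted contour line components themselves is not shown, and the function name is an assumption.

```python
import math

def compose_travel_vector(azimuth_vectors):
    """Sum (angle_rad, magnitude) pairs into one (dx, dy) travel vector."""
    dx = sum(mag * math.cos(ang) for ang, mag in azimuth_vectors)
    dy = sum(mag * math.sin(ang) for ang, mag in azimuth_vectors)
    return dx, dy

# Usage: two unit vectors, one pointing along +x and one along +y.
dx, dy = compose_travel_vector([(0.0, 1.0), (math.pi / 2.0, 1.0)])
```

In this sketch, azimuth vectors pointing in opposite directions cancel, so the composed vector vanishes when the contour components are balanced around the center point of the view window.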
As another alternative, the extraction of the candidate for the predetermined object image may be carried out by cutting out an image, which falls in the region inside of the view window having a predetermined size, from the image, and detecting a region, which approximately coincides in color with the candidate for the predetermined object image, from the cut-out image. A vector for the travel of the view window is then determined from the azimuth and the distance of the detected region. In this manner, the direction, to which the center point of the view window should travel, is determined. The extraction area is then determined in accordance with the size and/or the shape of the candidate for the predetermined object image, the center point of the view window being taken as a reference during the determination of the extraction area.
The term “approximately coinciding in color with a candidate for a predetermined object image” as used herein means that the distance on a chromaticity diagram shown in FIG. 67 between a chromaticity value of the candidate for the predetermined object image and a chromaticity value at an arbitrary point of the cut-out image, which falls in the region inside of the view window having a predetermined size, is smaller than a certain threshold value. Specifically, in cases where chromaticity values at certain points of the cut-out image are spaced a distance larger than the predetermined threshold value on the chromaticity diagram from the chromaticity value of the candidate for the predetermined object image, the region constituted of these points is not extracted. In cases where chromaticity values at certain points of the cut-out image are spaced a distance smaller than the predetermined threshold value on the chromaticity diagram from the chromaticity value of the candidate for the predetermined object image, the region constituted of these points is extracted.
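The threshold test on the chromaticity diagram may be sketched as follows. The chromaticity diagram of FIG. 67 is not reproduced here, so the sketch substitutes normalized (r, g) chromaticity computed from RGB values; that substitution, the function names, and the threshold value in the usage lines are all assumptions made for illustration.

```python
import math

def chromaticity(rgb):
    """Normalized (r, g) chromaticity of an (R, G, B) triple (assumed model)."""
    r, g, b = rgb
    total = r + g + b
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # black: treat as achromatic
    return (r / total, g / total)

def coincides_in_color(pixel_rgb, target_rgb, threshold):
    """True when the chromaticity distance is smaller than the threshold."""
    px, py = chromaticity(pixel_rgb)
    tx, ty = chromaticity(target_rgb)
    return math.hypot(px - tx, py - ty) < threshold

def matching_points(cutout, target_rgb, threshold):
    """(x, y) positions in cutout[y][x] that approximately coincide in color."""
    return [(x, y)
            for y, row in enumerate(cutout)
            for x, rgb in enumerate(row)
            if coincides_in_color(rgb, target_rgb, threshold)]

# Usage: a 1 x 2 cut-out image with one target-colored and one blue pixel.
points = matching_points([[(200, 100, 50), (0, 0, 255)]], (100, 50, 25), 0.05)
```

Since chromaticity discards overall intensity, the first pixel matches the target even though it is twice as bright.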
With the aforesaid another alternative, in cases where the cut-out image is transformed with the complex-log mapping, the candidate for the predetermined object image can be extracted in the same manner as that when the extraction of the candidate for the predetermined object image is carried out in the Cartesian plane.
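For reference, the complex-log mapping of a single point may be sketched as follows. The sketch assumes the usual definition in which a point at distance r and angle θ from the center of the view window is mapped to (log r, θ); the function name and the choice of the natural logarithm are assumptions.

```python
import math

def complex_log_map(x, y, cx=0.0, cy=0.0):
    """Map (x, y) to (log r, theta) about the view-window center (cx, cy)."""
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)
    if r == 0.0:
        raise ValueError("the center point itself has no complex-log image")
    return math.log(r), math.atan2(dy, dx)

# Usage: doubling the distance from the center shifts only the log-r coordinate.
u1, t1 = complex_log_map(1.0, 1.0)
u2, t2 = complex_log_map(2.0, 2.0)
```

Under this mapping, a change in size about the center becomes a shift along the log r axis and an in-plane rotation becomes a shift along the θ axis, which is the property relied upon when processing in the complex-log mapped plane.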
Also, in the aforesaid another alternative, after a plurality of the regions are extracted, those extracted regions which exhibit a high degree of coincidence in color with the candidate for the predetermined object image and which are located at positions close to one another should preferably be caused to cooperate with each other and thereby be emphasized. Also, a region, which exhibits a high degree of coincidence in color with the candidate for the predetermined object image, and a region, which exhibits a low degree of coincidence in color with the candidate for the predetermined object image and which is located at a position spaced apart from the region exhibiting a high degree of coincidence in color with the candidate for the predetermined object image, should preferably be caused to compete with each other, whereby the region, which exhibits a low degree of coincidence in color with the candidate for the predetermined object image, is erased. Also, regions, which exhibit a high degree of coincidence in color with the candidate for the predetermined object image and which are located at positions spaced apart from each other, should preferably be caused to compete with each other. In this manner, a region exhibiting a high degree of coincidence in color with the candidate for the predetermined object image, which region has a size and a shape appropriate for the region to be selected, is kept unerased. Also, a region exhibiting a high degree of coincidence in color with the candidate for the predetermined object image, which region has a size and a shape inappropriate for the region to be selected, is erased. Thus a region, which is most appropriate in the region inside of the view window, is selected as a target object image region. The azimuth and the distance of the selected region in the complex-log mapped plane are detected as the vector for the travel of the view window.
In such cases, the region, which exhibits a high degree of coincidence in color with the candidate for the predetermined object image, can be extracted easily.
When the candidate for the predetermined object image is extracted in the manner described above, it is possible to extract not only a candidate for a specific object image but also a candidate for a predetermined object image having any shape. Also, even if the background of the candidate for the predetermined object image in the image is complicated, the candidate for the predetermined object image can be extracted appropriately. Additionally, processing need be carried out only for a limited part of the image, in which the candidate for the predetermined object image is embedded, and therefore the operation time can be kept short.
As a further alternative, the extraction of the candidate for the predetermined object image may be carried out by cutting out a plurality of images, which fall in the region inside of the view window, at a plurality of times having a predetermined time difference therebetween, calculating the difference between contour lines of object images embedded in the plurality of the cut-out images, and detecting a movement of a background in a vertical or horizontal direction in the region inside of the view window, the movement being detected from the calculated difference. At the same time, the images, which fall in the region inside of the view window, are transformed with the complex-log mapping into complex-log mapped images. The difference between contour lines of object images, which lines extend in the radial direction, is calculated from the complex-log mapped images, and a movement of the background in an in-plane rotating direction is thereby detected. Also, the difference between contour lines of object images, which lines extend in the annular direction, is calculated from the complex-log mapped images, and a movement of the background in the radial direction is thereby detected. Thereafter, the movement of the background is compensated for in accordance with the detected movement of the background in the vertical or horizontal direction, in the in-plane rotating direction, and/or in the radial direction. A contour line of an object, which shows a movement different from the movement of the background, is detected from the image, in which the movement of the background has been compensated for. Azimuth vectors are then detected from components of the contour line, which are tilted at a predetermined angle with respect to the annular direction in the complex-log mapped plane. A vector is then composed from the azimuth vectors, and a vector for the travel of the view window is thereby determined. 
In this manner, the direction, to which the center point of the view window should travel, is determined. The extraction area for the extraction of the candidate for the predetermined object image is then determined in accordance with the size and/or the shape of the object, the center point of the view window being taken as a reference during the determination of the extraction area.
In the manner described above, only the candidate for the predetermined object image can be extracted in cases where the candidate for the predetermined object image is moving in the region inside of the view window and in cases where the whole image, i.e., the background, is moving. Also, it is possible to follow up a candidate for the predetermined object image, which moves every moment, to find the candidate for the predetermined object image at the center point of the view window, and thereby to extract the candidate for the predetermined object image. Additionally, even if the background of the candidate for the predetermined object image in the image is complicated, the candidate for the predetermined object image can be extracted appropriately. Further, processing need be carried out only for a limited part of the image, in which the candidate for the predetermined object image is embedded, and therefore the operation time can be kept short.
With the aforesaid further alternative, in cases where the cut-out image is transformed with the complex-log mapping, the candidate for the predetermined object image can be extracted in the same manner as that when the extraction of the candidate for the predetermined object image is carried out in the Cartesian plane.
Also, in cases where the extraction of the candidate for the predetermined object image in accordance with its contour lines and the extraction of the candidate for the predetermined object image in accordance with its color are carried out simultaneously in the manner described above, the candidate for the predetermined object image can be extracted more accurately.
The extraction of the candidate for the predetermined object image in accordance with its contour lines, the extraction of the candidate for the predetermined object image in accordance with its color, and the extraction of the candidate for the predetermined object image in accordance with the movement should preferably be carried out simultaneously. In such cases, the candidate for the predetermined object image can be extracted even more accurately.
Further, as described above, the extraction of the candidate for the predetermined object image may be carried out by creating a map of the potential field of the whole image, from which the candidate for the predetermined object image is to be extracted, and determining an extraction area in accordance with the size and/or the shape of the candidate for the predetermined object image, a minimum point of the potential in the map being taken as a reference during the determination of the extraction area. In such cases, it is possible to extract not only a candidate for a specific object image but also a candidate for a predetermined object image having any shape. Also, even if the background of the candidate for the predetermined object image in the image is complicated, the candidate for the predetermined object image can be extracted appropriately.
Specifically, the vectors for the travel of the view window, which are determined from the contour lines, the color, and/or the movement, are taken as gradient vectors of a potential field. A map of the potential field of the whole image is created from the gradient vectors of the potential field. The extraction area is then determined in accordance with the size and/or the shape of the candidate for the predetermined object image by taking a minimum point of the potential in the map as a reference. In this manner, the minimum point in the candidate for the predetermined object image, i.e., the center point of the candidate for the predetermined object image can be found from the gradients of the potential field. Therefore, the candidate for the predetermined object image can be extracted very accurately and efficiently.
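The creation of a potential map from gradient vectors and the search for its minimum point may be sketched as follows. The sketch assumes that the gradient is available on a regular grid as forward differences (gx[y][x] is the change in potential from column x to x + 1, gy[y][x] from row y to y + 1) and that the field is conservative, so that summing along any path gives the same potential. If the vectors for the travel of the view window point toward the object, they would be negated before being used as gradients. The function names and the grid representation are hypothetical.

```python
def potential_from_gradients(gx, gy):
    """Integrate forward-difference gradients into a potential map U[y][x].

    U[0][0] is fixed at 0.0; the top row is summed from gx and every
    column is then summed downward from gy.
    """
    h, w = len(gx), len(gx[0])
    U = [[0.0] * w for _ in range(h)]
    for x in range(1, w):
        U[0][x] = U[0][x - 1] + gx[0][x - 1]
    for y in range(1, h):
        for x in range(w):
            U[y][x] = U[y - 1][x] + gy[y - 1][x]
    return U

def minimum_point(U):
    """(x, y) position of the smallest potential in the map."""
    coords = ((x, y) for y in range(len(U)) for x in range(len(U[0])))
    return min(coords, key=lambda p: U[p[1]][p[0]])
```

The minimum point returned by `minimum_point` would then serve as the reference from which the extraction area is determined.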
Also, with this method for the extraction of the candidate for the predetermined object image, it is possible to extract not only a candidate for a specific object image but also a candidate for a predetermined object image having any shape. Also, even if the background of the candidate for the predetermined object image in the image is complicated, the candidate for the predetermined object image can be extracted appropriately.
Additionally, with this method for the extraction of the candidate for the predetermined object image, in cases where the cut-out image is transformed with the complex-log mapping, the map of the potential field can be created, and the candidate for the predetermined object image can be extracted in the same manner as that when the extraction of the candidate for the predetermined object image is carried out in the Cartesian plane.
Further, in cases where the judgment as to whether the candidate for the predetermined object image is or is not the predetermined object image is made from feature parts of the predetermined object image and the positions of the feature parts in the predetermined object image, an accurate judgment can be made as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image. Therefore, the performance of the system, in which the method for recognizing an object image in accordance with the present invention is employed, can be kept high.
Specifically, with the method for recognizing an object image in accordance with the present invention, as described above, during the judgment as to whether the extracted candidate for the predetermined object image is or is not the predetermined object image, the learning means is caused to learn a plurality of feature patterns with respect to each of a plurality of feature parts of the predetermined object image. Judgments are then made as to whether feature parts of the candidate for the predetermined object image are or are not included in the plurality of the feature patterns with respect to each of the plurality of the feature parts of the predetermined object image, which feature patterns the learning means has learned. Thereafter, a judgment is made as to whether the relationship between the positions of the feature parts of the candidate for the predetermined object image coincides or does not coincide with the relationship between the positions of the feature parts of the predetermined object image. A judgment is thereby made as to whether the candidate for the predetermined object image is or is not the predetermined object image. In such cases, even if the feature parts of the candidate for the predetermined object image, on which a judgment is to be made, vary for different candidates for predetermined object images, the judgment as to whether the candidate for the predetermined object image is or is not the predetermined object image can be made accurately from the plurality of the feature patterns, which the learning means has learned.
In cases where the learning operations of the learning means are carried out with the learning method for a neural network in accordance with the present invention by utilizing a neural network, in particular, by utilizing Kohonen's self-organization, self-organization of a plurality of feature patterns is effected with the topological mapping, and the learning means can efficiently learn the plurality of feature patterns. Therefore, judgments as to whether feature parts of the predetermined object image are or are not included in feature parts of the candidate for the predetermined object image can be made efficiently regardless of a change in the angle of the object image and a difference among object images.
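A minimal sketch of self-organization in the manner of a Kohonen map is given below for orientation. It assumes a one-dimensional chain of units, a Gaussian neighborhood, and linearly decaying learning rate and radius; none of these parameter choices, nor the function names, are taken from the present specification, and the feature vectors are assumed to be already extracted and scaled to the range 0 to 1.

```python
import math

def best_matching_unit(weights, x):
    """Index of the unit whose weight vector is nearest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((w - v) ** 2 for w, v in zip(weights[i], x)))

def train_som(samples, n_units, epochs=100, lr0=0.5):
    """Train a 1-D chain of units on the given feature vectors."""
    dim = len(samples[0])
    # Deterministic initial weights, evenly spread between 0 and 1.
    weights = [[(i + 1) / (n_units + 1)] * dim for i in range(n_units)]
    radius0 = max(n_units / 2.0, 1.0)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 toward 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 0.5)
        for x in samples:
            bmu = best_matching_unit(weights, x)
            for i in range(n_units):
                # Units near the winner on the chain are updated more strongly.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * radius ** 2))
                for d in range(dim):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
    return weights

# Usage: four one-dimensional feature values learned by four units.
som = train_som([[0.0], [0.1], [0.9], [1.0]], n_units=4)
winner = best_matching_unit(som, [0.95])
```

Because neighboring units are updated together, similar feature patterns tend to be represented by adjacent units, which is the topological ordering referred to above.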
Also, a neural network, in particular, the neocognitron, may be utilized in order to make judgments as to whether feature parts of the candidate for the predetermined object image are or are not included in the plurality of the feature patterns, which the learning means has learned, and/or a judgment as to whether the relationship between the positions of the feature parts of the candidate for the predetermined object image coincides or does not coincide with the relationship between the positions of the feature parts of the predetermined object image. In such cases, the results of judgment are not affected by a shift in position of the candidate for the predetermined object image, and the performance of the system, in which the method for recognizing an object image in accordance with the present invention is employed, can be kept high.
Additionally, in cases where a face image is taken as the predetermined object image, and right eye, left eye, and mouth patterns are taken as the plurality of feature parts, on which the learning operations are to be carried out, a judgment as to whether a candidate for the face image is or is not the face image can be made regardless of a change in the facial expression, a shift in position of the candidate for the face image, or the like. Further, a candidate for the face image different from the face image, which has been utilized during the learning operations, can be judged as being the face image.
With the learning method for a neural network in accordance with the present invention, a target object image, for which learning operations are to be carried out, is extracted from an image, and a signal, which represents the extracted target object image, is fed into a neural network. The learning operations of the neural network are then carried out in accordance with the input target object image. Therefore, the target object image can be extracted automatically from an image, the extracted target object image can be classified in an arranged form, and the learning operations can thereby be carried out. Accordingly, a human being need not intervene in order to extract and normalize the target of the learning operations, and the learning operations can be carried out efficiently.
The target object image, on which the learning operations are to be carried out, may be extracted in the same manner as that in the extraction of the candidate for the predetermined object image in the aforesaid method for recognizing an object image in accordance with the present invention.
Also, the center point of the view window having a predetermined size may be caused to travel to the center point of the candidate for the predetermined object image, and the size and/or the shape of the candidate for the predetermined object image may be normalized by taking the center point of the view window and a contour line of the candidate for the predetermined object image as references. Thereafter, the normalized candidate for the predetermined object image may be extracted. In such cases, object images having different sizes and/or shapes can be extracted as those having approximately identical sizes and/or shapes. Accordingly, the burden on a step, such as the judgment step or the learning step, which is carried out after the extraction of the contour line of the object image, can be kept light. Also, the judgment and the learning operations can be carried out appropriately.
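One simple way in which the size might be normalized with reference to the center point and the contour line is sketched below. The sketch assumes that the contour line is available as a list of (x, y) points and that normalization consists of scaling so that the farthest contour point lies at a fixed radius from the center; this particular rule and the function name are assumptions, and normalization of shape is not shown.

```python
import math

def normalize_contour(contour, center, target_radius=1.0):
    """Translate the contour to the center and scale it to a fixed radius."""
    cx, cy = center
    max_r = max(math.hypot(x - cx, y - cy) for x, y in contour)
    if max_r == 0.0:
        raise ValueError("contour collapses onto the center point")
    scale = target_radius / max_r
    return [((x - cx) * scale, (y - cy) * scale) for x, y in contour]

# Usage: a contour whose farthest point is 4 units from the center.
normalized = normalize_contour([(2.0, 0.0), (0.0, 4.0)], (0.0, 0.0))
```

After this step, candidates of different sizes occupy the same extent, so a later judgment or learning step receives inputs of approximately identical size.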
With the method for discriminating an image in accordance with the present invention, the reference point, which is unaffected by a change in the angle of the given image and/or by rotation of the given image, is extracted from the given image. The axis of symmetry and/or feature parts of the given image are detected in accordance with the reference point. Thereafter, a judgment as to whether the given image is or is not a predetermined image is made in accordance with the axis of symmetry and/or the feature parts of the given image. Therefore, an accurate judgment can be made regardless of a change in the angle of the given image and rotation of the given image.
Also, in cases where the axis of symmetry and/or the feature parts of the given image are detected in accordance with the reference point of the given image, the detection of the axis of symmetry and/or the feature parts of the given image can be carried out more easily by developing the given image in a coordinates space in accordance with the reference point. A polar coordinates space having its pole at the reference point is one of the most appropriate coordinates spaces.
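The development of the given image in a polar coordinates space having its pole at the reference point may be sketched as follows. The sketch assumes a grayscale image stored as a list of rows, nearest-neighbor sampling, unit steps in radius, and a value of 0 for samples falling outside the image; these choices and the function name are assumptions made for illustration.

```python
import math

def to_polar_image(image, pole, n_radii, n_angles):
    """Resample image[y][x] into polar[r][a] about the pole (px, py)."""
    h, w = len(image), len(image[0])
    px, py = pole
    polar = []
    for r in range(n_radii):
        row = []
        for a in range(n_angles):
            theta = 2.0 * math.pi * a / n_angles
            x = int(round(px + r * math.cos(theta)))
            y = int(round(py + r * math.sin(theta)))
            inside = 0 <= x < w and 0 <= y < h
            row.append(image[y][x] if inside else 0)
        polar.append(row)
    return polar

# Usage: a 5 x 5 image with a single bright pixel to the right of the pole.
img = [[0] * 5 for _ in range(5)]
img[2][3] = 1
polar = to_polar_image(img, pole=(2, 2), n_radii=3, n_angles=4)
```

In the resulting array, an in-plane rotation of the given image about the reference point appears as a shift along the angle index, so features such as an axis of symmetry can be sought along one coordinate.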
The method for discriminating an image in accordance with the present invention is suitable for discriminating a face image which serves as the predetermined image.
In cases where a face image is discriminated, the discrimination can be carried out accurately by extracting the center point between candidates for eye patterns, which are embedded in the given image, as the reference point, which is unaffected by a change in the angle of the given image and/or by rotation of the given image, and detecting the axis of symmetry and/or the feature parts in a polar coordinates space having its pole at the center point between the candidates for eye patterns.
The feature parts of the given image should preferably include a candidate for a face contour and a candidate for a mouth pattern region.
Also, the judgment as to whether the given image is or is not a face image may be made in accordance with a candidate for the face contour, a candidate for the mouth pattern region, and other feature parts, such as ear patterns, a nose pattern, and hair patterns.
The predetermined image may be selected from various images, such as a face image and a signpost image. Also, an asymmetric image, such as a side-directed face image, can be discriminated accurately regardless of a change in the angle of the image and rotation of the image by, for example, carrying out the polar coordinates transformation with respect to an eye pattern taken as the reference point and detecting the feature parts. The method for discriminating an image in accordance with the present invention is suitable for operations wherein a candidate for a predetermined object image is extracted with a method proposed in U.S. patent application Ser. No. 07/944850, and a judgment is made as to whether the candidate for the predetermined object image is or is not a face image.
The method for discriminating an image in accordance with the present invention may be combined with the technique, which is proposed in, for example, U.S. patent application Ser. No. 07/944850 and which is capable of discovering and extracting an image considered as being a predetermined image from a natural image, normalizing the image size, and thereafter presenting the normalized image. In such cases, the method for discriminating an image in accordance with the present invention can cope with a change in the image size, a change in the angle of the image, and rotation of the image.
As described above, the method for discriminating an image in accordance with the present invention can cope with rotation of the given image and a change in the angle of the given image and can eliminate adverse effects of a background even if the background is complicated. Therefore, with the method for discriminating an image in accordance with the present invention, an accurate judgment can be made as to whether the given image is or is not the predetermined image.