1. Field of the Invention
The present invention relates to a pattern recognition device suitable for image recognition or the like.
2. Description of the Related Art
FIG. 8 is a block diagram showing the construction of one example of a prior art image recognition device. Image data, for example a luminance level I (x, y) on the xy plane of a person's face image photographed by a video camera (not shown) or the like, is inputted to a pre-processing unit 21. The pre-processing unit 21 detects a characteristic amount of the image data I (x, y), for example an image edge P (x, y), and outputs it to an analysis unit 22.
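As an illustration of the kind of processing the pre-processing unit 21 might perform, the following minimal sketch derives an edge map P (x, y) from a luminance image I (x, y). The central-difference gradient magnitude and the function name `edge_magnitude` are assumptions made for this example; the description above does not specify which edge detector is used.

```python
def edge_magnitude(image):
    """Return an edge map P(x, y) for a luminance image I(x, y).

    Assumption: the edge is approximated by the central-difference
    gradient magnitude. Border pixels are left at 0.0.
    `image` is a list of rows, each a list of luminance values.
    """
    height, width = len(image), len(image[0])
    edges = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = (image[y][x + 1] - image[y][x - 1]) / 2.0  # horizontal change
            gy = (image[y + 1][x] - image[y - 1][x]) / 2.0  # vertical change
            edges[y][x] = (gx * gx + gy * gy) ** 0.5
    return edges
```

A vertical luminance step in the input yields non-zero values in the columns adjacent to the step and zero elsewhere.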
The analysis unit 22 performs a principal component analysis or the like on the characteristic amount P (x, y) of the person's image outputted from the pre-processing unit 21. For example, it calculates a contribution degree X.sub.i of the characteristic amount P (x, y) to each of r functions F.sub.i (x, y) (i=1, 2, . . . , r) previously stored in a function storing unit 23, and outputs the contribution degrees to a pattern classifying unit 24.
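One common way to obtain such contribution degrees is to take the inner product of P (x, y) with each stored function F.sub.i (x, y). The sketch below assumes that interpretation; the description above does not give the exact formula, and the name `contribution_degrees` is hypothetical.

```python
def contribution_degrees(feature, basis_functions):
    """Return [X_1, ..., X_r] for a feature P(x, y).

    Assumption: X_i is the inner product of P with F_i, summed over
    all pixels, as is usual when F_i are principal components.
    `feature` and each basis function are equally sized lists of rows.
    """
    degrees = []
    for basis in basis_functions:
        total = 0.0
        for feature_row, basis_row in zip(feature, basis):
            for p, f in zip(feature_row, basis_row):
                total += p * f
        degrees.append(total)
    return degrees
```

The result is a vector of r numbers that replaces the full image in the later classification step.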
When the device is in a learning mode, the pattern classifying unit 24 stores the contribution degree X.sub.i of the characteristic amount P (x, y) of the person's image outputted by the analysis unit 22 in a memory (not shown) contained therein, in correspondence with person information K (t), which is a function of, for example, the number t assigned to the person (t=1, 2, . . . , T, where T is the number of persons' faces) and serves as the recognition result. In this case, for example, the average of a plurality of contribution degrees X.sub.i, X.sub.i ', X.sub.i ", X.sub.i "', . . . for images of the same person t is taken as the person information K (t).
When the device is in a recognition mode, the pattern classifying unit 24 calculates the Euclidean distance between the contribution degree X.sub.i of the characteristic amount P (x, y) of the person's image outputted from the analysis unit 22 and each known person's information K (t) previously stored in the memory contained therein. It outputs, as the recognition result, the number t of the person information K (t) that minimizes the distance.
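The learning and recognition modes of the pattern classifying unit 24 can be sketched as a nearest-mean classifier. The class and method names below are hypothetical and chosen only for this illustration; the averaging and the Euclidean-distance minimization follow the description above.

```python
import math


class PatternClassifier:
    """Illustrative stand-in for the pattern classifying unit 24."""

    def __init__(self):
        # Maps person number t to the stored person information K(t).
        self.person_info = {}

    def learn(self, t, samples):
        """Learning mode: store the average contribution vector as K(t)."""
        count = len(samples)
        self.person_info[t] = [sum(values) / count for values in zip(*samples)]

    def recognize(self, contribution):
        """Recognition mode: return the t whose K(t) is nearest."""
        return min(
            self.person_info,
            key=lambda t: math.dist(contribution, self.person_info[t]),
        )
```

For example, after `learn(1, [[0, 0], [2, 2]])` and `learn(2, [[10, 10]])`, a call to `recognize([1.5, 0.5])` returns 1, since the stored average for person 1 is the closer of the two.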
The recognition of the person's face image is thus performed.
As a method of recognizing a person's face, there has been known a technique using an image compression method called Model-Based Coding ["Treatment of Luminance/Chrominance and Motion Information Applied to 3-D Model-based Coding of Moving Facial Images": Journal of Institute of Television, Vol. 45, No. 10, pp. 1277-1287 (1991)]. Further, related techniques have been disclosed in the following documents: ["Eigenfaces for Recognition": Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86 (1991)], [CARICATURE GENERATOR: THE DYNAMIC EXAGGERATION OF FACES BY COMPUTER. Susan E. Brennan in Leonardo, Vol. 18, No. 3, pages 170-178; 1985], and [FACE TO FACE: IT'S THE EXPRESSION THAT BEARS THE MESSAGE. Jeanne McDermott in Smithsonian, Vol. 16, No. 12, pages 112-123; March, 1986]. In Model-Based Coding, on the coding side, as shown in FIG. 9, a so-called wire frame model is fitted to the inputted person's face, and the difference information (the characteristics of the person's face relative to the model) is extracted and transmitted. On the decoding side, the same model as used on the coding side is deformed on the basis of this difference information to reproduce the person's face.
Accordingly, in recognition of a person's face using Model-Based Coding, the difference information between the inputted image of the person's face (FIG. 10a) and the model (FIG. 10b) is first obtained.
Specifically, the person's face image (FIG. 10a) photographed by a video camera is inputted to, for example, a computer and displayed on a CRT. Then, the positions on the displayed face image (indicated by X-marks in FIG. 10c) that correspond to specified positions previously set on the wire frame model (FIG. 10b), for example the eyes and both ends of the mouth (indicated by X-marks in FIG. 10b), are designated, for example, by positioning a mouse-controlled cursor and clicking with the mouse. The wire frame model is deformed as shown in FIG. 10d such that the specified positions previously set on the wire frame model (FIG. 10b) overlap the positions designated on the person's face image (FIG. 10c). The amount of deformation is then extracted as the difference information.
The difference information thus extracted is associated with the person's information and stored in a memory contained in the computer as the recognition information for that person, i.e., as the identity information.
In recognizing a person's face, the recognition information most similar to the difference information obtained from the inputted image of the person's face is detected, and the personal identity information corresponding to that recognition information is outputted as the recognition result.
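The Model-Based Coding recognition steps described above can be sketched as follows. This is a simplified illustration under stated assumptions: the difference information is represented only by the displacement of each specified position (not the full wire frame deformation), similarity is measured by the sum of squared displacement differences, and the function names are hypothetical.

```python
def difference_information(model_points, designated_points):
    """Displacement from each model position to its designated position.

    Assumption: only the specified (X-marked) positions are used.
    Each point is an (x, y) tuple; both lists are in the same order.
    """
    return [
        (px - mx, py - my)
        for (mx, my), (px, py) in zip(model_points, designated_points)
    ]


def most_similar_identity(difference, stored):
    """Return the identity whose stored difference information is closest.

    `stored` maps an identity to previously stored difference information.
    Assumption: closeness is the sum of squared displacement differences.
    """
    def squared_distance(a, b):
        return sum(
            (ax - bx) ** 2 + (ay - by) ** 2
            for (ax, ay), (bx, by) in zip(a, b)
        )

    return min(stored, key=lambda identity: squared_distance(difference, stored[identity]))
```

In use, the difference information computed for a new face image is passed to `most_similar_identity` together with the stored entries, and the returned key plays the role of the recognition result.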
However, in the image recognition described above, since the person's face is photographed by a video camera, the face image tends to be displaced vertically or horizontally and tilted on the screen, and its size also varies from image to image.
Accordingly, in this case, for example, the analysis unit 22 of FIG. 8 subjects to the principal component analysis not only the information on the person's face image but also unnecessary information, namely the vertical or horizontal displacement and the rotational displacement of the face image on the screen, and the variation in size due to the enlargement/reduction ratio of the video camera. This has the disadvantage of lowering the recognition rate.
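The effect of such displacement can be seen in a toy example. The sketch below uses a hypothetical 3x3 edge pattern and a single stored function that matches it exactly, and assumes an inner-product contribution degree; none of these values come from the description above.

```python
def shift_right(image, pixels):
    """Shift each row right by `pixels`, filling vacated cells with 0."""
    width = len(image[0])
    return [[0] * pixels + row[: width - pixels] for row in image]


def inner_product(a, b):
    """Sum of element-wise products of two equally sized images."""
    return sum(p * q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


# Hypothetical edge pattern and a stored function that matches it exactly.
pattern = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
stored_function = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]

# Contribution degree when the pattern is aligned with the stored function.
aligned = inner_product(pattern, stored_function)

# Contribution degree for the same pattern displaced by one pixel.
displaced = inner_product(shift_right(pattern, 1), stored_function)
```

Here `aligned` and `displaced` differ even though the underlying pattern is the same, which illustrates how positional deviation alone can change the quantities passed to the classifier.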
Further, a model such as that shown in FIG. 10b must be prepared for each recognition object. That is, a face model must be prepared to recognize a person's face, and a hand model must be prepared to recognize a person's hand. Additionally, if, for example, all such models are prepared and stored, a large amount of memory is required, which has the disadvantage of enlarging the device.
Furthermore, in recognition of a person's face using the Model-Based Coding described above, the positions on the person's face image displayed on the CRT (indicated by X-marks in FIG. 10c) must be selected manually with a mouse, which is inconvenient.