1. Field of the Invention
The present invention relates to artificial intelligence systems for image recognition. More specifically, the present invention is an image information processing system in which, based on a hypothesis regarding the basic scheme of visual information processing which approximates an actual biological vision system, various image information input through a camera or similar optical device is quantified and subjected to computational processing in a computer, thereby enabling recognition of objects in the image. The system further employs automatic self-learning of object models for efficiently recognizing images.
2. Description of the Related Art
As the physiological functions of biological image recognition have become further elucidated, models have been constructed using computers and the like for approximating such recognition functions, and efforts have been made to construct artificial intelligence based visual learning systems. In such visual learning systems, a visual scene which is input via a video camera, for example, is expressed as numerical values, and based on analysis thereof, an image object from within the visual scene is recognized, specified and/or classified. More specifically, in such systems, analytical processing is undertaken for recognizing correspondence between an input image pattern and an image pattern of a recognized object accumulated through learning.
The input scene is converted into numerical values, such as voltage values, which correspond to the intensities of the pixels making up the image, and is expressed as a vector. For example, in the case of an image occupying a $27 \times 27$ pixel region, a 729-dimensional vector is expressed in the form of a variance (dispersion) in an orthogonal axial space. As such, analytical processing of such a large amount of data, and distinguishing a target image pattern therefrom, is nearly impossible even with the capabilities of present computers.
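By way of illustration only, the conversion of a 27 by 27 pixel region into a single 729-dimensional vector may be sketched as follows. The pixel intensities are synthetic, and the helper name `flatten_image` is hypothetical and does not appear in the cited art.

```python
# Illustrative only: a 27x27 grayscale image flattened into one 729-dimensional vector.
# The pixel values are synthetic; `flatten_image` is a hypothetical helper name.

def flatten_image(image):
    """Concatenate the rows of a 2-D list of intensities into a single list."""
    return [pixel for row in image for pixel in row]

SIDE = 27
# Synthetic intensity pattern standing in for digitized camera output.
image = [[(r * SIDE + c) % 256 for c in range(SIDE)] for r in range(SIDE)]

vector = flatten_image(image)
print(len(vector))  # 729
```

Each image thus becomes one point in a 729-dimensional space, which is why some form of data compression is desirable before comparison with learned patterns.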
Accordingly, to facilitate this analysis, there exists a demand for processing to be performed for converting the input image pattern which is the object of recognition into compressed data expressing the characteristics thereof, whereby comparison with accumulated learned patterns can then be relatively easily undertaken.
For efficiently analyzing the data, it is desirable to compartmentalize the data space into a so-called subspace, so as to limit the data space to its most characteristic regions.
A known method for satisfying this demand is principal component analysis (PCA). According to this method, the distribution of the object image data in a multidimensional image space is converted into a feature space, and the principal eigenvectors which serve to characterize such space are used. More specifically, each eigenvector reflects the amount of change in pixel intensity corresponding to changes within the image group, and the eigenvectors can thus be thought of as characteristic axes for explaining the image.
The respective vectors corresponding to the object image include those which contribute greatly to the eigenvectors as well as those which do not contribute so much. Because the object image is characterized by large changes within the image group, it can be closely expressed, for example, by the collection of principal components, namely the eigenvectors having large eigenvalues.
Stated in different terms, in order to very accurately reproduce a target image, a large number of eigenvectors are required. However, if one only desires to express the characteristics of the outward appearance of an object image, it can be sufficiently expressed using a smaller number of eigenvectors. A system utilizing the above-described eigenspace method, for recognizing human faces, is disclosed by U.S. Pat. No. 5,164,992, the disclosure of which is explicitly incorporated into the present specification by reference. This technology shall now be summarized below.
First, the facial images of a plurality of previously known persons are learned. Letting each facial image be made up of $N \times N$ pixels, the M facial images are expressed by respective vectors $\Gamma_1, \Gamma_2, \Gamma_3, \ldots, \Gamma_M$, each of length $N^2$.
Taking the difference between the vector of each person's face and the average vector $\Psi$ (i.e. $\Phi_i = \Gamma_i - \Psi$) results in M difference vectors. If a vector group (matrix) A is defined by $A = (\Phi_1 \ldots \Phi_M)$, then by calculating vectors $u_k$ and scalar quantities $\lambda_k$ as the eigenvectors and eigenvalues, respectively, of the covariance matrix $C = AA^T$ of A, an eigenspace of the face is determined.
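By way of illustration only, the averaging and subtraction step may be sketched as follows. The three four-pixel "faces" are made-up numbers chosen for easy arithmetic and are not data from the cited patent; the variable names are likewise assumptions.

```python
# Illustrative only: average vector Psi and difference vectors Phi_i = Gamma_i - Psi.
# The three 4-pixel "faces" are made-up numbers, not data from the cited patent.

faces = [
    [1.0, 2.0, 3.0, 4.0],  # Gamma_1
    [2.0, 4.0, 1.0, 3.0],  # Gamma_2
    [3.0, 0.0, 2.0, 5.0],  # Gamma_3
]
M = len(faces)
n_pixels = len(faces[0])

# Psi: pixel-wise average over the M training vectors.
psi = [sum(face[p] for face in faces) / M for p in range(n_pixels)]

# Phi_i: each training vector with the average removed.
phis = [[face[p] - psi[p] for p in range(n_pixels)] for face in faces]

print(psi)   # [2.0, 2.0, 2.0, 4.0]
print(phis)  # [[-1.0, 0.0, 1.0, 0.0], [0.0, 2.0, -1.0, -1.0], [1.0, -2.0, 0.0, 1.0]]
```

Because the average has been removed, the difference vectors sum to zero at every pixel, so at most M - 1 of them are linearly independent.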
In the case of an image made up of $N \times N$ pixels, the matrix C has $N^2$ eigenvectors and eigenvalues. However, when the number M of facial images is much smaller than the $N^2$ dimensions of the overall image space (i.e. $M \ll N^2$), which includes not only facial data but background data as well, it is sufficient for recognizing the facial image to calculate only the eigenvectors of the $M \times M$ matrix $L = A^T A$. The eigenvectors $u_i = A\upsilon_i$ of C can then be determined from the M eigenvectors $\upsilon_i$ of the matrix L.
Hence, the data according to the above analysis is compressed whereby the number of required calculations is reduced considerably.
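The reduction rests on a simple identity: if $A^T A \upsilon = \lambda \upsilon$, then multiplying both sides by A gives $AA^T (A\upsilon) = \lambda (A\upsilon)$, so $A\upsilon$ is an eigenvector of the large matrix with the same eigenvalue. By way of illustration only, this may be checked numerically as sketched below. The difference vectors are made-up data, and power iteration is an assumed method chosen here for brevity; it is not stated in the cited art, and it recovers only the dominant eigenvector.

```python
import math

# Difference vectors Phi_i, stored as rows (they are the columns of A in the text).
# Made-up data for illustration only.
phis = [
    [-1.0, 0.0, 1.0, 0.0],
    [0.0, 2.0, -1.0, -1.0],
    [1.0, -2.0, 0.0, 1.0],
]
M, n_pixels = len(phis), len(phis[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]

# The small matrix A^T A is only M x M: entry (i, j) is the inner product Phi_i . Phi_j.
L = [[dot(phis[i], phis[j]) for j in range(M)] for i in range(M)]

# Power iteration for the dominant eigenvector of L (assumed method, see lead-in).
v = normalize([1.0, 2.0, 3.0])
for _ in range(200):
    v = normalize([dot(row, v) for row in L])
eigenvalue = dot(v, [dot(row, v) for row in L])  # Rayleigh quotient of the unit vector v

# u = A v: a linear combination of the difference vectors, living in pixel space.
u = normalize([sum(v[i] * phis[i][p] for i in range(M)) for p in range(n_pixels)])

# Check A A^T u = lambda u without ever forming the large matrix A A^T.
at_u = [dot(phi, u) for phi in phis]  # A^T u, length M
c_u = [sum(at_u[i] * phis[i][p] for i in range(M)) for p in range(n_pixels)]  # A (A^T u)
residual = max(abs(c_u[p] - eigenvalue * u[p]) for p in range(n_pixels))

print(round(eigenvalue, 6), residual < 1e-9)  # 11.0 True
```

In this toy case the small matrix is 3 by 3 while the large one would be 4 by 4; for realistic image sizes the difference is far greater, which is the source of the reduction in calculation.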
The input facial image $\Gamma$ is converted into the components of facial space (i.e. projected into an eigenspace of the face) through a simple operation conducted in an image processing apparatus, as follows,
$$\omega_k = u_k^T (\Gamma - \Psi)$$
where $\Psi$ is the average vector.
Secondly, a vector $\Omega^T = (\omega_1, \omega_2, \ldots, \omega_M)$ expresses, as a weighting, the degree to which each eigenvector of the facial eigenspace contributes to the input image pattern. The vector $\Omega$ is utilized for standard pattern recognition.
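By way of illustration only, the projection of an input image onto the eigenvectors, yielding the weight vector $\Omega$, may be sketched as follows. The average vector, the two basis vectors standing in for $u_1$ and $u_2$, and the input image are all made-up values, and the function name `project` is hypothetical.

```python
# Illustrative only: weights omega_k = u_k^T (Gamma - Psi) for an input image.
# The basis vectors, average vector and input image are made-up values.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def project(gamma, psi, eigenfaces):
    """Return the weight vector Omega for input vector gamma."""
    phi = [g - p for g, p in zip(gamma, psi)]      # Phi = Gamma - Psi
    return [dot(u_k, phi) for u_k in eigenfaces]   # omega_k = u_k^T Phi

psi = [2.0, 2.0, 2.0, 4.0]
# Two orthonormal vectors standing in for the eigenvectors u_1 and u_2.
eigenfaces = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
gamma = [3.0, 1.0, 4.0, 6.0]

omega = project(gamma, psi, eigenfaces)
print(omega)  # [2.0, 1.0]
```

The four-pixel input is thereby summarized by two weights, one per retained eigenvector.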
The Euclidean distance $\xi$ between the input image $\Phi = \Gamma - \Psi$ and its projection $\Phi_f$ onto the facial eigenspace, defined by equation (1), is determined from equation (2), both equations being shown below. If $\xi$ is within a given threshold value, the input image is recognized as belonging to the facial eigenspace.
$$\Phi_f = \sum_{k=1}^{M} \omega_k u_k \qquad (1)$$
$$\xi^2 = \lVert \Phi - \Phi_f \rVert^2 \qquad (2)$$
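By way of illustration only, the threshold test may be sketched as follows, assuming the usual eigenface definitions: that $\Phi_f$ is the sum of the eigenvectors $u_k$ weighted by $\omega_k$, and that $\xi$ is the Euclidean norm of $\Phi - \Phi_f$. The data, the threshold value and the function name `distance_from_face_space` are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def distance_from_face_space(gamma, psi, eigenfaces):
    """Euclidean distance between Phi = Gamma - Psi and its projection Phi_f."""
    phi = [g - p for g, p in zip(gamma, psi)]
    omega = [dot(u_k, phi) for u_k in eigenfaces]
    # Phi_f: Phi rebuilt from its weights, assuming orthonormal eigenvectors.
    phi_f = [sum(w * u_k[p] for w, u_k in zip(omega, eigenfaces))
             for p in range(len(phi))]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(phi, phi_f)))

# Made-up average vector and orthonormal basis standing in for learned eigenvectors.
psi = [2.0, 2.0, 2.0, 4.0]
eigenfaces = [
    [0.5, 0.5, 0.5, 0.5],
    [0.5, -0.5, 0.5, -0.5],
]
THRESHOLD = 1.0  # hypothetical value; in practice it would be tuned on real data

outside = [3.0, 1.0, 4.0, 6.0]  # has a component outside the eigenspace
inside = [3.5, 2.5, 3.5, 4.5]   # equals Psi + 2*u_1 + 1*u_2, so it lies in the eigenspace

xi_outside = distance_from_face_space(outside, psi, eigenfaces)
xi_inside = distance_from_face_space(inside, psi, eigenfaces)
print(xi_outside <= THRESHOLD, xi_inside <= THRESHOLD)  # False True
```

Here the first input lies at a distance of the square root of 5 from the eigenspace and is rejected, while the second lies within it and is accepted.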
Stated otherwise, from within an overall image scene, by determining a vector which best evaluates the distribution of the facial images therein, the data can be limited to the partial space of the facial image. Accordingly, the amount of data is considerably reduced, and one is able to focus on a single set of data which is limited to that making up the facial characteristics.
Once the evaluation vector has been determined, the input images can be classified as having faces therein or not, and if it is judged that a face is present, a particular individual's face can be recognized by comparison with the accumulated data of facial patterns from previously known individuals. Turk et al., the inventors named in the above-identified U.S. patent, performed principal component analysis on 128 learned facial images, and in a facial recognition test undertaken in actual practice using 20 essential eigenvectors, were able to achieve a 95% rate of recognition with respect to 200 facial images.
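By way of illustration only, the comparison with accumulated facial patterns may be sketched as a nearest-neighbor search over stored weight vectors. The Euclidean nearest-neighbor rule, the labels, the stored weights and the `max_distance` parameter are assumptions made for this sketch and are not taken from the text above.

```python
import math

def identify(omega, known, max_distance):
    """Return the label whose stored weight vector is nearest to omega, or None."""
    best_label, best_dist = None, float("inf")
    for label, omega_k in known.items():
        dist = math.dist(omega, omega_k)  # Euclidean distance in weight space
        if dist < best_dist:
            best_label, best_dist = label, dist
    # Reject the match when even the nearest stored pattern is too far away.
    return best_label if best_dist <= max_distance else None

# Hypothetical stored weight vectors for two previously learned individuals.
known = {
    "person_a": [2.0, 1.0],
    "person_b": [-1.0, 3.0],
}

print(identify([1.8, 1.1], known, max_distance=0.5))    # person_a
print(identify([10.0, 10.0], known, max_distance=0.5))  # None
```

The comparison is performed on the short weight vectors rather than on the full pixel vectors, which is where the compression obtained from the eigenspace is put to use.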
The eigenspace method of image recognition is more effective than standard recognition techniques using template matching or normalized correlation. However, in the case of images expressed by high-dimensional vectors, the parts of image features which are not explained well must be inferred, and if there are no inferential techniques for omitting image processing calculations, it then becomes necessary to perform expanded calculations concerning all vectors, which is impossible in actual practice.
Additionally, structural descriptions of knowledge concerning image information using only the eigenspace method are complex, and the method is problematic when adapted to the understanding of images in general. When the method is applied to recognition of images which exist in reality, methods have to be established for correcting the mistaken processing results which invariably occur. Accordingly, new system logic is indispensable for expanding the applicability of the eigenspace method to various kinds of image recognition.