In methods known before the present invention for recognizing an object in three-dimensional space, feature patterns representing the attributes of the object are extracted from several two-dimensional images by using chromatic or luminance signals. The object having these attributes is then recognized by using the extracted feature patterns.
Specifically, an image region including the object is cut out from a two-dimensional image and divided into several rectangular elements. The average value of the pixels included in each rectangular element is then used as one component of the feature pattern identifying the object. Because the resulting feature patterns have a very high dimension, their dimension must be reduced.
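A minimal sketch of this block-averaging step, assuming a grayscale image region stored as a NumPy array. The function name `block_average_features` and the grid sizes are illustrative choices, not taken from any particular prior method.

```python
import numpy as np

def block_average_features(region: np.ndarray, rows: int, cols: int) -> np.ndarray:
    """Divide a cut-out image region into rows x cols rectangular elements
    and return the mean pixel value of each element as a flat feature vector.
    Assumes rows <= region height and cols <= region width (no empty blocks)."""
    h, w = region.shape
    # Element boundaries; region sizes need not be exact multiples of the grid.
    r_edges = np.linspace(0, h, rows + 1).astype(int)
    c_edges = np.linspace(0, w, cols + 1).astype(int)
    features = np.empty(rows * cols)
    for i in range(rows):
        for j in range(cols):
            block = region[r_edges[i]:r_edges[i + 1], c_edges[j]:c_edges[j + 1]]
            features[i * cols + j] = block.mean()
    return features

region = np.arange(16, dtype=float).reshape(4, 4)
print(block_average_features(region, 2, 2).tolist())  # [2.5, 4.5, 10.5, 12.5]
```

The feature dimension is `rows * cols`; a 16 x 16 grid, for instance, already yields a 256-dimensional pattern, which is why a dimension-reduction step follows.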
In order to represent such high-dimensional feature patterns efficiently in a low-dimensional feature pattern space, various orthogonal transformation techniques, such as principal component analysis and the Karhunen-Loeve transform, have been employed. A previous method (reported in Face Recognition: A Hybrid Neural Network Approach, Technical Report CS-TR-3608, University of Maryland, 1996) proposed a neural network in which feature patterns are extracted by principal component analysis. Such a method provides a basis suitable for representing high-dimensional feature patterns in a lower-dimensional space.
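In its discrete form, the Karhunen-Loeve transform of a set of feature patterns coincides with principal component analysis of their sample covariance. The sketch below, assuming NumPy and synthetic random data in place of real image features, obtains the basis from a singular value decomposition of the centred patterns and projects them onto the first `k` principal axes; `pca_basis` and `project` are hypothetical helper names.

```python
import numpy as np

def pca_basis(X: np.ndarray, k: int):
    """Return the mean and the top-k principal axes of the rows of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions, ordered by
    # decreasing singular value (i.e. decreasing variance).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def project(X: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> np.ndarray:
    """Map high-dimensional patterns onto the k-dimensional subspace."""
    return (X - mean) @ basis.T

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 256))   # 50 hypothetical 256-dimensional feature patterns
mean, basis = pca_basis(X, k=10)
Z = project(X, mean, basis)
print(Z.shape)  # (50, 10)
```

Because the basis is computed from `X` itself, it changes whenever the training data change, which is the data dependence discussed later in this section.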
A neural network has been used to identify the type of an arbitrary object from the feature patterns represented in such a lower-dimensional space. In the neural network, neuron units recognize the feature patterns by using decision functions based on weighted summation, convolution, or distance calculations.
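The three kinds of decision function can be illustrated for a single neuron unit. This is a simplified sketch assuming NumPy: the weighted-summation unit applies a sigmoid, the convolution unit uses a 1-D kernel (image networks would typically use 2-D kernels), and the distance unit uses the Euclidean norm. All function names are hypothetical.

```python
import numpy as np

def weighted_sum_unit(x, w, b):
    """Weighted summation of the inputs passed through a sigmoid."""
    return float(1.0 / (1.0 + np.exp(-(np.dot(w, x) + b))))

def convolution_unit(x, kernel):
    """1-D convolution of the input with a shared kernel (valid part only)."""
    return np.convolve(x, kernel, mode="valid")

def distance_unit(x, ref):
    """Euclidean distance to a stored reference pattern (smaller = closer match)."""
    return float(np.linalg.norm(x - ref))

x = np.array([1.0, 2.0, 3.0])
print(weighted_sum_unit(x, np.zeros(3), 0.0))               # 0.5
print(convolution_unit(x, np.array([1.0, -1.0])).tolist())  # [1.0, 1.0]
print(distance_unit(x, np.array([1.0, 2.0, 1.0])))          # 2.0
```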
A multi-layered neural network consisting of neuron units that perform weighted summation calculations includes an input layer having terminals to which the feature patterns are input, an intermediate layer, and an output layer from which the results are output. The number of terminals in the input layer equals the dimension of the feature pattern input thereto, and the number of neuron units in the output layer equals the number of objects having different attributes. When a feature pattern is input to the input terminals, each neuron unit in the output layer outputs the likelihood that the feature pattern belongs to its class. The type of the object to which the feature pattern belongs is determined by the output-layer neuron unit having the maximum likelihood output. The connection coefficients between the layers are modified by a learning technique so that the neuron unit assigned to the correct class of a feature pattern outputs the maximum likelihood. The connection coefficients are learned in a supervised manner by the error back propagation method, in which pairs of input patterns and expected output patterns are presented to the input and output terminals of the neural network as teaching signals. For neuron units having a sigmoid function, a teaching signal of 0 or 1 is presented to each neuron output.
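The supervised training described above can be sketched as a three-layer network of sigmoid units trained by batch error back propagation on a squared-error loss. This is an illustrative sketch, not the configuration of any cited method: the layer sizes, learning rate, epoch count, and synthetic clustered data are all assumptions, and `train_mlp` and `predict` are hypothetical names.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, labels, n_classes, n_hidden=8, lr=1.0, epochs=3000, seed=0):
    """Train an input-intermediate-output sigmoid network by error back
    propagation. Teaching signal: 1 for the unit of the correct class, 0 elsewhere."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=1.0, size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    T = np.eye(n_classes)[labels]              # teaching signals (0 or 1)
    n = len(X)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)               # intermediate layer outputs
        Y = sigmoid(H @ W2 + b2)               # output layer: class likelihoods
        # Back-propagate the squared error through both sigmoid layers.
        dY = (Y - T) * Y * (1.0 - Y)
        dH = (dY @ W2.T) * H * (1.0 - H)       # uses W2 before it is updated
        W2 -= lr * (H.T @ dY) / n
        b2 -= lr * dY.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return W1, b1, W2, b2

def predict(X, W1, b1, W2, b2):
    """Class = output unit with the maximum likelihood output."""
    Y = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return np.argmax(Y, axis=1)

# Hypothetical data: three well-separated clusters of 2-D feature patterns.
rng = np.random.default_rng(1)
centers = np.array([[-2.0, -2.0], [2.0, 2.0], [2.0, -2.0]])
labels = np.repeat(np.arange(3), 20)
X = centers[labels] + rng.normal(scale=0.3, size=(60, 2))
params = train_mlp(X, labels, n_classes=3)
accuracy = float(np.mean(predict(X, *params) == labels))
print(f"training accuracy: {accuracy:.2f}")
```

Every epoch passes all training patterns through both layers, so the learning time grows with the number of samples, the input dimension, and the number of intermediate units.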
Kohonen proposed a single-layered network based on distance calculations, which stores reference feature patterns representing the attributes of an object. The network determines the Euclidean distance between an arbitrary feature pattern and a reference feature pattern (T. Kohonen, "The self-organizing map," Proc. of the IEEE 78, 1464-1480, 1990). To improve performance further, the Mahalanobis distance, which takes the statistical distribution of the input feature patterns into account, has been introduced as a metric for determining the likelihood that a feature pattern belongs to an object class.
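A minimal sketch of nearest-reference classification under either metric, assuming each class is summarized by a mean reference pattern and, for the Mahalanobis case, a covariance matrix. It omits the self-organizing learning of the reference vectors in Kohonen's network; the function names and the toy numbers are hypothetical.

```python
import numpy as np

def euclidean(x, ref):
    """Euclidean distance between a feature pattern and a reference pattern."""
    return float(np.linalg.norm(x - ref))

def mahalanobis(x, mean, cov):
    """Mahalanobis distance using the class covariance matrix."""
    diff = x - mean
    # Solve cov @ y = diff rather than forming the inverse explicitly.
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

def nearest_class(x, means, covs=None):
    """Index of the nearest reference pattern (Euclidean if covs is None)."""
    if covs is None:
        dists = [euclidean(x, m) for m in means]
    else:
        dists = [mahalanobis(x, m, c) for m, c in zip(means, covs)]
    return int(np.argmin(dists))

# Toy example: class 0 is spread widely along the first axis, class 1 is tight.
means = [np.array([0.0, 0.0]), np.array([3.0, 0.0])]
covs = [np.diag([16.0, 1.0]), np.diag([0.25, 0.25])]
x = np.array([2.0, 0.0])
print(nearest_class(x, means))        # 1 (Euclidean: 2.0 vs 1.0)
print(nearest_class(x, means, covs))  # 0 (Mahalanobis: 0.5 vs 2.0)
```

In this toy case the test point is closer to class 1 in Euclidean terms, but the large variance of class 0 along the first axis makes class 0 the nearer one under the Mahalanobis distance.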
In an image recognition method utilizing an image communication technique, an image compressed according to the MPEG or JPEG standard is transmitted, received, and then decompressed. Feature extraction is performed on the decompressed image, thereby determining its features.
The problems of the previous methods for recognizing an object in a three-dimensional space by using the image of the object will be described.
First, an object in three-dimensional space has a high degree of freedom in its image. Specifically, the recognition precision depends on the rotation and position of the object and on variations in the illumination surrounding it.
In addition, representing a three-dimensional object by several two-dimensional images requires a large-capacity memory. Moreover, in principal component analysis and the other orthogonal transformations used to reduce the amount of information, the basis representing the features of the object to be recognized depends on the distribution of the data, and an enormous amount of calculation is required to obtain this data-dependent basis. Furthermore, the structure of a conventional multi-layered neural network is too complicated, and the time required for learning is too long. For example, in the method by Lawrence et al., it takes about 4 hours of learning for the network to learn half of the data stored in a database that holds 10 distinct facial images for each of 40 persons. The problem is therefore how to make such a complicated network efficiently classify an enormous amount of data by using a smaller number of learning image samples.
Furthermore, when the number of training samples available in an image database is relatively small, the statistical distribution of the feature patterns of an object deviates significantly from a normal distribution. It is particularly difficult to estimate the variance of the distribution from a small number of training samples. Because the variances cannot be estimated precisely, the error rate of object recognition that takes the variances into account is sometimes higher than that of object recognition that neglects them.
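The small-sample problem can be made concrete with synthetic data. In this sketch, assuming NumPy and feature patterns drawn from a 20-dimensional standard normal distribution (so the true covariance is the identity and every true eigenvalue is 1), the sample covariance estimated from n samples has rank at most n - 1. When n does not exceed the dimension, the estimate is singular and the Mahalanobis distance cannot be computed from it directly. The helper `sample_cov_stats` is hypothetical.

```python
import numpy as np

def sample_cov_stats(n, d, rng):
    """Estimate a d x d covariance from n samples of N(0, I) and report
    its rank and its smallest and largest eigenvalues."""
    X = rng.normal(size=(n, d))
    S = np.cov(X, rowvar=False)          # unbiased sample covariance
    eig = np.linalg.eigvalsh(S)          # ascending order; true values are all 1
    return int(np.linalg.matrix_rank(S)), eig[0], eig[-1]

rng = np.random.default_rng(0)
for n in (5, 20, 200):
    rank, lo, hi = sample_cov_stats(n, 20, rng)
    print(f"n={n:3d}  rank={rank:2d}  min eig={lo:.3f}  max eig={hi:.3f}")
```

Even with enough samples for a full-rank estimate, the estimated eigenvalues spread around the true value of 1, and the smallest ones, which dominate the Mahalanobis distance, are the least reliable.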