The present invention relates to a multiplexing optical system and a feature vector transformation apparatus using the same, and also relates to a feature vector detecting and transmitting apparatus and a recognition and classification system using these apparatuses. More particularly, the present invention relates to a system for transforming input information into a plurality of feature vectors at high speed and with high accuracy to perform recognition and classification.
Conventionally, recognition and classification of various kinds of information, e.g. images and signals, are performed by computing a degree of similarity between a particular image or signal, which is regarded as a vector quantity, and another vector quantity used as a criterion for comparison.
As a device for the similarity computation, a combination of a matched filter and a correlator, or a joint transform correlator, has heretofore been used. The conventional method using such a similarity computing device performs adequately in recognizing and classifying known, simple information isolated from a background. However, to directly handle an image or signal having complicated features, the conventional method needs to process simultaneously not only useful information but also information that contributes little to recognition and classification, and such useless information becomes an error factor. Moreover, the above-mentioned device for similarity computation is overly sensitive to slight deformation, rotation, scaling, etc., and is therefore likely to cause an error.
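The sensitivity described above can be illustrated numerically. The following is a minimal sketch, not a model of any optical correlator: it assumes small binary images of equal size, and the function name `peak_correlation` and the test patterns are hypothetical. It computes the normalized cross-correlation peak over all shifts, which is the quantity a correlator's output peak represents in idealized form.

```python
import math


def peak_correlation(image, reference):
    """Return the peak of the normalized cross-correlation over all 2-D shifts.

    Both arguments are equal-sized lists of rows; pixels outside the frame
    are treated as zero.
    """
    h, w = len(image), len(image[0])
    norm_image = math.sqrt(sum(v * v for row in image for v in row))
    norm_ref = math.sqrt(sum(v * v for row in reference for v in row))
    if norm_image == 0 or norm_ref == 0:
        return 0.0
    best = 0.0
    for dy in range(-(h - 1), h):
        for dx in range(-(w - 1), w):
            total = 0
            for y in range(h):
                for x in range(w):
                    ry, rx = y - dy, x - dx
                    if 0 <= ry < h and 0 <= rx < w:
                        total += image[y][x] * reference[ry][rx]
            best = max(best, total)
    return best / (norm_image * norm_ref)


# Hypothetical test patterns: a vertical bar, the same bar translated by one
# pixel, and the same bar slightly sheared (deformed).
reference = [[0, 0, 1, 0, 0] for _ in range(5)]
translated = [[0, 0, 0, 1, 0] for _ in range(5)]
sheared = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 1, 0],
]

print(peak_correlation(translated, reference))
print(peak_correlation(sheared, reference))
```

In this sketch the translated bar still yields a peak of about 1.0, because only the peak position moves, whereas the sheared bar yields a peak of about 0.4 although it differs from the reference by a displacement of only one pixel in four of its rows.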
To solve these problems, many attempts have recently been made in which various kinds of information, e.g. images and signals, are not processed directly; instead, each object is transformed by preprocessing into features contributing greatly to recognition and classification, and recognition is performed on the transformed features by a neural network or the like. Taking the case of images, for example, typical features include textures, structural features, colors, temporal features, etc. Regarding textures, feature quantities are obtained by computing a gray-level histogram, a co-occurrence matrix, a difference statistical quantity, etc. Regarding structural features, e.g. edges, lines, and contours, feature quantities are obtained by convolution with a Laplacian filter or by a Hough transform. Regarding colors, feature quantities are obtained by transformation into RGB space or HSV space or into a spectrum. Regarding temporal features, feature quantities are obtained by computation of an optical flow or by a wavelet transform. Transformation into feature quantities by these preprocessing operations takes a great deal of time, particularly when a two-dimensional image is handled as an object vector. In such a case, serial computation on an ordinary computer is impracticable. Therefore, optical methods or methods capable of parallel processing using a parallel computer, for example, have heretofore been used. To improve accuracy and capacity in particular, it is conventional practice to use not only one kind of feature quantity but a combination of a plurality of feature quantities to effect recognition and classification.
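Two of the feature quantities named above, the gray-level histogram and the Laplacian-filtered image, can be sketched as follows. This is an illustrative serial implementation only; the function names, the 4-neighbour Laplacian kernel, and the sample image are assumptions made for the example and are not taken from the cited methods. The nested loops also show why serial computation becomes costly for large two-dimensional images.

```python
def gray_level_histogram(image, levels):
    """Count how many pixels take each gray level in range(levels)."""
    hist = [0] * levels
    for row in image:
        for value in row:
            hist[value] += 1
    return hist


# 4-neighbour Laplacian kernel (one common choice, assumed here).
LAPLACIAN = [
    [0, 1, 0],
    [1, -4, 1],
    [0, 1, 0],
]


def convolve_valid(image, kernel):
    """2-D convolution restricted to positions where the kernel fits entirely."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for y in range(h - kh + 1):
        out_row = []
        for x in range(w - kw + 1):
            acc = 0
            for j in range(kh):
                for i in range(kw):
                    # The kernel is flipped, as convolution requires.
                    acc += image[y + j][x + i] * kernel[kh - 1 - j][kw - 1 - i]
            out_row.append(acc)
        out.append(out_row)
    return out


# Hypothetical 4x4 image with a vertical edge between gray levels 0 and 3.
image = [
    [0, 0, 3, 3],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
    [0, 0, 3, 3],
]

histogram = gray_level_histogram(image, levels=4)
edges = convolve_valid(image, LAPLACIAN)
print(histogram)
print(edges)
```

The histogram summarizes texture without regard to position, while the Laplacian output is nonzero only next to the edge, so the two quantities capture complementary information about the same image.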
Examples of conventional optical methods among those described above include "Visual Recognition System by Microlens Array" proposed by Shunichi Kamemaru in Image Information (I), pp. 65-70 (the January 1993 issue). The proposed method uses an apparatus as shown in FIG. 27. More specifically, an object to be recognized is divided into partial elements before being multiplexed, and values of correlation between the partial elements and a plurality of reference elements are simultaneously computed by a multiple correlator consisting essentially of a plurality of conventional matched filters and a plurality of conventional correlators, which are arranged in parallel. The results of the computation are inputted to an input layer of a back propagation type neural network prepared in the computer, thereby effecting recognition. It is reported that it was possible with this method to recognize four characters, i.e. D, K, O, and X, and one space.
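The data flow of such a multiple correlator can be sketched in software as follows. This is a simplified illustration under stated assumptions, not a description of the Kamemaru apparatus: correlation is reduced to a zero-shift inner product, the partial elements are non-overlapping square blocks, and all names (`split_into_blocks`, `correlation_features`) and patterns are hypothetical.

```python
def split_into_blocks(image, block):
    """Divide an image into non-overlapping block x block partial elements."""
    h, w = len(image), len(image[0])
    blocks = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            blocks.append([row[x:x + block] for row in image[y:y + block]])
    return blocks


def inner_product(a, b):
    """Zero-shift correlation value between two equal-sized 2-D patterns."""
    return sum(p * q for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def correlation_features(image, references, block):
    """Correlate every partial element with every reference element.

    The result is a flat feature vector of the kind that would be supplied
    to the input layer of a neural network.
    """
    return [
        inner_product(part, ref)
        for part in split_into_blocks(image, block)
        for ref in references
    ]


# Hypothetical object and reference elements.
image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
references = [
    [[1, 1], [1, 1]],  # filled square
    [[1, 0], [0, 1]],  # diagonal stroke
]

features = correlation_features(image, references, block=2)
print(features)
```

Every entry of the feature vector is independent of the others, which is why an optical arrangement can compute all of them in parallel while this sketch computes them one after another.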
A similar method is disclosed in Japanese Patent Application Unexamined Publication (KOKAI) Number (hereinafter referred to as "JP(A)") 4-355732. This method uses an apparatus as shown in FIG. 28 to multiplex an object to be recognized. Values of correlation between the multiplexed object and a plurality of reference objects are simultaneously computed by a multiple correlator consisting essentially of a plurality of conventional joint transform correlators arranged in parallel. The results of the computation are further divided into several regions to improve accuracy and are inputted to an input layer of a back propagation type neural network prepared in the computer to effect recognition (FIG. 28 shows only the optical system of the apparatus used in the method). In an embodiment of the conventional method, an example of recognition using object patterns (objects to be recognized) of H and E and reference patterns (reference objects) of I and V is described.
In the above-described two prior arts, transformation into a plurality of feature quantities is performed in parallel and simultaneously. Therefore, the processing time is markedly shortened. Other optical systems capable of transformation into a plurality of feature quantities in parallel and simultaneously include a holographic memory optical system proposed by B. Hill (B. Hill, "Some Aspects of a Large Capacity Holographic Memory", APPLIED OPTICS, Vol. 11, No. 1 (1972), pp. 182-191) and optical systems disclosed in JP(A) 1-227123, 3-148623, and 3-144814. These optical systems are capable of performing a plurality of feature transforms (more specifically, spatial frequency filtering operations or the like) in parallel and simultaneously.
The above-described method proposed by Kamemaru and the method disclosed in JP(A) 4-355732 provide optical systems capable of performing a plurality of correlation processing operations in parallel and simultaneously at high speed. The methods disclosed in JP(A) 1-227123, 3-148623, and 3-144814 provide optical systems capable of performing a plurality of spatial frequency filtering operations in parallel and simultaneously at high speed. These methods are capable of satisfactorily transforming simple images such as characters isolated from a background. However, to transform an ordinary complicated image of large capacity into feature quantities, filtering must be performed not only at high speed but also with high accuracy in practice. The optical systems used in these methods are inadequate for such transformation.
Moreover, the method proposed by Kamemaru and the method disclosed in JP(A) 4-355732 are capable of high-speed recognition to a certain extent because a plurality of feature quantities are taken out simultaneously and in parallel and these feature quantities are inputted to a neural network to perform recognition. However, these methods do not take into consideration the following matters.
Let us consider a case where an ordinary complicated image of large capacity (e.g. an image having a large number of pixels (vectors) to be handled) is inputted as a vector, and recognition and classification are made not for the whole input image but for each small region consisting of one or a plurality of components of the vector. Examples include recognition and classification of affected parts in medical images or defective parts in FA (Factory Automation) images. Affected parts and defective parts, which are the objects to be recognized in these images, may appear in an infinite variety of forms as a whole; they may be deformed or vary in shape or size. For such affected parts or defective parts, a judgment is required for each of the smaller regions defined as units, such as "this region belongs to such and such a category (e.g. affected part), and the overall size of the regions belonging to that category is so and so" (that is, metrical recognition and classification are also demanded). In this case, it is necessary to effect highly accurate transformation into feature quantities and extraction at the unit level of the recognition and classification, that is, at the level of small regions each consisting of one or a plurality of components of the input vector.
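The kind of region-by-region judgment described above, together with the measurement of the overall size of a category, can be sketched as follows. This is only an illustration of the task: the mean-intensity threshold stands in for whatever feature-based classifier is actually used, and the function names, the category labels "defect" and "normal", and the sample image are hypothetical.

```python
def classify_regions(image, block, threshold):
    """Label each block x block region by comparing its mean value to a threshold.

    The threshold rule is a placeholder for a real classifier operating on
    feature quantities extracted for each small region.
    """
    h, w = len(image), len(image[0])
    labels = []
    for y in range(0, h - block + 1, block):
        label_row = []
        for x in range(0, w - block + 1, block):
            values = [image[y + j][x + i] for j in range(block) for i in range(block)]
            mean = sum(values) / len(values)
            label_row.append("defect" if mean > threshold else "normal")
        labels.append(label_row)
    return labels


def category_area(labels, category, block):
    """Total size, in pixels, of all regions assigned to the given category."""
    count = sum(row.count(category) for row in labels)
    return count * block * block


# Hypothetical 4x4 inspection image with a bright patch in the upper right.
image = [
    [1, 0, 9, 8],
    [0, 1, 9, 9],
    [0, 0, 1, 0],
    [1, 0, 0, 1],
]

labels = classify_regions(image, block=2, threshold=4.0)
area = category_area(labels, "defect", block=2)
print(labels)
print(area)
```

Because each region is judged on its own, the result does not depend on the overall shape of the defective part, and the size of the category follows directly from counting the labeled regions.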
In this regard, the method proposed by Kamemaru and the method disclosed in JP(A) 4-355732, mentioned above as prior arts, perform feature transformation in the feature transform section on the basis of the correlation between the object to be recognized and an actual character (or partial elements thereof). In these methods, therefore, overall features are captured only roughly (at the character level). Moreover, the above-described intrinsic disadvantage of correlators remains, namely, that they are overly sensitive to slight deformation, rotation, scaling, etc. and are therefore likely to cause an error, and it is therefore difficult to attain the above-described task. If these prior arts were improved to attain the above-described task by feature extraction using correlators, it would be necessary to prepare a large number of extremely small reference vectors at the level of small regions each consisting of one or a plurality of components of the input vector. This is not practical.