1. Field of the Invention
This invention relates to the automatic preparation of a recognition dictionary or a decision tree, which is indispensable for classification or search in the process of image recognition, and more specifically to a system for preparing a recognition dictionary which is suitable for simplifying the operation of information recognition systems (e.g. image recognition systems) and for realizing high speed on-line information recognition, and to a method therefor.
2. Description of the Prior Art
In general, in image recognition, objects to be recognized are identified on the basis of a recognition dictionary previously prepared. The recognition dictionary is prepared by extracting various features several times for each of the objects to be recognized (hereinbelow called categories) and then analyzing the resulting distribution data on the various feature axes.
At this time, in order to simplify the analysis and to exclude the subjectivity of the analyst, it is desirable to prepare the recognition dictionary automatically by a standardized method. As the structure of recognition dictionaries, decision tree structures, which attach importance to the processing speed of recognition, are most widely utilized.
As prior art concerning classification by use of a tree structure, there is known "SRI VISION RESEARCH FOR ADVANCED INDUSTRIAL AUTOMATION" by Gerald J. AGIN and Richard O. DUDA, Second USA-JAPAN Computer Conference, 1975.
However, the methods heretofore proposed for preparing decision tree dictionaries have not achieved a satisfactory recognition speed. For example, there is known a method by which a plurality of categories is divided into 2 classes on the feature axis where the separability (variance of two adjacent categories) is the largest, and similar classifications are repeated for each new class consisting of a plurality of categories after the preceding classification. This method utilizing the separability is described e.g. in "Oyo gazo kaiseki" (Applied image analysis) (by Jumpei Tsubouchi, Kyoritsu Shuppan Co. Ltd.). This method will be explained, referring to FIG. 1. Suppose now that the categories which are to be classified (identified) are C.sub.a -C.sub.d and that the features prepared for them are F.sub.1 -F.sub.3. The frequency distribution concerning the feature F.sub.1 of a plurality of samples or objects contained in the category C.sub.a, i.e. its mean value .mu..sub.a and standard deviation .sigma..sub.a, can be obtained. The mean values and the standard deviations of the other categories C.sub.b -C.sub.d are also obtained, as indicated in FIG. 1a. Here, as a feature, e.g. for a circular part, the length of its periphery can be used. Different distribution curves as indicated in FIGS. 1a-1c are obtained due to fluctuations in light intensity, i.e. brightness, on the circular part.
The separability method is one by which an evaluation value called separability is calculated for each of the features F.sub.1 -F.sub.3, and the feature having the largest separability is used to divide the categories C.sub.a -C.sub.d in FIGS. 1a-1c. Here the separability SP (C.sub.k.C.sub.k+1) is represented by Eq. (1): ##EQU1## where
.mu..sub.k : mean value of the category C.sub.k for the feature F.sub.i,
.mu..sub.k+1 : mean value of the category C.sub.k+1 which is adjacent to C.sub.k (category whose mean value is next larger),
.sigma..sub.k : standard deviation of the category C.sub.k for feature F.sub.i, and
.sigma..sub.k+1 : standard deviation of the category C.sub.k+1 which is adjacent to C.sub.k.
The separability is calculated for each of the distribution data indicated in FIGS. 1a-1c (e.g. for the category C.sub.a and the feature F.sub.1, the mean value .mu..sub.a =1 and the standard deviation .sigma..sub.a =1 are shown in the figure). For example, for F.sub.1, 3 values of the separability are obtained, one for each of the 3 pairs of adjacent categories C.sub.a -C.sub.b, C.sub.b -C.sub.c and C.sub.c -C.sub.d. Among them the separability having the largest value is obtained as follows: ##EQU2## The value of the separability for each of the features F.sub.1, F.sub.2 and F.sub.3 is given in FIGS. 1a-1c.
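The adjacent-pair calculation described above can be sketched as follows. Since Eq. (1) is not reproduced in this text, the sketch assumes one plausible form of the separability, namely the difference of the two mean values divided by the sum of the two standard deviations; that form, the function names, and all numerical values other than .mu..sub.a =1 and .sigma..sub.a =1 are illustrative assumptions and are not taken from FIG. 1.

```python
from typing import Dict, List, Tuple

Stats = Tuple[float, float]  # (mean value, standard deviation) of one category


def adjacent_separabilities(stats: Dict[str, Stats]) -> List[Tuple[str, str, float]]:
    """Separability of each pair of categories adjacent on one feature axis.

    ASSUMPTION: SP = (mu_{k+1} - mu_k) / (sigma_k + sigma_{k+1}).
    The source's Eq. (1) is not available, so this form is only a stand-in.
    """
    # Order the categories by mean value so that neighbours are adjacent.
    ordered = sorted(stats.items(), key=lambda kv: kv[1][0])
    result = []
    for (name_k, (mu_k, sd_k)), (name_n, (mu_n, sd_n)) in zip(ordered, ordered[1:]):
        denom = sd_k + sd_n
        sp = (mu_n - mu_k) / denom if denom > 0 else float("inf")
        result.append((name_k, name_n, sp))
    return result


def feature_separability(stats: Dict[str, Stats]) -> Tuple[str, str, float]:
    """Return the adjacent pair with the largest separability on this feature."""
    return max(adjacent_separabilities(stats), key=lambda t: t[2])


# Hypothetical statistics for feature F1; only Ca = (1, 1) comes from the text.
f1 = {"Ca": (1.0, 1.0), "Cb": (3.0, 1.0), "Cc": (5.0, 1.0), "Cd": (15.0, 1.0)}

print(adjacent_separabilities(f1))
# [('Ca', 'Cb', 1.0), ('Cb', 'Cc', 1.0), ('Cc', 'Cd', 5.0)]
print(feature_separability(f1))
# ('Cc', 'Cd', 5.0)
```

With these assumed values, the isolated category Cd produces one large value among the 3 pairs, and that value becomes the separability of the feature as a whole.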
According to this separability method, the separability is large in the case where one category, such as C.sub.d for the feature F.sub.1, is far away from the distribution groups of the other categories. Consequently, in the case where a decision tree dictionary is prepared by the separability method, the feature F.sub.1 is predominant at first.
Since it is between C.sub.c and C.sub.d that the distribution can be divided safely on the axis of F.sub.1, the categories C.sub.a, C.sub.b, C.sub.c and C.sub.d are divided into 2 groups, one consisting of C.sub.a, C.sub.b and C.sub.c and the other consisting only of C.sub.d. Next, the ranking of the features is decided for C.sub.a, C.sub.b and C.sub.c, and it is found that F.sub.2 is predominant. By repeating analogous procedures, it is possible to classify the distribution into the 4 categories C.sub.a, C.sub.b, C.sub.c and C.sub.d, which are separated from each other. However, it is clear from FIG. 1c that the distribution can be classified into separate categories most rapidly and reliably if the classification is effected by using the feature F.sub.3 alone.
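The repeated two-way division described above can be sketched as a small recursive procedure. This is a minimal illustration only: the separability formula is the same assumed form (difference of mean values over the sum of standard deviations), the names best_gap and build_tree are hypothetical, and the statistics are invented so that F1 is chosen first and F2 second, as in the narrative; they are not the values of FIGS. 1a-1c.

```python
def best_gap(stats, categories):
    """Widest gap on one feature: (separability, lower group, upper group).

    ASSUMPTION: separability = (mu_next - mu_k) / (sigma_k + sigma_next).
    """
    ordered = sorted(categories, key=lambda c: stats[c][0])
    best = None
    for i in range(len(ordered) - 1):
        (mu_k, sd_k), (mu_n, sd_n) = stats[ordered[i]], stats[ordered[i + 1]]
        denom = sd_k + sd_n
        sp = (mu_n - mu_k) / denom if denom > 0 else float("inf")
        if best is None or sp > best[0]:
            best = (sp, ordered[: i + 1], ordered[i + 1:])
    return best


def build_tree(table, categories):
    """Split on the feature with the largest separability, then recurse."""
    if len(categories) == 1:
        return categories[0]  # leaf: a single category remains
    feature, (_sp, low, high) = max(
        ((f, best_gap(s, categories)) for f, s in table.items()),
        key=lambda item: item[1][0],
    )
    return (feature, build_tree(table, low), build_tree(table, high))


# Hypothetical (mean value, standard deviation) per feature and category.
table = {
    "F1": {"Ca": (1, 1), "Cb": (3, 1), "Cc": (5, 1), "Cd": (15, 1)},
    "F2": {"Ca": (2, 1), "Cb": (12, 1), "Cc": (3, 1), "Cd": (4, 1)},
    "F3": {"Ca": (0, 1), "Cb": (6, 1), "Cc": (12, 1), "Cd": (18, 1)},
}

print(build_tree(table, ["Ca", "Cb", "Cc", "Cd"]))
# ('F1', ('F2', ('F3', 'Ca', 'Cc'), 'Cb'), 'Cd')
```

In this invented data the feature F3 separates all 4 categories evenly, yet the procedure picks F1 and then F2 first, so that recognizing Ca or Cc requires 3 features. This mirrors the drawback pointed out in the text.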
According to this separability method, the number of calculation steps for the features becomes too great and calculations for the features take too much time. (This problem is identical for the stability coefficient method and the variance ratio method.)
As stated above, according to the prior art methods such as the separability method, the recognition time increases in proportion to the number of features which must be extracted from a category before the recognition result is outputted. For this reason, the decision tree dictionary should be so designed that the number of features is as small as possible. Nevertheless, in the prior art methods, this number has not been taken into account at all. Consequently, the decision tree structure, and as a result the recognition speed, are determined only incidentally, and shortening of the recognition time has never been intended.
Accordingly, the present inventors have already filed a United States patent application entitled "System and Method for Preparing a Recognition Dictionary" (Ser. No. 687757), wherein a system and method for classifying categories by using the feature giving the largest separation distribution number have been proposed. This separation distribution number method is one by which a decision tree which is shallow on average can be constructed by arranging the categories to be classified in the order of an increased combination number of categories.
However, the object of shortening the recognition time has, in general, two aspects: one is the shortening of the maximum recognition time, and the other is the shortening of the total recognition time. The maximum recognition time means the largest value of the time which is necessary for recognizing any one of the categories. On the other hand, the total recognition time means the time which is necessary for recognizing all of the categories. The separation distribution number method, which the inventors have previously proposed, is one in which principally the shortening of the total recognition time is taken into account. Depending on the application and on the objects to be recognized, there are many cases where it is desired to shorten the maximum recognition time (or both the total recognition time and the maximum recognition time).
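The two measures can be made concrete with a short sketch. It assumes, for illustration only, that the time for recognizing a category is proportional to the number of decision nodes (features) passed on the way from the root to that category, so that the maximum recognition time corresponds to the deepest leaf and the total recognition time to the sum of all leaf depths. The tree encoding and the function names are hypothetical.

```python
def leaf_depths(tree, depth=0):
    """Map each category (leaf) to the number of features on its path."""
    if isinstance(tree, str):
        return {tree: depth}
    _feature, children = tree  # a node is (feature name, list of subtrees)
    depths = {}
    for child in children:
        depths.update(leaf_depths(child, depth + 1))
    return depths


def recognition_costs(tree):
    """Return (maximum, total) recognition cost, counted in features."""
    depths = leaf_depths(tree)
    return max(depths.values()), sum(depths.values())


# A chain-like tree: one category is split off at each level.
chain = ("F1", [("F2", [("F3", ["Ca", "Cc"]), "Cb"]), "Cd"])
# A balanced tree over the same 4 categories (hypothetical feature choice).
balanced = ("F1", [("F2", ["Ca", "Cb"]), ("F3", ["Cc", "Cd"])])

print(recognition_costs(chain))     # (3, 9)
print(recognition_costs(balanced))  # (2, 8)
```

Counting cost in features ignores the fact that different features may take different times to extract; a weighted sum per path would be the natural refinement under that assumption.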