The invention relates to a pattern recognition apparatus and a pattern recognition method and in particular to a pattern recognition method that combines different recognition modules to improve the recognition accuracy.
The demand for information processing involving pattern recognition is currently large and is rapidly increasing. Pattern recognition is used in such applications as image processing, text processing, and sound processing performed by computers. Consequently, improvements in pattern recognition technology are highly desirable.
Pattern recognition is the process by which a physical phenomenon, such as an image, a handwritten or printed character, or a sound, is converted to an electronic signal representing the phenomenon, a determination is made of which one of a number of possible categories in a category set the phenomenon belongs to, and a code indicating the determined category is generated. For example, in character recognition, an unknown printed character, such as the letter "A," may be scanned by an electronic scanner. The scanner generates an electronic pattern signal that represents a pattern composed of an array of a few thousand bytes that represent the unknown character. The pattern signal is then analyzed to determine to which of the categories in the category set corresponding to the letters A-Z the unknown character belongs. A code identifying this category as the category of the unknown character is then generated. For example, the ASCII code 65 (41 in hexadecimal) representing the letter A may be generated.
Pattern recognition processing is preferably performed using features of the pattern extracted from the pattern signal instead of using the raw pattern signal. Processing the features extracted from the pattern signal is preferable because these features can often be processed more quickly, more accurately and at lower cost than the raw pattern signal. When pattern signals containing extremely large quantities of information are to be processed, it is sometimes necessary to extract features and to process the features instead. One objective of pattern recognition is to compress the information by representing the patterns using the features extracted from the pattern signal. Of course, the features must be extracted from the pattern signal in a way that does not impair the ability of the pattern recognition processing to recognize the pattern.
The feature f of a pattern p is usually defined by a set {x(p;m); m=1,2,3, . . . , M} of a finite number M of feature components x(p;m). The feature f tangibly and quantitatively represents the characteristic qualities of the pattern. Consequently, since the feature f is represented by an M-dimensional vector whose m-th component is the feature component x(p;m), the vector representation of the feature f is the feature vector X(p)=(x(p;1), x(p;2), . . . , x(p;M))^t. The argument p indicates that the feature vector X(p) is the feature vector of the pattern p. The superscript t denotes vector transposition.
Even when the feature components are qualitative, they can be quantified and used.
If the pattern p undergoes various deformations, the value of the feature component x(p;m) changes. Consequently, the feature vector X(p) changes. However, as long as the deformed pattern belongs to its original category, the pattern recognition process must recognize it as belonging to that category.
A particular pattern that is specified as being representative of the patterns belonging to a particular category or as being representative of a feature of the category is called the reference pattern of the category. The feature vector of the specified pattern is called the reference vector of the category. As an alternative to using a particular pattern as the reference pattern for the category, a hypothetical pattern obtained by averaging the patterns belonging to the category can be used as the reference pattern, and the feature vector of such a hypothetical pattern can be used as the reference vector of the category.
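The averaging alternative can be illustrated with a minimal Python sketch. It assumes that averaging the patterns is equivalent to averaging their feature vectors component by component; the function name `mean_reference_vector` and the sample data are hypothetical and are not taken from the description.

```python
def mean_reference_vector(feature_vectors):
    """Average the feature vectors of one category, component by component."""
    if not feature_vectors:
        raise ValueError("at least one feature vector is required")
    count = len(feature_vectors)
    dims = len(feature_vectors[0])
    return [sum(v[m] for v in feature_vectors) / count for m in range(dims)]


# Hypothetical three-component feature vectors of patterns in one category.
category_a_samples = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0], [2.0, 3.0, 2.0]]
reference_a = mean_reference_vector(category_a_samples)
```

Here `reference_a` plays the role of the reference vector of the category.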
In pattern recognition, an unknown pattern p is received and a pattern recognition process is performed. The process may determine whether the unknown pattern is similar to a known pattern q, or may determine which category the unknown pattern belongs to. Pattern recognition is essential in recognizing diagrams, characters, symbols, images, and sounds. General information about pattern recognition and the problems of pattern recognition can be found in Reference 1, Kazuo Nakada, Ed.: Pattern Recognition and its Application, Corona Co. (1978) (in Japanese) and Reference 2, Hidemitsu Ogawa, Ed.: New Developments in Pattern Recognition and Understanding: The Challenges, Denshi Jyoho Tsushin Gakkai (1992) (in Japanese).
Examples of pattern recognition in which the unknown patterns are character patterns will be described below on the understanding that the principles set forth in the description can easily be applied to other forms of pattern recognition, such as image recognition and sound recognition. Character patterns are patterns representing letters, numbers, Kanji characters and the like. A pattern representing a character will from now on be referred to as a character pattern. Examples of possible feature components of character patterns include:
the length-to-width ratio of the character,
the number of horizontal lines,
the number of loops,
whether each square of a grid overlaid on the character is black or white,
the number of crossing points with a straight line in a specific direction, and
the transform coefficients of a Fourier transform of the character pattern.
A set of feature components such as that listed above is used to construct the feature vector so that the resulting feature vector can optimally represent the characters in the character set. The dynamic range of each feature component is selected to improve the accuracy of the pattern recognition to be described later. The feature component may be normalized using the standard deviation when this is needed.
Pattern recognition generates a category name for each character pattern subject to pattern recognition. The category name represents the reading, meaning, or code of the character pattern. For example, the category name of the category to which the character "A" belongs may be "category A." As noted above, a specific character pattern belonging to the category is selected as the reference pattern for the category. Alternatively, a hypothetical pattern obtained by averaging a number of character patterns belonging to a category may be used as the reference pattern. The feature vector of the reference pattern is adopted as the reference vector of the category.
At the heart of pattern recognition is a recognition processor that has the objective of determining that all unknown character patterns that represent the character "A" belong to category A, irrespective of whether the character pattern is deformed, and, further, that such character patterns do not belong to categories other than category A.
The processing performed by a character recognition apparatus after character pattern observation and reading is usually divided into a series of process modules that perform character pattern preprocessing, feature extraction, and recognition. Each process module can primarily be implemented using a computer and is realized by the computer performing a specific set of operations. All of the process modules, including observation of the character pattern, affect the result generated by the recognition module.
The strategy for increasing the accuracy of character recognition is to maximize the recognition ratio and to reduce the misrecognition ratio to zero. The recognition ratio is the fraction of character patterns that should belong to each category that are correctly recognized as belonging to that category. The misrecognition ratio is the fraction of character patterns that do not belong to each category that are misrecognized as belonging to that category. In particular, many applications strongly demand that misrecognition does not occur, i.e., character patterns that do not belong to category A, for example, must not be allocated to category A.
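The two ratios defined above can be computed per category as in the following Python sketch. The function name `category_ratios` and the labels are hypothetical; returning 0.0 for an empty denominator is an assumption made only to keep the sketch total.

```python
def category_ratios(true_labels, predicted_labels, category):
    """Return (recognition ratio, misrecognition ratio) for one category."""
    pairs = list(zip(true_labels, predicted_labels))
    # Predictions for patterns that should belong to the category.
    inside = [pred for true, pred in pairs if true == category]
    # Predictions for patterns that do not belong to the category.
    outside = [pred for true, pred in pairs if true != category]
    recognition = inside.count(category) / len(inside) if inside else 0.0
    misrecognition = outside.count(category) / len(outside) if outside else 0.0
    return recognition, misrecognition


# Hypothetical recognition results for eight character patterns.
true_labels = ["A", "A", "A", "A", "B", "B", "H", "H"]
predicted_labels = ["A", "A", "A", "H", "B", "A", "H", "H"]
recognition_a, misrecognition_a = category_ratios(true_labels, predicted_labels, "A")
```

In this made-up data, three of the four patterns of category A are recognized correctly, and one of the four patterns outside category A is wrongly allocated to it.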
Each input pattern can be regarded as a warping or a modification of a corresponding ideal pattern. Such modification can be regarded as an inverse form of pre-processing in the sense that the modification degrades the ability of the pattern recognition apparatus to recognize the pattern. This negative pre-processing must be one of the target factors taken into consideration when attempting to improve the performance of a pattern recognition apparatus for character patterns.
Known approaches to improving the recognition accuracy of a pattern recognition apparatus have added feedback control functions, such as adjusting the processing performed by each process module based on the recognition result, and have made various changes to the processing performed by the recognition module. The character pattern pre-processing referred to above includes normalizing the size, tilt, position, and line density of an input character pattern, and removal of noise such as spots and scratches.
Feature extraction referred to above analyzes an input character pattern to determine the values of the feature components that represent the shape and other characteristics of the character represented by the character pattern. The feature vector of the character pattern is constructed from the feature components. The number of feature components is typically of the order of several hundred, for example, 380 or 500, but may be as large as several thousand. One way of generating the feature vector of a character pattern is to hypothetically overlay the character pattern with a 64×64 grid and then determine whether each square of the grid is filled by a stroke of the character pattern. If the square is filled, the value 1 is allocated to the square. Otherwise, the value 0 is allocated. Each square of the grid may be regarded as a feature component, and a 4,096-dimensional feature vector whose elements are 0 and 1 is generated as the feature vector of the character pattern.
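The grid-based feature vector can be sketched in Python as follows. The sketch assumes that the character pattern has already been binarized at the resolution of the grid, so that each cell is simply filled or empty; the function name `grid_feature_vector` and the test bitmap (a single vertical stroke) are hypothetical.

```python
def grid_feature_vector(bitmap):
    """Flatten a binary bitmap (rows of filled/empty cells) into a 0/1 vector."""
    return [1 if cell else 0 for row in bitmap for cell in row]


SIZE = 64  # grid size used in the description
# Hypothetical pattern: one vertical stroke occupying column 10.
bitmap = [[1 if col == 10 else 0 for col in range(SIZE)] for _ in range(SIZE)]
features = grid_feature_vector(bitmap)
```

The resulting list has 64 × 64 = 4,096 elements, one per grid square.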
Another effective feature in character recognition is a combination of, for example, the number of end points in the strokes (2 in the letter A), the number of loops (1 in the letter A), the number of bending points (1 in the letter A), the number of branch points (2 in the letter A), the number of crossing points (0 in the letter A), and their positions. The smallest possible number of feature components consistent with a required recognition accuracy is then selected, and a feature vector with the corresponding number of dimensions is constructed. Although increasing the number of dimensions of the feature vector tends to increase the recognition accuracy, it also increases the processing time and the memory capacity required.
In the recognition module referred to above, the feature vector extracted from each input character pattern is matched to the reference vector of each category in the category set to determine the category to which the input character pattern belongs. The reference vectors are determined before the character recognition processing starts and are stored in a recognition dictionary. The reference vectors are specific to the character set to which the unknown character belongs and to the particular way in which the character recognition is performed. The recognition dictionary includes a set of reference vectors, at least one reference vector for each category in the category set. Recognition modules often use a similarity function or a distance function as a recognition function to determine the category to which the unknown character pattern belongs. The distance between the character pattern and the reference pattern can be regarded as the distance between the character pattern and the category, as represented by the reference pattern. In particular, the distance between the feature vector of the character pattern and the reference vector of the reference pattern can be regarded as the distance between the character pattern and the category.
When the recognition module uses a distance function, a way of measuring the distance between character patterns is defined, and the distance between the input character pattern and each category is measured. Generally, the shorter the distance, the more similar are the character patterns. An input character pattern whose feature vector is within a fixed distance of the reference vector of a certain category can be said to belong to that category. Alternatively, a character pattern can be said to belong to the category to which the closest reference feature vector belongs.
When the feature vector of the input character is defined as in the grid example described above, the recognition module computes the Hamming distance (the number of mutually different elements) between 4,096-dimensional feature vectors whose elements are 1 and 0. For example, an input character may be said to belong to the category whose reference vector is within a Hamming distance of 100 from the extracted feature vector of the input character.
Alternatively, Hamming distances are calculated between the feature vector extracted from the input character pattern and the reference vectors of all the categories in the category set. The category for which the Hamming distance is shortest is determined to be the category to which the input character pattern belongs.
Additional conditions may be imposed before an input character pattern is said to belong to the category whose reference vector is the shortest distance from the feature vector of the input character pattern. Examples of such additional conditions are that the shortest Hamming distance must be less than a first threshold value, and that the increment between the shortest Hamming distance and the second-shortest Hamming distance must be greater than a second threshold value.
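The shortest-distance rule with the two additional conditions can be sketched in Python as follows. The names `hamming_distance`, `classify_by_hamming`, `max_distance` and `min_margin` are hypothetical, as are the short 8-element vectors, which stand in for the 4,096-dimensional vectors of the description.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length 0/1 vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def classify_by_hamming(features, dictionary, max_distance, min_margin):
    """Return the nearest category name, or None if the match is rejected."""
    ranked = sorted(
        (hamming_distance(features, ref), name) for name, ref in dictionary.items()
    )
    best_distance, best_name = ranked[0]
    # First condition: the shortest distance must be below the first threshold.
    if best_distance >= max_distance:
        return None
    # Second condition: the gap to the runner-up must exceed the second threshold.
    if len(ranked) > 1 and ranked[1][0] - best_distance <= min_margin:
        return None
    return best_name


# Hypothetical recognition dictionary with one reference vector per category.
dictionary = {
    "A": [1, 1, 0, 0, 1, 0, 1, 0],
    "B": [0, 0, 1, 1, 0, 1, 0, 1],
    "C": [1, 1, 0, 0, 0, 1, 0, 1],
}
unknown = [1, 1, 0, 0, 1, 0, 1, 1]
accepted = classify_by_hamming(unknown, dictionary, max_distance=3, min_margin=1)
rejected = classify_by_hamming(unknown, dictionary, max_distance=3, min_margin=2)
```

With these values the distances to A, C and B are 1, 3 and 7, so the pattern is allocated to A when the required margin is 1 and is rejected when the required margin is 2.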
In other examples of a highly accurate character recognition process, the recognition module may use as its recognition function a distance function that determines a Euclidean distance, a distance function that determines a weighted Euclidean distance, or a quadratic discriminant function. It is known that the recognition accuracy of such recognition functions can be increased by using a training process to improve the discriminant function and the character recognition dictionary. The Learning by Discriminant Analysis (LDA) method described by the inventor in Handprinted Numerals Recognition by Learning Distance Function, Trans. of the IEICE, Vol. J-76-D-11, No. 9, pp. 1851-59 (Reference 3), takes into account the deformation of character patterns by training the discriminant process. This reference additionally describes conventional character recognition processes in some detail, so a review of this reference provides a good basis for better understanding the invention disclosed below.
Learning by Discriminant Analysis uses Fisher's linear discriminant analysis. Part of the LDA method will now be described using symbolic representations that differ from those used in Reference 3. In Learning by Discriminant Analysis, a distance function that gives a weighted Euclidean distance is trained and stored in the recognition dictionary. Specifically, the weighting vectors and constant terms of the distance function and the reference vectors are learned.
A known input character pattern p is input to the recognition apparatus as a training pattern and is first subject to conventional preprocessing and feature extraction to obtain the feature vector X(p)=(x(p;1), x(p;2), . . . , x(p;m), . . . , x(p;M))^t.
The reference vector R(K)=(r(K;1), r(K;2), . . . , r(K;m), . . . , r(K;M))^t of each category K in the category set {K} is given, and the weighted Euclidean distance D(p,K) between the input character pattern X(p) and the reference vector R(K) of each category in the category set is calculated. The weighted Euclidean distance is calculated using:

$$D(p,K) = D(X(p),R(K)) = \sum_{m=1}^{M} \omega(K;m)\,\bigl(x(p;m) - r(K;m)\bigr)^{2} \tag{1}$$
To be precise, D(p,K) gives the square of the distance, but it will simply be called the distance here.
In this, ω(K;m) is the m-th weighting factor and is one element of the weighting vector W(K)=(ω(K;1), ω(K;2), . . . , ω(K;m), . . . , ω(K;M))^t. The character recognition dictionary L({K}) stores the parameters needed to perform the recognition operation. Typical parameters include the reference vectors R(K) and the weighting vectors W(K). {K} indicates the entire set of categories of which the category K is a member and to which the input character can belong. During the matching operation, the recognition dictionary provides a reference vector for each category in the category set. Consequently, the recognition dictionary related to the category set {K} is designated as L({K}).
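Equation (1) and the role of the recognition dictionary can be illustrated by the following Python sketch. Representing the dictionary as a mapping from a category name to a pair (reference vector, weighting vector) is an assumption, and the function names and numerical values are hypothetical.

```python
def weighted_distance(x, reference, weights):
    """Equation (1): sum over m of w(K;m) * (x(p;m) - r(K;m))**2."""
    return sum(w * (xm - rm) ** 2 for xm, rm, w in zip(x, reference, weights))


def nearest_category(x, dictionary):
    """Return the category whose weighted distance to x is smallest."""
    return min(dictionary, key=lambda name: weighted_distance(x, *dictionary[name]))


# Hypothetical dictionary: category name -> (reference vector R(K), weights W(K)).
dictionary = {
    "A": ([0.0, 2.0, 5.0], [2.0, 1.0, 0.5]),
    "B": ([1.0, 0.0, 3.0], [1.0, 4.0, 1.0]),
}
x = [1.0, 2.0, 3.0]
distance_a = weighted_distance(x, *dictionary["A"])
distance_b = weighted_distance(x, *dictionary["B"])
best = nearest_category(x, dictionary)
```

Because each category carries its own weights, the same displacement in a feature component can contribute differently to the distance from different categories.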
Although the above form of the distance function is used to determine the weighted Euclidean distance for all of the categories in the category set, the parameters used in the distance function are set to specific values for each category. The distance function after learning by LDA differs from a conventional distance function, which is based on the usual definition of a distance. The LDA-based recognition function increases the differences between patterns.
In the following description, the category to which the input character pattern p actually belongs will be designated by K(p). In the LDA-based pattern recognition operation, the learning operation is performed using training patterns, i.e., character patterns whose categories are known. The weighted Euclidean distance D(p,K) between each input character pattern and each category K in the category set is determined. The recognition module determines that the input character pattern p belongs to the category K1(p) for which the determined distance D(p,K) is smallest. However, in some circumstances, the determined category K1(p) will differ from the category K(p) to which the input character pattern p actually belongs. In this case, the input character pattern is misrecognized. In other circumstances, the difference between the distance D(p,K1) from the input character pattern to the category K1 and the distance D(p,K2) from the input character pattern to a different category K2 is small. For such character patterns, the recognition result K1(p) cannot be said to be accurate with a high degree of confidence. A character pattern that is either incorrectly allocated to a category, or that cannot be said with a high degree of confidence to belong to a category, can be characterized in one of the following ways:
(1) A character pattern that actually belongs to a category different from category K but that the recognition module misrecognizes as belonging to the category K will be called an error pattern poe belonging to the category K.
(2) A character pattern that actually belongs to a category different from category K but that the recognition module nearly misrecognizes as belonging to the category K will be called a near-miss pattern pon belonging to the category K.
The terms poe and pon are the terms used by the inventor in Reference 3, referred to above, and describe two types of rival patterns por that constitute the rival pattern set of the category K. In LDA, the weighting vectors and constant terms and the reference vectors are learned so that all rival patterns por are kept at least a minimum distance from the category K. The learning process can be performed for all of the categories or only for those categories for which misrecognition most easily takes place.
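The collection of rival patterns for one category can be sketched in Python as follows. The description does not quantify what it means to "nearly" misrecognize a pattern, so the `margin` parameter is an assumption: a pattern counts as a near-miss when the category K loses to the best other category by less than `margin`. The unweighted squared distance, the function names and the one-dimensional data are likewise hypothetical, and at least two categories are assumed.

```python
def squared_distance(x, reference):
    """Unweighted squared Euclidean distance, used here only for illustration."""
    return sum((xm - rm) ** 2 for xm, rm in zip(x, reference))


def collect_rivals(samples, category, references, margin):
    """Return (error patterns, near-miss patterns) for one category."""
    errors, near_misses = [], []
    for x, true_category in samples:
        if true_category == category:
            continue  # in-category patterns are never rivals of their own category
        d_k = squared_distance(x, references[category])
        d_other = min(
            squared_distance(x, ref)
            for name, ref in references.items()
            if name != category
        )
        if d_k < d_other:
            errors.append(x)  # the category is nearest: misrecognized as K
        elif d_k - d_other < margin:
            near_misses.append(x)  # the category loses only narrowly (assumed rule)
    return errors, near_misses


references = {"A": [0.0], "B": [10.0]}
samples = [([0.5], "A"), ([1.0], "B"), ([5.5], "B"), ([9.0], "B")]
errors_a, near_misses_a = collect_rivals(samples, "A", references, margin=15.0)
```

In this made-up data the pattern at 1.0 is an error pattern of category A, the pattern at 5.5 is a near-miss pattern, and the pattern at 9.0 is neither.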
The rival pattern set of the category K is designated by Ωr(K) and the in-category pattern set of the category K is designated by Ω0(K). The in-category pattern set of the category K is composed of the training patterns defined as belonging to the category K. The learning process determines the coefficients {a(m); m=1 to M}, {b(m); m=1 to M}, and the constant c(K) so that these coefficients make the discriminant function F(X(p),R(K)) given by the equation below negative when the training pattern is a member of the in-category pattern set Ω0(K) and positive when the training pattern is a member of the rival pattern set Ωr(K). During this processing, the average of the feature vectors of the training patterns that belong to the in-category pattern set of the category K is used as the reference vector R(K) of the category K. The discriminant function is calculated using:

$$F(X(p),R(K)) = \sum_{m=1}^{M} a(m)\,\bigl(x(p;m) - r(K;m)\bigr)^{2} + \sum_{m=1}^{M} b(m)\,\bigl(x(p;m) - r(K;m)\bigr) + c(K) \tag{2}$$
Since the discriminant function F(X(p),R(K)) is negative for the in-category pattern set Ω0(K), F(R(K),R(K)) = c(K) < 0.
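The evaluation of the discriminant function of equation (2) can be sketched in Python as follows. Only the evaluation is shown, not the learning of the coefficients; the function name `discriminant` and all coefficient values are hypothetical and were chosen by hand rather than learned.

```python
def discriminant(x, reference, a, b, c):
    """Equation (2): quadratic and linear terms in (x - r), plus the constant c."""
    total = c
    for xm, rm, am, bm in zip(x, reference, a, b):
        diff = xm - rm
        total += am * diff ** 2 + bm * diff
    return total


# Hand-picked (not learned) coefficients for a two-component feature vector.
reference = [1.0, 2.0]
a = [1.0, 1.0]
b = [0.0, 0.0]
c = -1.0

at_reference = discriminant(reference, reference, a, b, c)  # equals c
near_pattern = discriminant([1.5, 2.0], reference, a, b, c)
far_pattern = discriminant([3.0, 2.0], reference, a, b, c)
```

With these values the function is negative at and near the reference vector and positive for the more distant vector, which is the sign pattern the learning process aims for on the in-category set and the rival pattern set respectively.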
The discriminant function F(X(p),R(K)) is then weighted by the factor γ and the result is added to the original distance defined above in equation (1). Thus, the distance D(X(p),R(K)) becomes the new distance G(X(p),R(K)) defined as follows:

$$G(X(p),R(K)) = G(p,K) = D(X(p),R(K)) + \gamma\,F(X(p),R(K)) = \sum_{m=1}^{M} \bigl(\omega(K;m) + \Delta\omega(K;m)\bigr)\,\bigl\{x(p;m) - \bigl(r(K;m) + \Delta r(K;m)\bigr)\bigr\}^{2} + d(K) \tag{3}$$
The weighting factor γ in equation (3) is a positive number and is determined experimentally. The value of the weighting factor is selected to maximize the recognition accuracy over all of the categories in the category set {K}. The tests use publicly-available character databases or independently-compiled character databases. Often, the learning operation is performed using a portion of the character patterns in the character database, and the remainder of the character patterns is used to verify the result of the learning operation.
As a result of performing the learning operation, the weighting vector, reference vector, and constant term are learned in the format with the added constant term d(K). The new reference vector and weighting vector are designated as:
T(K) = (r(K;1)+Δr(K;1), . . . , r(K;M)+Δr(K;M))^t,   (4)
and
U(K) = (ω(K;1)+Δω(K;1), . . . , ω(K;M)+Δω(K;M))^t,   (5)
respectively.
The constant term d(K), weighting vector U(K), and reference vector T(K) are stored in the recognition dictionary. Next, discrimination using G(X(p),R(K)), which includes the constant term, is performed. Newly-generated rival patterns resulting from this discrimination are added to the rival pattern set of each category, and the learning process is repeated.
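By way of illustration only, discrimination using the stored dictionary contents can be sketched as follows. The sketch assumes that G(X(p),R(K)) is a squared distance between the feature vector and the reference vector T(K), weighted component by component by U(K), with the constant term d(K) added. The names DictionaryEntry, discriminant and classify are hypothetical and are not taken from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class DictionaryEntry:
    """Hypothetical per-category record held in the recognition dictionary."""
    reference: List[float]  # T(K): components r(K;m) + delta r(K;m)
    weights: List[float]    # U(K): components omega(K;m) + delta omega(K;m)
    constant: float         # d(K)


def discriminant(x: Sequence[float], entry: DictionaryEntry) -> float:
    """Assumed form of G: weighted squared distance plus the constant term."""
    distance = sum(
        u * (xm - t) ** 2
        for xm, t, u in zip(x, entry.reference, entry.weights)
    )
    return distance + entry.constant


def classify(x: Sequence[float], dictionary: Dict[str, DictionaryEntry]) -> str:
    """Return the category whose discriminant value is smallest."""
    return min(dictionary, key=lambda k: discriminant(x, dictionary[k]))
```

A pattern whose identified category differs from its defined category would then be a candidate for the rival pattern set of the identified category.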
In the examples described above, the feature vectors and the discriminant functions remain of the same type as the original feature vectors and discriminant functions, so the scope of the learning process is restricted to a portion of the recognition process. This portion includes the contents of the recognition dictionary.
Pattern recognition methods that combine multiple different recognition processes to improve the character recognition accuracy are known. Specifically, combinations of recognition processes that use different features extracted from the pattern are known. The features may differ in their type and in the number of feature components in them. Moreover, combinations of recognition processes that use different discriminant functions have been tried. Either of these possible combinations of recognition processes is effective at improving the recognition accuracy. However, conventional ways of using multiple recognition processes simply combine at least two independently-developed recognition processes. There is no indication that more effective measures have been taken, such as designing one recognition process to recognize with high accuracy the characters that are not recognized by the other recognition process. As a result, the improvement in the recognition accuracy resulting from using a combination of two conventional recognition processes is limited.
What is needed is a pattern recognition apparatus and method in which the recognition accuracy is improved by integrating two recognition processes that have characteristics designed so that those patterns that cannot be recognized with high reliability by one of the recognition processes are recognized with the highest possible accuracy by the other recognition process.
What is also needed is a pattern recognition apparatus and method in which two recognition processes are integrated in a way that minimizes the percentage of patterns that are correctly recognized by one of the recognition processes operating alone but are misrecognized by the recognition processes operating together.
Finally, what is needed is a pattern recognition apparatus and method in which two recognition processes are integrated and in which learning by the integrated recognition processes can easily be implemented.
The invention provides a pattern recognition apparatus that comprises an input section, a feature extraction module, a feature transform module, a recognition section that includes a recognition dictionary, and a categorizer. The input section receives input patterns that include a pattern belonging to one of plural categories constituting a category set. The feature extraction module expresses features of the pattern as a feature vector. The feature transform module uses transform vector matrices to transform at least part of the feature vector to generate an at least partially transformed feature vector corresponding to each of the categories. The recognition dictionary stores both matching information and first transformed matching information for each of the categories. The first transformed matching information has been transformed using the transform vector matrices. The recognition section generates at least one difference value for each of the categories by performing a matching operation between at least one matching vector derived at least from the at least partially transformed feature vector corresponding to each of the categories on one hand, and the matching information and the first transformed matching information on the other hand. The categorizer identifies the category to which the pattern belongs in response to the at least one difference value.
One embodiment additionally comprises a reliability determination module. In this embodiment, the feature transform module transforms all of the feature vector to generate a transformed feature vector corresponding to each of the categories, and the recognition section includes a first recognition module and a second recognition module. The first recognition module generates a first difference value for each of the categories by performing a matching operation between the matching information and a first matching vector derived from the feature vector. The second recognition module generates a second difference value for each of the categories by performing a matching operation between the first transformed matching information and a second matching vector derived from the transformed feature vector corresponding to each of the categories. The reliability determination module receives the first difference value for each of the categories and indicates when pattern recognition based on the first difference value for each of the categories would be reliable. The categorizer identifies the category to which the pattern belongs in response either to the first difference value alone for each of the categories or to the first difference value and the second difference value for each of the categories. The categorizer identifies the category in response to the first difference values alone when the reliability determination module indicates that pattern recognition based on the first difference values would be reliable.
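The decision flow of this embodiment can be sketched as follows, purely as an illustration. Two points are assumptions introduced for the sketch and are not stated above: reliability is judged by the margin between the two smallest first difference values, and the first and second difference values are combined by simple addition. The names is_reliable, categorize and margin are hypothetical.

```python
from typing import Dict


def is_reliable(first_diffs: Dict[str, float], margin: float) -> bool:
    """Assumed reliability test: the best category must beat the runner-up
    by at least `margin` in first difference value."""
    ordered = sorted(first_diffs.values())
    return len(ordered) < 2 or (ordered[1] - ordered[0]) >= margin


def categorize(
    first_diffs: Dict[str, float],
    second_diffs: Dict[str, float],
    margin: float,
) -> str:
    """Identify the category with the smallest difference value."""
    if is_reliable(first_diffs, margin):
        # The first difference values alone are considered sufficient.
        return min(first_diffs, key=first_diffs.get)
    # Otherwise use both difference values (assumed combination: their sum).
    combined = {k: first_diffs[k] + second_diffs[k] for k in first_diffs}
    return min(combined, key=combined.get)
```

For simplicity the sketch takes the second difference values as an argument. An implementation could instead compute them only when the first difference values are judged unreliable.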
The invention also provides a method for recognizing patterns in which input patterns are received and features of the pattern are expressed as a feature vector. The input patterns include a pattern belonging to one of plural categories constituting a category set. At least part of the feature vector is transformed using transform vector matrices to generate an at least partially transformed feature vector corresponding to each of the categories. A matching operation is performed between a matching vector derived from the at least partially transformed feature vector corresponding to each of the categories on one hand, and matching information and transformed matching information for each of the categories on the other hand. The matching operation generates at least one difference value for each of the categories. The transformed matching information is matching information that has been transformed using the transform vector matrices. Finally, the category to which the pattern belongs is identified in response to the at least one difference value.
The transform vector matrices include a transform vector matrix for either (a) a category belonging to the category set, or (b) a category subset composed of plural categories belonging to the category set. The transform vector matrix may be generated by receiving training patterns whose respective categories are defined and expressing features of the training patterns as feature vectors. The categories to which the training patterns respectively belong are identified by performing a matching operation between first matching vectors derived from the feature vectors and the matching information. The categories to which the training patterns are identified as belonging are compared with the respective defined categories to define a rival pattern set for either (a) the category, or (b) the category subset, respectively. An average vector is determined from the feature vectors of all of the training patterns defined as belonging to either (a) the category, or (b) the category subset, respectively. A difference vector is calculated for each of the training patterns belonging to the rival pattern set using the average vector. An autocorrelation matrix of the difference vectors is calculated. Finally, eigenvectors of the autocorrelation matrix are adopted as transform vectors constituting the transform vector matrix for either (a) the category, or (b) the category subset, respectively.
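The last four steps of this procedure (average vector, difference vectors, autocorrelation matrix, eigenvectors) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: each difference vector is taken as the rival pattern's feature vector minus the average vector, the autocorrelation matrix is normalized by the number of rival patterns, and the eigenvectors are ordered by decreasing eigenvalue with optional truncation. The function name transform_vector_matrix and the parameter n_vectors are hypothetical.

```python
from typing import Optional

import numpy as np


def transform_vector_matrix(
    category_features: np.ndarray,
    rival_features: np.ndarray,
    n_vectors: Optional[int] = None,
) -> np.ndarray:
    """Return a matrix whose columns are the transform vectors.

    category_features: (N, M) feature vectors of the training patterns
        defined as belonging to the category (or category subset).
    rival_features: (R, M) feature vectors of the rival pattern set.
    """
    # Average vector of the category's own training patterns.
    average = category_features.mean(axis=0)
    # One difference vector per rival pattern (assumed: rival minus average).
    diffs = rival_features - average
    # Autocorrelation matrix of the difference vectors (assumed 1/R scaling;
    # the scaling does not change the eigenvectors).
    autocorr = diffs.T @ diffs / len(diffs)
    # eigh suits symmetric matrices and returns ascending eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(autocorr)
    order = np.argsort(eigvals)[::-1]  # assumed: largest eigenvalue first
    eigvecs = eigvecs[:, order]
    if n_vectors is not None:
        eigvecs = eigvecs[:, :n_vectors]  # assumed optional truncation
    return eigvecs
```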
The transformed recognition information for a category belonging to the category set may be generated by receiving training patterns whose respective categories are defined and expressing features of the training patterns as feature vectors. The categories to which the training patterns respectively belong are identified by performing a matching operation between a first matching vector derived from the feature vector and the matching information. The categories to which the training patterns are identified as belonging are compared with the respective defined categories to define a rival pattern set for the category. The feature vectors of the training patterns are transformed to generate respective transformed feature vectors using the transform vector matrix for either (a) the category, or (b) a category subset to which the category belongs. The category subset is composed of plural categories belonging to the category set. A discriminant analysis is performed using the transformed feature vectors to generate a discriminant function. A modified difference value is calculated between the training patterns and each of the categories using the discriminant function. The categories to which the training patterns respectively belong are re-identified in response to the modified difference value for each of the categories. The categories to which the training patterns are identified as belonging in response to the modified difference values are re-compared with the respective defined categories to determine whether additional patterns are misrecognized as belonging to the category. The transformed recognition information is generated using the discriminant function when no additional patterns are misrecognized. 
Otherwise, the additional patterns are included in the rival pattern set of the category, and the discriminant analysis performing, modified difference value calculating, category re-identifying and re-comparing operations are repeated until no additional patterns are misrecognized.
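The control flow of this repeated learning operation can be sketched as follows. The sketch covers only the loop structure. The discriminant analysis and the re-identification of the training patterns are passed in as callables, named fit and misrecognized here, which are hypothetical, as is the function name learn_until_stable. The max_rounds safeguard is an assumption added for the sketch and is not part of the description above.

```python
from typing import Callable, Hashable, Iterable, Set, Tuple, TypeVar

D = TypeVar("D")  # whatever object represents the discriminant function


def learn_until_stable(
    initial_rivals: Iterable[Hashable],
    fit: Callable[[Set[Hashable]], D],
    misrecognized: Callable[[D], Iterable[Hashable]],
    max_rounds: int = 100,
) -> Tuple[D, Set[Hashable]]:
    """Repeat discriminant analysis until no additional patterns are
    misrecognized as belonging to the category."""
    rivals = set(initial_rivals)
    for _ in range(max_rounds):
        # Discriminant analysis with the current rival pattern set.
        discriminant = fit(rivals)
        # Re-identify and re-compare: which patterns are newly misrecognized?
        additional = set(misrecognized(discriminant)) - rivals
        if not additional:
            # Stable: the transformed information is generated from this
            # discriminant function.
            return discriminant, rivals
        # Otherwise include the additional patterns and repeat.
        rivals |= additional
    raise RuntimeError("rival pattern set did not stabilize")
```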