1. Field of the Invention
The present invention relates to recognition of patterns, such as character and speech patterns, and more particularly, to a technique for preparing data for pattern recognition of characters, sounds, etc.
2. Related Background Art
Conventionally, for recognition of handwritten characters, which constitute a type of pattern, one step-by-step procedure utilizes a classification tree to sort patterns into categories.
With the conventional recognition method that uses a classification tree, nodes are prepared by focusing only on the number of characteristics at each individual node, so the broader aspects of the pattern cannot be determined.
In order to make a classification tree for recognition of a pattern having a large number of characteristics, a method for selecting a characteristic axis at each individual node must be employed because of the time required for calculation.
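As a rough illustration of selecting a characteristic axis at a node, the following minimal sketch picks the binary characteristic whose split leaves the lowest weighted entropy among the training labels. The entropy criterion, the function names `entropy` and `best_axis`, and the toy samples are assumptions made for illustration; the conventional method referred to above does not specify a particular criterion.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of category labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def best_axis(samples, labels):
    """Index of the binary characteristic whose split leaves the lowest weighted entropy."""
    best, best_score = None, float("inf")
    for axis in range(len(samples[0])):
        score = 0.0
        for value in (0, 1):
            subset = [lab for s, lab in zip(samples, labels) if s[axis] == value]
            if subset:
                score += len(subset) / len(labels) * entropy(subset)
        if score < best_score:
            best, best_score = axis, score
    return best


# Hypothetical training data: four samples, three binary characteristics each.
samples = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = ["a", "a", "b", "b"]
chosen = best_axis(samples, labels)
```

In this toy data only the first characteristic separates the two categories, so it is the one selected. Every axis is scored at every node, which is why the calculation time grows with the number of characteristics.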
In addition, there is a conventional method for sentence recognition that utilizes an N-gram table, whereby a finite automaton is used as a language model for the constitution of sentences and, based on this model, the prior probability of the occurrence of a character row is calculated.
In other words, this method includes a step of calculating, from a large-scale sentence database, the probability concerning the continuation of the element rows that constitute sentences.
However, for a language, such as Japanese or Chinese, that includes several thousand character types, a large amount of sentence data is required even to prepare a trigram table (N=3).
If a table is prepared using only a small amount of sentence data, reliable transition probabilities and unreliable transition probabilities coexist in the table, which constitutes a defect.
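The coexistence of reliable and unreliable probabilities can be seen in a minimal sketch of a trigram table built from a very small set of sentences. The function names `build_trigram_table` and `transition_probability`, the relative-frequency estimate, and the toy corpus are assumptions made for illustration and are not taken from any particular conventional system.

```python
from collections import Counter, defaultdict


def build_trigram_table(sentences):
    """Count, for each two-character context, how often each next character follows."""
    table = defaultdict(Counter)
    for s in sentences:
        for i in range(len(s) - 2):
            table[s[i:i + 2]][s[i + 2]] += 1
    return table


def transition_probability(table, context, nxt):
    """Relative-frequency estimate of P(nxt | context) and the number of observations behind it."""
    counts = table.get(context)
    if not counts:
        return 0.0, 0
    total = sum(counts.values())
    return counts[nxt] / total, total


# Hypothetical, deliberately tiny sentence database.
corpus = ["abcabd", "abcabc", "xyz"]
table = build_trigram_table(corpus)

p_c, support_c = transition_probability(table, "ab", "c")  # estimate backed by 4 observations
p_z, support_z = transition_probability(table, "xy", "z")  # estimate backed by 1 observation
```

Here the estimate for "c" following "ab" is 0.75 from four observations, while the estimate for "z" following "xy" is 1.0 from a single observation. Both values sit in the same table, although the second is far less trustworthy than the first. With several thousand character types, the number of possible contexts is so large that most entries would be of the second kind unless the database is very large.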
There is also a conventional method for preparing a classification tree through pre-processing that involves the step-by-step degeneration of a pattern. According to this method, a well-balanced classification tree can be constructed that covers a pattern from its macro form to its micro form. As a result, a recognition function that is as close as possible to the recognition ability of human beings can be expected.
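The step-by-step degeneration described above might be sketched as follows. This is a minimal sketch under stated assumptions: the pattern is a square binary bit map whose side is a power of two, and each degeneration step takes the logical OR of every non-overlapping 2x2 block. The text above does not specify the degeneration rule, and the names `degenerate` and `pyramid` are hypothetical.

```python
def degenerate(bitmap):
    """Halve a square binary bit map by OR-ing each non-overlapping 2x2 block (assumed rule)."""
    n = len(bitmap)
    return [
        [
            bitmap[r][c] | bitmap[r][c + 1] | bitmap[r + 1][c] | bitmap[r + 1][c + 1]
            for c in range(0, n, 2)
        ]
        for r in range(0, n, 2)
    ]


def pyramid(bitmap):
    """Return the layers from the original (micro) form down to a 1x1 (macro) form."""
    layers = [bitmap]
    while len(layers[-1]) > 1:
        layers.append(degenerate(layers[-1]))
    return layers


# Hypothetical 4x4 pattern used only to exercise the functions.
pattern = [
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
    [0, 0, 1, 0],
]
layers = pyramid(pattern)
```

For this pattern the layers are the 4x4 original, the 2x2 form [[1, 1], [0, 1]], and the 1x1 form [[1]]. A classification tree built on such layers would branch on the coarsest layer first and on finer layers further down.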
However, since this method absorbs variations of a pattern by using a variety of training patterns, an enormous number of training patterns is required.
This condition will be explained while referring to FIG. 32.
Suppose that a classification tree is prepared according to the conventional method for the recognition of numerical bit maps ranging from "0" through "9".
A classification tree constructed by the above method is shaped as shown in FIG. 32. Training patterns for three categories, "4", "5" and "6", are present at the fifth branch from the right in FIG. 32.
In other words, broadly speaking, no categories other than the three categories "4", "5" and "6" are available for the training patterns at the fifth branch from the right in FIG. 32.
As an example, consider the processing for the recognition of an entirely new bit map pattern by using the thus provided classification tree. Broadly speaking, all the bit maps shown in FIGS. 41A through 41E have the same shape as the fifth branch from the right in FIG. 32. In other words, when the above explained classification tree is used for recognition of these bit maps, the bit maps are always classified as belonging to one of the categories "4", "5" and "6". As a result, the bit maps in FIGS. 41A through 41C are correctly identified, but the bit map in FIG. 41D, which should be rejected, is nevertheless identified, and the one in FIG. 41E is clearly misidentified.
The reason such a defect occurs is that the training patterns include no pattern of the category "2" that is shaped like the one in FIG. 41E. This means that, for the conventional method, an enormous quantity of training patterns, which includes all possible permutations, is required.
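The limitation can be reproduced in a minimal sketch in which each branch of the tree remembers only the categories of the training patterns that reached it. The 4x4 bit maps below are hypothetical and do not reproduce FIG. 32 or FIGS. 41A through 41E, the category labels attached to them are arbitrary, and the OR-based coarse signature and the names `coarse_signature`, `train_branches` and `candidate_categories` are assumptions made for illustration.

```python
from collections import defaultdict


def coarse_signature(bitmap):
    """OR each 2x2 block of a 4x4 binary bit map into one bit (assumed degeneration rule)."""
    return tuple(
        tuple(
            bitmap[r][c] | bitmap[r][c + 1] | bitmap[r + 1][c] | bitmap[r + 1][c + 1]
            for c in (0, 2)
        )
        for r in (0, 2)
    )


def train_branches(samples):
    """Record which categories were seen at each coarse branch during training."""
    branches = defaultdict(set)
    for bitmap, category in samples:
        branches[coarse_signature(bitmap)].add(category)
    return branches


def candidate_categories(branches, bitmap):
    """Categories the tree can output for this bit map; empty if the branch was never trained."""
    return branches.get(coarse_signature(bitmap), set())


# Hypothetical training patterns that share one coarse branch.
training = [
    ([[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 0], [0, 0, 1, 0]], "4"),
    ([[0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1]], "6"),
]
branches = train_branches(training)

# Hypothetical unseen pattern, intended as a "2", that falls on the same coarse branch.
unseen = [[1, 1, 1, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 1, 1]]
candidates = candidate_categories(branches, unseen)
```

Because no training pattern of the category "2" ever reached this branch, `candidates` contains only "4" and "6", and the unseen pattern cannot be identified correctly no matter what its finer layers look like. Avoiding this would require a training pattern of every category for every shape that category can take.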