1. Field of the Invention
The present invention relates to an image recognition device and an image recognition method suitable for recognizing characters and other images.
2. Description of the Related Art
Recently, recognition devices using a neural network have been developed, one of the most important applications thereof being character recognition. Examples are described in Mori, Yokozawa, Umeda: "Handwritten KANJI character recognition by a PDP model" in Technical Research Reports of the Association of Electronic Communications Engineers, MBE87-156, pp. 407-414 (1988), and Kunihiko Fukushima: "Neocognitron: A Hierarchical Neural Network Capable of Visual Pattern Recognition", Neural Networks, vol. 1, pp. 119-130 (1988), both of which are incorporated herein by reference.
FIG. 12 shows a block diagram of a conventional character recognition device that uses a neural network. A character input section 60 photoelectrically converts an image pattern of a character into character data, and then outputs the character data to a recognition section 61. The character data is a two-dimensional bit image as is shown in FIG. 13. The recognition section 61 processes the character data by a neural network to recognize the character. The recognition section 61 then outputs the recognition result to a memory section 62 or to a display section 63.
The operation of the neural network used by the recognition section 61 will be described with reference to FIG. 14. The character data 64 generated by the character input section 60 is input to corresponding neurons 66 of an input layer 65. The character data 64 received by the neurons 66 is sent to all neurons 69 in a hidden layer 68 through pathways 67 referred to as synapses. It is important to note that the character data 64 is weighted before being input to the neurons 69. The weighting value is referred to as a "synapse weight". Each of the neurons 69 in the hidden layer 68 calculates the sum of all its input data, and outputs the result obtained by applying a nonlinear function to the sum. These output results are weighted with the synapse weights and then input to all neurons 72 in an output layer 71 through the synapses 70. Each of the neurons 72 in the output layer 71 calculates the sum of all its input data, and outputs the result to a maximum value detection section 73. The maximum value detection section 73 obtains the maximum value of all the data sent from the neurons 72 in the output layer 71, and outputs the character corresponding to the neuron that outputs the maximum value as the recognition result to either the memory section 62 or the display section 63.
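The signal flow described above can be illustrated by the following minimal sketch. It is not taken from the conventional device itself: the function name `forward`, the weight layout (one list of weights per receiving neuron), and the choice of a sigmoid as the nonlinear function are assumptions made for illustration only.

```python
import math

def forward(pixels, w_hidden, w_output, labels):
    """Forward pass through a hidden layer and an output layer, then maximum value detection.

    w_hidden[j][i] is the assumed synapse weight from input neuron i to hidden neuron j;
    w_output[k][j] is the assumed synapse weight from hidden neuron j to output neuron k.
    """
    # Each hidden neuron sums its weighted inputs and applies a nonlinear function
    # (a sigmoid is assumed here; the text only says "a nonlinear function").
    hidden = [
        1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, pixels))))
        for row in w_hidden
    ]
    # Each output neuron sums its weighted hidden-layer inputs.
    outputs = [sum(w * h for w, h in zip(row, hidden)) for row in w_output]
    # Maximum value detection: pick the character whose neuron has the largest output.
    best = max(range(len(outputs)), key=outputs.__getitem__)
    return labels[best], outputs

# Hypothetical 4-pixel "character" and hand-picked weights, for illustration only.
result, _ = forward(
    [1, 0, 1, 0],
    [[1, 0, 1, 0], [0, 1, 0, 1]],
    [[1, 0], [0, 1]],
    ["A", "B"],
)
```

In this toy setting the first hidden neuron responds most strongly to the input pattern, so the first label is selected as the recognition result.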
The synapse weights used in the above process are determined by a learning method referred to as error back propagation. For example, when letters of the alphabet are to be recognized, the letters are sequentially input into the neural network, and the learning is continued until a desired output result is obtained, thereby determining the synapse weights. The neural network is trained on several different styles and fonts of characters to further improve the recognition rate.
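As a rough illustration of error back propagation, the following sketch performs gradient descent updates on a small network of the kind described above. The squared-error measure, the sigmoid nonlinearity, the learning rate, the network size, and all names (`train_step`, `w_hidden`, `w_output`, `samples`) are assumptions chosen for this example and are not specified in the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_step(x, target, w_hidden, w_output, rate=0.1):
    """One error back propagation update; returns the squared error before the update."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    outputs = [sum(w * h for w, h in zip(row, hidden)) for row in w_output]
    errors = [o - t for o, t in zip(outputs, target)]
    # Propagate the output errors back through the output weights to the hidden layer.
    deltas = [
        sum(e * w_output[k][j] for k, e in enumerate(errors)) * h * (1.0 - h)
        for j, h in enumerate(hidden)
    ]
    # Adjust the hidden-to-output synapse weights.
    for k, e in enumerate(errors):
        for j, h in enumerate(hidden):
            w_output[k][j] -= rate * e * h
    # Adjust the input-to-hidden synapse weights.
    for j, d in enumerate(deltas):
        for i, xi in enumerate(x):
            w_hidden[j][i] -= rate * d * xi
    return 0.5 * sum(e * e for e in errors)

# Hypothetical training set: two 4-pixel patterns with one-hot desired outputs.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(3)]
w_output = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
samples = [([1, 0, 1, 0], [1, 0]), ([0, 1, 0, 1], [0, 1])]
losses = [
    sum(train_step(x, t, w_hidden, w_output) for x, t in samples)
    for _ in range(200)
]
```

Here the patterns are presented sequentially and repeatedly, and `losses` records the total error per pass so that learning can be stopped once the output is close enough to the desired result.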
Obtaining the feature of an image is referred to as feature extraction. In recognizing an input character, the conventional character recognition device extracts a feature from the input character in the character input section 60. Then, the device inputs the feature into the neural network of the recognition section 61 as an input signal so as to recognize the input character. The recognition ability depends on what kind of features are extracted from the input character by the character input section 60.
The conventional character recognition device simply uses a mesh feature of the character data generated by the character input section 60. Namely, the device recognizes an input character based on a binary bit image or a density value normalized in the range of 0 to 1. The binary bit image is generated from the density values by use of a certain threshold level. In short, the recognition is performed based on which parts of the character data are black and which parts are white. Accordingly, characters which are different in shape or positionally displaced from the characters previously learned by the neural network cannot be correctly recognized. In order to improve the recognition rate, the neural network is required to learn several tens of different styles and fonts of the characters. However, the recognition rate remains approximately 90% even when the application is limited to the recognition of printed figures, and the learning is significantly time-consuming.
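The mesh feature and its sensitivity to positional displacement can be illustrated as follows. This is a sketch under stated assumptions: the threshold of 0.5, the convention that a higher density value means a darker pixel (1 = black), and the names `binarize` and `differing_pixels` are all chosen for illustration.

```python
def binarize(density, threshold=0.5):
    """Turn normalized density values (0 to 1) into a binary bit image (1 = black, assumed)."""
    return [[1 if v >= threshold else 0 for v in row] for row in density]

def differing_pixels(a, b):
    """Count the mesh positions at which two bit images of equal size disagree."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

# A hypothetical vertical stroke on a 4x4 mesh, and the same stroke shifted one pixel right.
stroke = binarize([[0.1, 0.9, 0.1, 0.0] for _ in range(4)])
shifted = binarize([[0.0, 0.1, 0.9, 0.1] for _ in range(4)])
mismatch = differing_pixels(stroke, shifted)
```

Although the two strokes have the same shape, every black pixel of one falls on a white pixel of the other, which is why a network fed the raw mesh feature treats a displaced character as a different input.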
It is particularly difficult to recognize characters whose sizes differ from those of the characters learned by the neural network. In order to overcome this difficulty, the conventional character recognition device calculates the center of gravity of an input character, and generates the character data by normalizing the size of the input character. However, since such a normalization process is extremely time-consuming, high-speed recognition is impossible.
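The kind of preprocessing referred to above can be sketched as follows. The text does not state how the conventional device performs the normalization, so the approach shown here (cropping to the bounding box of the black pixels and resampling by nearest neighbor to a fixed mesh size) is an assumption, as are the names `center_of_gravity` and `normalize_size`. Both functions assume the bit image contains at least one black pixel.

```python
def center_of_gravity(bits):
    """Return the (row, column) center of gravity of the black pixels of a bit image."""
    points = [(r, c) for r, row in enumerate(bits) for c, v in enumerate(row) if v]
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)

def normalize_size(bits, size):
    """Crop to the bounding box of the black pixels and rescale to size x size."""
    rows = [r for r, row in enumerate(bits) if any(row)]
    cols = [c for c in range(len(bits[0])) if any(row[c] for row in bits)]
    top, left = rows[0], cols[0]
    height, width = rows[-1] - top + 1, cols[-1] - left + 1
    # Nearest-neighbor resampling: every output pixel visits one source pixel,
    # so the cost grows with the mesh size of every input character.
    return [
        [bits[top + r * height // size][left + c * width // size] for c in range(size)]
        for r in range(size)
    ]

# A hypothetical small character occupying the middle of a 4x4 mesh.
small = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
centroid = center_of_gravity(small)
enlarged = normalize_size(small, 4)
```

Both steps scan the whole bit image for each input character before recognition can begin, which gives a sense of why this preprocessing limits recognition speed.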
There are similar problems in the recognition of two-dimensional images other than characters.