1. Field of the Invention
The present invention relates generally to a character recognition system for recognizing characters on a master document. More specifically, the invention relates to a character recognition system which is specifically adapted to recognize characters having mutually separated character components, such as Chinese characters, Japanese "Hiragana" and "Katakana" characters and so forth. The invention also relates to a character recognition system which is suitable for picking up character data from a master document containing a mixture of Japanese or Chinese characters and alphabetic characters, such as those of English, German and so forth.
2. Description of the Background Art
In recent years, various character recognition systems for picking up character data from a master document have been developed and proposed. With such character recognition systems, it is difficult to pick up character data from a master document written in Japanese, Chinese and so forth. The difficulty is due to the presence of some characters in Japanese, Chinese or other equivalent languages which have disjoint or mutually separated character components. For example, the Japanese Kanji character "川", which means a river, has three substantially vertically extending and mutually separated character components, and the Japanese Hiragana character "い", pronounced "i", has two substantially vertically extending and mutually separated character components. Throughout the present application, characters having mutually separated character components will be referred to as "separating characters".
A character recognition system generally extracts or segments each character on the master document and compares the character structure with pre-set data to recognize the character. The recognized character is usually converted into a computer-readable code, such as ASCII code and so forth. Accurately extracting a separating character has been very difficult because of the space or discontinuity between its character components.
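The extraction difficulty described above can be illustrated with a minimal sketch. It assumes a simple column-projection segmenter, which is a common textbook technique and not necessarily the one used in any particular prior system; the function name `column_runs` and the toy bitmap are hypothetical.

```python
def column_runs(bitmap):
    """Return (start, end) column ranges that contain at least one ink pixel.

    bitmap is a list of equal-length strings, '#' for ink and '.' for blank.
    The end index is exclusive.
    """
    width = len(bitmap[0])
    inked = [any(row[x] == "#" for row in bitmap) for x in range(width)]
    runs, start = [], None
    for x, has_ink in enumerate(inked):
        if has_ink and start is None:
            start = x                  # a new run of inked columns begins
        elif not has_ink and start is not None:
            runs.append((start, x))    # a blank column closes the run
            start = None
    if start is not None:
        runs.append((start, width))    # run reaches the right edge
    return runs


# Toy glyph (assumption): three separated vertical strokes, loosely
# modelled on a separating character such as the Kanji for "river".
RIVER = [
    "#..#..#",
    "#..#..#",
    "#..#..#",
]

segments = column_runs(RIVER)
print(segments)
```

Because every blank column is treated as a boundary, the single glyph is returned as three segments, none of which corresponds to a whole character.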
On the other hand, in English, German or other alphabetic languages, the space between the characters in a word is substantially narrower than that between the words. Because of this narrow spacing, when an alphabetic character document is read by a character recognition system which is designed for scanning Japanese or Chinese character documents, the space between the characters in the word tends to be ignored. As a result, an image of the overall word is picked up as a unit, which makes it impossible to recognize each alphabetic character in the document.
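The conflict between the two kinds of script can be sketched as follows. This assumes a segmenter that bridges any gap narrower than a fixed threshold; the function `merge_runs`, the threshold `MIN_GAP` and the column ranges are hypothetical values chosen only for illustration.

```python
def merge_runs(runs, min_gap):
    """Merge adjacent (start, end) ink runs whose separating gap is below min_gap."""
    merged = []
    for start, end in runs:
        if merged and start - merged[-1][1] < min_gap:
            # Gap too narrow to count as a character boundary: extend the last unit.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# Hypothetical ink runs (column ranges, end exclusive).
kanji_strokes = [(0, 1), (3, 4), (6, 7)]    # one separating character, 2-column gaps
latin_word = [(0, 3), (4, 7), (8, 11)]      # three letters of a word, 1-column gaps

MIN_GAP = 3  # assumed threshold, wide enough to bridge the gaps inside the Kanji

kanji_units = merge_runs(kanji_strokes, MIN_GAP)
word_units = merge_runs(latin_word, MIN_GAP)
print(kanji_units, word_units)
```

A threshold large enough to keep the three strokes together also fuses the three letters into one unit, while a smaller threshold would separate the letters but split the separating character.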
This problem in recognizing characters is especially severe when the character recognition system is used for reading and picking up character data from a master document containing a mixture of both Japanese or Chinese characters and alphabetic characters.
Furthermore, in the previously proposed systems, the extraction of the character to be recognized and the recognition of the character are performed in mutually independent steps. Generally, the step of extracting characters is performed in advance of the step of recognizing the character. When the structure of the extracted character does not match any of the pre-set character patterns, the character is treated as a non-recognizable character. This significantly lowers the character recognition rate of the character recognition system.
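The effect of keeping extraction and recognition independent can be sketched as below. The template table, the exact-match comparison and the names `TEMPLATES`, `crop` and `recognize` are assumptions made for illustration; real systems compare structural features rather than raw bitmaps.

```python
# Hypothetical pre-set pattern table holding a single separating character.
TEMPLATES = {
    "river": ("#.#.#", "#.#.#", "#.#.#"),
}


def crop(rows, start, end):
    """Cut columns start..end (end exclusive) out of a bitmap."""
    return tuple(row[start:end] for row in rows)


def recognize(glyph):
    """Return the label of the matching pre-set pattern, or None if there is none."""
    for label, pattern in TEMPLATES.items():
        if tuple(glyph) == pattern:
            return label
    return None  # treated as a non-recognizable character


image = ("#.#.#", "#.#.#", "#.#.#")

# Step 1: extraction runs first and commits to its segments (assumed output
# of a segmenter that split the character at each blank column).
segments = [(0, 1), (2, 3), (4, 5)]

# Step 2: recognition only sees the fixed segments and cannot ask for a re-cut.
results = [recognize(crop(image, s, e)) for s, e in segments]
print(results)
```

Each fragment is rejected because no pattern matches it, even though the same pixels taken together match the stored pattern; with no feedback from recognition to extraction, the wrong segmentation is never revisited.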