The present invention relates to a picture processing technique in the field of pattern recognition; and, more particularly, to a method for recognizing multi-language printed documents.
Most general documents are composed of characters from multiple languages, such as Korean, English and Chinese, together with special marks and figures. Accordingly, in recognizing the different characters included in such documents, it is very important to extract features that are appropriate to each of them.
Feature extraction systems for a single language have been developed, and multiple fonts have been introduced into this picture processing technique. However, the conventional feature extraction systems for a single language cannot recognize multiple languages whose fonts have varied features. Further, a method for recognizing multi-language printed documents that uses both a letter portion and a background portion, in the form of a mesh of a predetermined standard, as one feature for extraction has not been introduced.
It is, therefore, an object of the present invention to provide a method for recognizing multi-language printed documents having different styles of fonts.
It is another object of the present invention to provide a method for improving the recognition rate by extracting geometrical features from both the letter portion and the background portion of a character in mesh form.
In accordance with an aspect of the present invention, there is provided a method for extracting character features for recognizing characters, the method comprising the steps of: a) normalizing the characters to a fixed size; b) converting the size-fixed characters into mesh-type characters; c) extracting stroke features of each of the mesh-type characters; d) extracting non-stroke features of each of the mesh-type characters; and e) extracting the character features using the stroke features and the non-stroke features.
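Steps a) through e) can be illustrated with a minimal Python sketch. It is not the claimed implementation: the text above does not say how the mesh is formed or which stroke and non-stroke measurements are taken, so everything specific here is an assumption made for illustration. The function names, the 8-pixel normalized size, the 4x4 mesh, the 0.5 fill threshold, the use of run counts as stroke features, and the use of border-to-letter background lengths as non-stroke features are all hypothetical choices.

```python
def normalize(bitmap, size=8):
    """Step a): nearest-neighbour resize of a binary bitmap to size x size."""
    h, w = len(bitmap), len(bitmap[0])
    return [[bitmap[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def to_mesh(bitmap, cells=4, threshold=0.5):
    """Step b): split a square bitmap into cells x cells blocks.

    A block becomes 1 (letter portion) when at least `threshold` of its
    pixels are set, else 0 (background). Assumes len(bitmap) % cells == 0.
    """
    step = len(bitmap) // cells
    mesh = []
    for r in range(cells):
        row = []
        for c in range(cells):
            filled = sum(bitmap[y][x]
                         for y in range(r * step, (r + 1) * step)
                         for x in range(c * step, (c + 1) * step))
            row.append(1 if filled >= threshold * step * step else 0)
        mesh.append(row)
    return mesh


def count_runs(line):
    """Number of separate runs of letter cells along one mesh line."""
    runs, prev = 0, 0
    for v in line:
        if v and not prev:
            runs += 1
        prev = v
    return runs


def leading_background(line):
    """Number of background cells before the first letter cell."""
    n = 0
    for v in line:
        if v:
            break
        n += 1
    return n


def stroke_features(mesh):
    """Step c) (assumed form): run counts for each row, then each column."""
    cols = [list(col) for col in zip(*mesh)]
    return [count_runs(r) for r in mesh] + [count_runs(c) for c in cols]


def non_stroke_features(mesh):
    """Step d) (assumed form): background length from the left, right,
    top and bottom borders up to the first letter cell."""
    cols = [list(col) for col in zip(*mesh)]
    return ([leading_background(r) for r in mesh]
            + [leading_background(r[::-1]) for r in mesh]
            + [leading_background(c) for c in cols]
            + [leading_background(c[::-1]) for c in cols])


def extract_features(bitmap, size=8, cells=4):
    """Step e): combine stroke and non-stroke features into one vector."""
    mesh = to_mesh(normalize(bitmap, size), cells)
    return stroke_features(mesh) + non_stroke_features(mesh)
```

For a 16x16 picture of a vertical bar, `extract_features` returns a 24-element vector in which the letter portion and the background portion each contribute their own entries, which is the point of combining the two kinds of features.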
In accordance with another aspect of the present invention, there is provided a method for extracting character features for recognizing characters, the method comprising the steps of: i) inputting the characters into an input means; ii) printing the input characters and scanning the printed characters to make character pictures; iii) constructing a standard input character set using the character pictures; iv) normalizing the character pictures to a fixed size; v) converting the size-fixed characters into mesh-type characters; vi) extracting stroke features of each of the mesh-type characters; vii) extracting non-stroke features of each of the mesh-type characters; and viii) extracting the character features using the stroke features and the non-stroke features.
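Step iii), constructing the standard input character set from the scanned character pictures, can be sketched as a simple grouping operation. This is only an illustration under stated assumptions: the `CharacterPicture` record, its `label`, `font` and `bitmap` fields, and the `build_standard_set` function are hypothetical names, and the text above does not specify how the set is organized.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CharacterPicture:
    label: str     # the character that was input and printed
    font: str      # font used for printing (hypothetical metadata)
    bitmap: tuple  # scanned binary picture as a tuple of rows of 0/1


def build_standard_set(pictures):
    """Group scanned pictures by character label, one entry per font.

    Returns {label: {font: bitmap}}. Each bitmap can then be passed to
    the normalization and feature extraction steps iv) through viii).
    """
    grouped = defaultdict(dict)
    for pic in pictures:
        grouped[pic.label][pic.font] = pic.bitmap
    return {label: dict(fonts) for label, fonts in grouped.items()}
```

Keeping every font of a character under one label is one way to make features from differently styled fonts comparable for the same character.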