The present invention relates to a character recognition system.
In a conventional character recognition system, which reads the image of characters written on paper and outputs the characters as character codes that can be processed by a computer, a character area is extracted from the document image, individual characters are extracted from the extracted character area, and character recognition is executed on each of the resulting divided areas.
In such a conventional character recognition system, the character areas are extracted before the character recognition process, so blots and blurs in the image may cause errors in the extraction of the characters. Once such an error occurs, it cannot be corrected or compensated for in the subsequent processing, and the character recognition rate is reduced.
It is therefore an object of the present invention to provide a character recognition system having an improved recognition rate.
According to the present invention, there is provided a character recognition system comprising: feature extraction parameter storage means for storing a transformation matrix for reducing the number of dimensions of feature parameters and a codebook for quantization; HMM storage means for storing the structure and parameters of a hidden Markov model (HMM) expressing character strings; feature extraction means for scanning a word image supplied from image storage means from left to right in a predetermined cycle with a slit sufficiently narrower than the character width, and outputting a feature symbol at each predetermined timing; and matching means for matching the feature symbol sequence against the HMM states so as to maximize the probability, thereby recognizing the character string.
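By way of illustration only, the feature extraction described above might be sketched as follows. The function names, the raw slit feature (the fraction of black pixels in each row of the slit), and the toy matrix and codebook are assumptions made for this sketch and are not taken from the specification; in the system described, the transformation matrix and the codebook would be read from the feature extraction parameter storage means.

```python
def slit_features(image, slit_width, step):
    """Slide a narrow vertical slit from left to right over a binary image.

    Returns one raw feature vector per slit position. The raw feature used
    here (fraction of black pixels per row inside the slit) is an assumption.
    """
    height, width = len(image), len(image[0])
    vectors = []
    for x in range(0, width - slit_width + 1, step):
        vectors.append(
            [sum(image[y][x:x + slit_width]) / slit_width for y in range(height)]
        )
    return vectors


def project(vector, matrix):
    """Reduce dimensionality: each matrix row yields one output variate."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def quantize(vector, codebook):
    """Return the index of the nearest codevector (squared Euclidean distance)."""
    def distance(k):
        return sum((a - b) ** 2 for a, b in zip(vector, codebook[k]))
    return min(range(len(codebook)), key=distance)


def feature_symbols(image, matrix, codebook, slit_width=1, step=1):
    """Scan the word image and emit one feature symbol per slit position."""
    return [
        quantize(project(v, matrix), codebook)
        for v in slit_features(image, slit_width, step)
    ]


# Toy word image: 4 rows by 6 columns, 1 = black pixel (hypothetical data).
image = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
# Hypothetical 2x4 transformation matrix: mean of upper half, mean of lower half.
matrix = [[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.5, 0.5]]
# Hypothetical codebook: blank, ink in upper half, ink in lower half.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

print(feature_symbols(image, matrix, codebook))  # [0, 1, 1, 0, 2, 2]
```

Because the slit is much narrower than a character, each character contributes several consecutive symbols, and no character boundary has to be decided at this stage.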
According to another aspect of the present invention, there is provided a character recognition system comprising: an image scanner; image information storage means for storing document image data read by the image scanner and associated image area information; feature extraction parameter storage means for storing a transformation matrix for reducing the number of dimensions of feature parameters and a codebook for obtaining feature symbols, the transformation matrix expressing each multivariate feature parameter with a small number of variates while minimizing information loss and being calculated in advance from training sample feature parameters through principal component analysis, the codebook being a set of codevectors used for quantization to express the transformed feature parameter with a small number of bits and being calculated in advance from the training sample feature parameters; character string HMM storage means for storing the structure and parameters of a character string HMM, the character string HMM being obtained by preparing one HMM for each character and adding a state transition from the completion state of each character HMM to the initial state of each character HMM; image extraction means for extracting an area containing written characters from the document image, extracting rows and words from the extracted character area, and storing the resulting word image area information in the image information storage means; feature extraction means for scanning the image of each word area from left to right in a predetermined cycle with a slit sufficiently narrower than the character width and producing a feature symbol at each predetermined timing; and matching means for associating the feature symbol sequence with HMM states so as to maximize the probability, and recognizing the character string from the resultant optimum state transition sequence.
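As a minimal sketch of the character string HMM and the matching step, the following assumes left-to-right character HMMs with discrete emissions and uses the standard Viterbi algorithm in the log domain to find the maximum-probability state sequence. The function names, the toy model parameters, and the uniform split of the exit probability over all characters are assumptions made for this sketch; the specification does not fix them.

```python
import math

NEG_INF = float("-inf")


def log(p):
    return math.log(p) if p > 0 else NEG_INF


def build_string_hmm(char_models):
    """Link per-character left-to-right HMMs into one character string HMM.

    char_models maps a character to a list of (self_loop_prob, emission_probs),
    one entry per state. The completion (last) state of every character gets a
    transition to the initial state of every character.
    """
    states, first_index = [], {}
    for ch, model in char_models.items():
        first_index[ch] = len(states)
        for k, (stay, emit) in enumerate(model):
            is_last = k == len(model) - 1
            states.append((ch, is_last, stay, [log(p) for p in emit]))
    # preds[j] lists (i, log transition prob, enters_new_character)
    preds = [[] for _ in states]
    for i, (ch, is_last, stay, _) in enumerate(states):
        preds[i].append((i, log(stay), False))
        if is_last:
            for first in first_index.values():
                # Assumption: exit probability is shared equally by all characters.
                preds[first].append((i, log((1 - stay) / len(first_index)), True))
        else:
            preds[i + 1].append((i, log(1 - stay), False))
    return states, preds, first_index


def recognize(symbols, char_models):
    """Viterbi decoding of a non-empty feature symbol sequence."""
    states, preds, first_index = build_string_hmm(char_models)
    n = len(states)
    score = [NEG_INF] * n
    for first in first_index.values():
        score[first] = log(1 / len(first_index)) + states[first][3][symbols[0]]
    back = []
    for sym in symbols[1:]:
        new_score, pointers = [NEG_INF] * n, [None] * n
        for j in range(n):
            for i, logp, enters in preds[j]:
                cand = score[i] + logp
                if cand > new_score[j]:
                    new_score[j], pointers[j] = cand, (i, enters)
            new_score[j] += states[j][3][sym]
        score = new_score
        back.append(pointers)
    # The path must end in the completion state of some character.
    end = max((j for j in range(n) if states[j][1]), key=lambda j: score[j])
    if score[end] == NEG_INF:
        raise ValueError("no valid state sequence for this symbol sequence")
    chars, j = [states[end][0]], end
    for pointers in reversed(back):
        i, enters = pointers[j]
        if enters:  # a character boundary was crossed on this transition
            chars.append(states[i][0])
        j = i
    return "".join(reversed(chars))


# Hypothetical two-character alphabet over three feature symbols (0, 1, 2).
char_models = {
    "a": [(0.5, [0.1, 0.8, 0.1]), (0.5, [0.8, 0.1, 0.1])],
    "b": [(0.5, [0.1, 0.1, 0.8]), (0.5, [0.8, 0.1, 0.1])],
}
print(recognize([1, 1, 0, 2, 2, 0], char_models))  # ab
```

In this sketch the character boundaries are recovered from the optimum state transition sequence itself, so segmentation and recognition are decided jointly rather than in separate steps.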
Other objects and features of the present invention will become apparent from the following description with reference to the attached drawings.