The present invention is directed generally to optical character recognition, and more particularly, to automated methods and apparatus for reducing recognition errors, especially those resulting from an inability to distinguish between upper and lower case characters, or other characters of similar shape but dissimilar size or position.
Programmable computers and digital processing apparatus have proven useful for optical character recognition, in which visual character indicia, such as printed text, are scanned, identified and assigned character code values that can be stored electronically. A word processing "document" is one example of a data file containing character code values that a computer is able to interpret and reproduce in human-readable form on a CRT or as a printed document. Many character code conventions are in use today, the most common being the ASCII (American Standard Code for Information Interchange) code system.
Many existing optical character recognition systems make recognition errors between characters that are very similar in shape but differ in size or are located at different positions. Uppercase and lowercase characters (S/s) and apostrophes and commas ('/,), for example, are prone to such errors. However similar the shapes, the characters usually differ so much in size or position that errors of this kind are avoidable and should not occur.
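By way of illustration only, the following minimal sketch shows how size and position can separate characters of similar shape. It is not the method of the present invention. The names Glyph, LineMetrics, resolve_case and resolve_mark are hypothetical, and the sketch assumes that the cap height, x-height and baseline of the text line have already been estimated by some other means.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """Bounding box of a scanned character (hypothetical; y grows downward)."""
    top: float
    bottom: float


@dataclass
class LineMetrics:
    """Assumed, pre-estimated reference heights of the text line."""
    cap_top: float   # y of the top of capital letters
    x_top: float     # y of the top of lowercase letters such as "x"
    baseline: float  # y of the baseline


def resolve_case(glyph: Glyph, line: LineMetrics, upper: str, lower: str) -> str:
    """Choose between same-shaped upper and lower case characters by size."""
    height = glyph.bottom - glyph.top
    cap_height = line.baseline - line.cap_top
    x_height = line.baseline - line.x_top
    # Pick whichever reference height the glyph height is closer to.
    if abs(height - cap_height) < abs(height - x_height):
        return upper
    return lower


def resolve_mark(glyph: Glyph, line: LineMetrics) -> str:
    """Choose between apostrophe and comma by vertical position."""
    centre = (glyph.top + glyph.bottom) / 2
    midline = (line.x_top + line.baseline) / 2
    # A mark centred above the middle of the x-height band is an apostrophe.
    return "'" if centre < midline else ","
```

For example, on a line whose capitals are 100 units tall and whose x-height is 70 units, a glyph 98 units tall would be resolved as "S" and a glyph 69 units tall as "s", even though their shapes are nearly identical.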
Although some optical character recognition systems utilize information on size and position for discrimination, they still suffer from recognition errors, particularly when encountering the many kinds of electronic fonts currently in use. Thus, it would be desirable to provide a system that utilizes size and position information in a fashion that enhances the speed and recognition rate of optical character recognition.