There are many instances where it would be useful or desirable to provide a computer readable form of a document that is not available in a compatible computer readable form. Normally the document is simply not available in machine readable form. This could be because the document was handwritten or typewritten and thus no computer readable form exists, or because the computer readable form is not available. In some instances this is a "foreign" document, i.e., a computer readable form does exist but the document was produced on an incompatible computer system. In some instances, such as facsimile transmission, a simple optical scan of the document can produce the required form. In most instances, however, the form most useful for later use and decision making is a separate indication of each character of the document.
The field of optical character recognition deals with this problem. The optical character recognizer scans the document in some fashion to produce an electrical indication of the marks of the document. A computer analyzes this indication of the marks to produce an indication of each character of the document. It is within the current state of the art to produce a relatively error free indication of the characters of many typewritten and printed documents. The best systems of the prior art are capable of properly distinguishing several differing type fonts and of reading kerned text. The same is not true of unconstrained handwritten characters.
The problem of properly reading unconstrained handwritten characters is difficult because of the great variability of the characters. One person may not write the same character exactly the same way every time. The variability between different persons writing the same character is even greater than the variability of a single person. In addition to the variability of the characters themselves, handwritten text is often not cleanly executed. Characters may overlap horizontally, and loops and descenders may overlap vertically. Further, the individual written lines may be on a slant or have an irregular profile. Thus recognition of handwritten characters is a difficult task.
An example of a field where recognition of handwritten characters would be very valuable is mail sorting. Each piece of mail must be classified by destination address. A large volume of typewritten and printed mail is now read and sorted using prior art optical character recognition techniques. There remains roughly 15% of current U.S. mail that is hand addressed. Present technology uses automated conveyor systems to present these pieces of mail, one at a time, to an operator who views the address and enters a code for the destination. This is the most labor intensive and slowest part of the entire mail sorting operation.
Sorting of handwritten mail is an area having a unique set of characteristics. First, because of the problem of user acceptance it is not feasible to place further constraints on the address. Thus address lines or individual character boxes, which would be useful in regularizing the recognition task, are ruled out. On the other hand, there already exists a relatively constrained portion of the current address. The 5 digit ZIP code employed in most handwritten destination addresses provides all the information needed for the primary sorting operation. This information is relatively constrained because the ZIP code consists of only digits. In addition, the ZIP code is usually located at the end of the last line of the destination address, which simplifies the task of locating it.
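The location heuristic described above can be illustrated with a minimal sketch. It assumes, hypothetically, that an earlier stage has already grouped the marks of the address into lines, ordered top to bottom, and reduced each mark to a bounding box `(x, y, w, h)`. The function name `zip_candidates` and this data layout are illustrative assumptions, not part of any described system. The sketch simply takes the five rightmost marks of the last non-empty line as candidate ZIP code digits. It does not handle touching digits, trailing marks after the ZIP code, or addresses that omit the ZIP code.

```python
def zip_candidates(lines, digits=5):
    """Return the bounding boxes most likely to hold the ZIP code digits.

    lines:  address lines ordered top to bottom; each line is a list of
            (x, y, w, h) bounding boxes, one per mark (assumed input format).
    digits: number of digits expected (5 for the basic ZIP code).
    """
    # Ignore empty lines so the last line that carries ink is used.
    non_empty = [line for line in lines if line]
    if not non_empty:
        return []
    # Order the marks of the last line left to right by their x position.
    last_line = sorted(non_empty[-1], key=lambda box: box[0])
    # The ZIP code is usually at the end of the line: keep the rightmost marks.
    return last_line[-digits:]
```

If the last line holds fewer than five marks, the sketch returns all of them, and a later stage would have to decide whether a ZIP code is present at all.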
It would thus be useful in the art to provide a manner of normalizing unconstrained handwritten digits to ease the task of recognition.
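As an illustration of what normalizing a digit can mean in practice, the following sketch performs one simple kind of normalization: it crops a binary bitmap to the bounding box of its ink and resamples the result to a fixed square grid by nearest-neighbor sampling. This is an assumed, generic technique chosen for illustration and is not the method of any particular system. The function name `normalize_digit`, the bitmap representation as a list of rows of 0/1 values, and the default grid size of 16 are all hypothetical.

```python
def normalize_digit(bitmap, size=16):
    """Crop a binary bitmap to its ink and rescale it to size x size.

    bitmap: list of equal-length rows of 0/1 values, where 1 marks ink
            (assumed input format).
    """
    # Find the rows and columns that contain any ink.
    rows = [r for r, row in enumerate(bitmap) if any(row)]
    if not rows:
        # Blank input: return a blank grid of the target size.
        return [[0] * size for _ in range(size)]
    cols = [c for c in range(len(bitmap[0])) if any(row[c] for row in bitmap)]

    top, left = rows[0], cols[0]
    height = rows[-1] - top + 1
    width = cols[-1] - left + 1

    # Nearest-neighbor resampling: each output cell copies one source pixel.
    out = []
    for y in range(size):
        src_y = top + (y * height) // size
        out.append([
            1 if bitmap[src_y][left + (x * width) // size] else 0
            for x in range(size)
        ])
    return out
```

This removes variation in position and overall size only. Because height and width are scaled independently, the aspect ratio is not preserved, so a narrow digit such as "1" is stretched to fill the grid. Slant and stroke thickness are left unchanged.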