The present invention relates to automatic mail processing and more particularly to a method of exploiting mail stream statistics to improve optical character recognition.
In the United States, a large and ever-growing volume of mail is processed daily. Although recent hardware and software advances in optical character recognition (OCR) have improved overall mail throughput, further improvements are needed to realize the economic benefits that a complete, fully automated bar-coding system would provide.
In conventional OCR methods for processing letter mail and assigning a bar code, the address block must first be located. Next, the address is processed by a segmentation function whose ultimate goal is to separate each line into individual characters. The recognition process then attempts to identify each pertinent character. If a zip code is read incorrectly and cannot be verified by a database search, a bar code cannot be assigned, and manual processing is typically required.
Current address interpretation methods fail in one of two ways: they either assign an incorrect zip code or assign no zip code at all. The first problem occurs when a word break is missing at the start or end of the zip code, or when a word break has been inserted in the middle of it. The second problem occurs when one or more of the correct digits of the zip code are not ranked as the first choice by the recognition process and are therefore never selected.
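The second failure mode can be illustrated with a minimal sketch. It is not the method of the present invention; it assumes the recognizer returns, for each digit position, a list of candidate digits ranked by confidence, and that verification is a simple lookup in a set of valid zip codes. The function names `top_choice_zip`, `verify_zip` and `assign_zip`, and the sample zip codes, are hypothetical. Python 3.9 or later is assumed.

```python
from typing import Optional, Sequence

# Hypothetical recognizer output: one ranked candidate list per digit
# position, best guess first.
Candidates = Sequence[Sequence[str]]


def top_choice_zip(candidates: Candidates) -> str:
    """Build a zip code from the first-ranked candidate at each position."""
    return "".join(ranked[0] for ranked in candidates)


def verify_zip(zip_code: str, valid_zips: set[str]) -> Optional[str]:
    """Return the zip code if the database lookup confirms it, else None."""
    return zip_code if zip_code in valid_zips else None


def assign_zip(candidates: Candidates, valid_zips: set[str]) -> Optional[str]:
    """Top-choice-only assignment; None means manual processing is needed."""
    return verify_zip(top_choice_zip(candidates), valid_zips)


if __name__ == "__main__":
    # Hypothetical database of valid zip codes.
    valid = {"14604", "14623"}

    # The correct digit '2' at the fourth position is ranked second behind '8',
    # so the top-choice reading "14683" fails verification.
    misranked = [["1"], ["4"], ["6"], ["8", "2"], ["3"]]
    print(assign_zip(misranked, valid))  # prints None

    # Same piece with the correct digit ranked first.
    ranked_ok = [["1"], ["4"], ["6"], ["2", "8"], ["3"]]
    print(assign_zip(ranked_ok, valid))  # prints 14623
```

Because only the first-ranked digit at each position is used, the correct but second-ranked candidate is never considered, and the mail piece falls through to manual processing even though the right answer was present in the recognizer's output.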
While statistical analyses of individual mail pieces have been performed, the statistics of typical mail streams have not been exploited.
It is an object of the present invention to provide an automated mail processing method that reduces the amount of mail that must be processed manually.
It is another object of the present invention to provide an automated mail processing method that exploits the statistics of the mail stream being processed to improve OCR recognition rates.