The invention relates to a method of processing postal items in which an image is formed of each item, including its address information, and in which automatic optical character recognition (OCR) is performed on the destination address information on the basis of that image and of a reference address base.
Postal operators have made a considerable effort to define addressing standards and to encourage their use. Although standardized addressing is becoming increasingly widespread and now accounts for a large proportion of the postal items handled, a very large amount of mail still carries addresses that are non-standard, that contain errors or ambiguities, or from which information is missing.
It is known that systems for automatically recognizing postal addresses by OCR seek an unambiguous resolution of the address for the purpose of sorting items within a postal delivery round or “postman's walk”. This recognition operation is performed with an adjustable error rate, which influences how often an unambiguous resolution is found. As a result, in any batch of items, some are set aside by the automatic recognition process because the result of the resolution is ambiguous. Items that are set aside or rejected in this way need to be taken up by a video coding station and/or inserted manually into delivery rounds. The proportion of items set aside by the automatic OCR process defines a rejection rate, and the level of that rate depends on the error level fixed by the postal operator, on the basis of which the error rate is set.
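The relationship between the error rate and the rejection rate can be illustrated with a minimal sketch. It assumes that each recognized item carries a confidence score and that an item is accepted only when its score exceeds an adjustable threshold. The function name, the sample scores, and the definition of the error rate as the share of accepted items that are wrong are all assumptions made for illustration, not part of the method described.

```python
def rates(results, threshold):
    """Return (rejection_rate, error_rate) for a batch at a given threshold.

    results: list of (confidence, is_correct) pairs, one per postal item.
    An item is accepted only if its confidence is strictly above the threshold;
    every other item is set aside for video coding or manual handling.
    """
    accepted = [ok for confidence, ok in results if confidence > threshold]
    rejection_rate = 1 - len(accepted) / len(results)
    # Error rate is taken here as the fraction of accepted items that are wrong.
    error_rate = accepted.count(False) / len(accepted) if accepted else 0.0
    return rejection_rate, error_rate


# Hypothetical batch of six items: (confidence score, recognition was correct).
BATCH = [(0.95, True), (0.90, True), (0.85, False),
         (0.70, True), (0.60, False), (0.40, False)]

for threshold in (0.5, 0.8, 0.88):
    rejection, error = rates(BATCH, threshold)
    print(f"threshold={threshold:.2f} rejection={rejection:.2f} error={error:.2f}")
```

In this toy batch, raising the threshold lowers the error rate among accepted items but sets more items aside, which is the trade-off described above.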
Automatic recognition of address information requires detailed knowledge of the structure of the address block and the style rules used by the clients of postal operators. In order to enable an unambiguous resolution to be found based on a postal directory or on a reference address base, the postal address for recognition must have all of its components placed in an order that is correct, logical, and matches the reference address base.
A destination address typically comprises a street name, a number in the street, a town name, a post code, and a country.
Automatic OCR on a postal item conventionally comprises a plurality of successive steps:
forming a digital image of the postal item including the address information;
binarizing the digital image of the item that includes address information;
segmenting the binarized image in order to locate the address block;
analyzing the address block syntactically in order to subdivide it into address components (strings of characters allocated to different address headings: street number, street name, post code, town, door number, company, country, etc.); and
analyzing the address components semantically by comparison with the reference address base (postal directory) in order to obtain an unambiguous resolution.
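Two of the steps listed above, binarization and syntactic analysis, can be sketched as follows. This is only an illustration under simplifying assumptions: the image is a small grid of gray levels, binarization uses a single fixed threshold, and the address layout (a five-digit post code followed by the town, and a street number of at most four digits followed by the street name) is hypothetical. The function names are likewise invented for the example.

```python
import re


def binarize(image, threshold=128):
    """Map gray levels (0-255) to 1 for dark ink and 0 for light background."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in image]


def parse_address_block(lines):
    """Allocate recognized text lines to address headings (hypothetical layout)."""
    components = {}
    for line in (raw.strip() for raw in lines):
        if not line:
            continue
        # Assumed format: five-digit post code followed by the town name.
        match = re.fullmatch(r"(\d{5})\s+(.+)", line)
        if match:
            components["post_code"], components["town"] = match.groups()
            continue
        # Assumed format: street number (1 to 4 digits) followed by the street name.
        match = re.fullmatch(r"(\d{1,4})\s+(.+)", line)
        if match:
            components["street_number"], components["street_name"] = match.groups()
            continue
        # Anything else (country, company, etc.) is kept unclassified.
        components.setdefault("other", []).append(line)
    return components


print(binarize([[0, 200], [130, 90]]))
print(parse_address_block(["12 RUE DE RIVOLI", "75001 PARIS", "FRANCE"]))
```

The components produced in this way are what the semantic analysis step would then compare with the reference address base.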
In the last step, that of resolving the address, a choice is made from a set of potential address solutions by selecting the address that has the best statistical match with the reference address base. This step is generally subdivided into a step of resolving outward addressing information (country, town, post code) and a step of resolving inward addressing information (street number, street name, door number, etc.). In each of these two resolution steps, a search is made for a statistical match with the reference address base, and a destination address solution is issued when the statistical match level is greater than a predetermined statistical threshold defined by the error rate. Otherwise, the item is set aside by the automatic recognition processing, as mentioned above.
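The two-stage resolution described above can be sketched as follows, with outward information resolved first and inward information resolved within the selected locality. The sketch makes several assumptions: the reference address base is a tiny hypothetical dictionary, and the statistical match is replaced by the string similarity ratio of Python's difflib.SequenceMatcher, which is a stand-in and not the matching technique of any particular recognition system.

```python
from difflib import SequenceMatcher

# Hypothetical miniature reference address base: (post code, town) -> street names.
REFERENCE_BASE = {
    ("75001", "PARIS"): ["RUE DE RIVOLI", "RUE SAINT HONORE"],
    ("69001", "LYON"): ["RUE DE LA REPUBLIQUE"],
}


def similarity(a, b):
    """Stand-in match score in [0, 1], case-insensitive."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


def best_match(text, candidates, threshold):
    """Return the best-scoring candidate, or None if its score is not above the threshold."""
    if not candidates:
        return None
    score, winner = max((similarity(text, c), c) for c in candidates)
    return winner if score > threshold else None


def resolve(outward_text, inward_text, threshold=0.8):
    """Resolve outward then inward information; None means the item is set aside."""
    outward_keys = {f"{code} {town}": (code, town) for code, town in REFERENCE_BASE}
    outward = best_match(outward_text, list(outward_keys), threshold)
    if outward is None:
        return None
    key = outward_keys[outward]
    street = best_match(inward_text, REFERENCE_BASE[key], threshold)
    if street is None:
        return None
    return key + (street,)


# OCR confusions (lowercase "l" read in place of "I") still resolve above the threshold.
print(resolve("75001 PARlS", "RUE DE RIVOLl"))
# An unknown locality falls below the threshold and the item is set aside.
print(resolve("XXXXX NOWHERE", "RUE DE RIVOLI"))
```

Raising the threshold parameter in this sketch makes the resolution stricter, so fewer wrong addresses are issued but more items return None and are set aside.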