The invention relates to a method for generating learning and/or test samples for optimizing automatic readers of addresses on items of mail with adaptive classifiers.
A large proportion of the processing steps occurring in an address reader, for example the recognition of characters, words and types of script, are based on adaptive classification methods. The basic principle which all the adaptive methods have in common is the learning of previously collected patterns whose properties are mapped onto quantifiable feature sets. These feature sets later permit conclusions to be drawn about class membership. For this reason, adaptive methods basically have two working phases:
a) the optimization phase, composed preferably of the learning phase and the test phase,
b) the can phase.
During the optimization phase, each feature set of a pattern which, depending on the task, is composed, for example, of a character, a word or an address, must have its meaning added to it in the form of the reference information so that the determination variables of the classification system can be set in an optimum way. This phase in which the system moves towards the optimum parameter setting preferably takes place in two stages with the basic setting of the parameters being performed in the learning phase while fine adjustment of the parameters takes place in the test phase. In the can phase, all that is then needed is the feature set of a pattern from which the classification system derives the class membership in accordance with the stored parameters.
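The division into learning phase, test phase and can phase described above can be illustrated with a minimal sketch. It assumes a deliberately simple classifier that stores one mean feature vector per class and a single fine-tuned parameter, a reject distance; the function names, the feature values and the scoring rule are hypothetical and serve only as an illustration, not as the classification method of any particular address reader.

```python
def distance(p, q):
    """Euclidean distance between two feature sets of equal length."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def learn(learning_sample):
    """Learning phase: basic parameter setting from (features, label) pairs.

    The stored parameters here are one mean feature vector per class.
    """
    sums, counts = {}, {}
    for features, label in learning_sample:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def classify(references, features, reject_distance):
    """Can phase: only the feature set is needed, no reference information."""
    label = min(references, key=lambda name: distance(references[name], features))
    if distance(references[label], features) > reject_distance:
        return None  # too far from every stored reference: reject
    return label


def tune(references, test_sample, candidates):
    """Test phase: fine adjustment of one parameter, the reject distance.

    Score = correct decisions minus wrong decisions; rejects count as zero.
    """
    def score(reject_distance):
        total = 0
        for features, label in test_sample:
            decision = classify(references, features, reject_distance)
            if decision is not None:
                total += 1 if decision == label else -1
        return total
    return max(candidates, key=score)


# Hypothetical two-dimensional feature sets with their reference information.
learning_sample = [((0.0, 0.0), "A"), ((0.0, 2.0), "A"),
                   ((4.0, 0.0), "B"), ((4.0, 2.0), "B")]
test_sample = [((0.5, 1.0), "A"), ((3.5, 1.0), "B"), ((2.2, 1.0), "A")]

references = learn(learning_sample)
reject_distance = tune(references, test_sample, [1.0, 2.0, 3.0])
print(classify(references, (0.2, 0.9), reject_distance))
```

Both learn and tune need the label of every pattern, whereas classify works from the feature set alone, which is why the reference information is required only during the optimization phase.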
The greatest degree of expenditure involved in the technical development of a classification system is incurred in the learning and test phases which can each in turn be divided into two main activities. Firstly, it is necessary to prepare a sample which satisfactorily represents the recognition task. This is followed by the actual adaptation of the classification system which, depending on the classification method and classifier design, concentrates on the optimization of the underlying determination variables such as, for example, optimization of the classifier coefficients for the polynomial classifier, optimization of the weighting factors for the neural network or the selection of the most efficient reference vectors for the nearest neighbour classifier.
While the second aspect of the learning and test phases can largely take place in an automated fashion since it is generally based on well defined mathematical methods and optimization methods, the first aspect entails a large amount of work on planning, research and checking, which often becomes the actual sticking point of adaptive solution methodology.
In order to assemble the samples, according to the prior art large quantities of items of mail (live mail) are collected in situ and provided manually, by so-called labeling, with the reference information (meaning of the addresses, layout data). The original reference information/meaning, which has been lost, therefore has to be inferred from an image (Jürgen Schürmann: Pattern Classification, published by John Wiley & Sons, Inc., 1995, Chapter “Introduction Learning”, pp. 17-21).
The process of assembling the sample is of decisive significance for automatic recognition, since its quality has a direct effect on the efficiency of the subsequently adapted classification system. If the respective sample reflects the reading task under consideration sufficiently well, a good reading performance will also be achieved in the can phase for the wide range of patterns which occur. If too narrow a sample is selected, a good performance can be expected in the can phase only for this restricted range, and the anticipated performance for the rest of the patterns which occur is not achieved. This requirement of a sufficiently comprehensive sample corresponds directly to the notion of the representativeness of a sample in mathematical statistics.
In order to obtain a high-quality and representative sample, a series of criteria have to be fulfilled. A basic precondition for a good learning and test sample is that all the forms of a pattern class which have to be learnt are present to a sufficient degree. This is often already a condition which is difficult to fulfil since task definitions usually come from a specific application which represents only a portion of an overall recognition task. For example, in script recognition for mail, certain fonts, printing techniques or printing equipment which represent only a limited portion of the entire range predominate at the time when a classifier is adapted. In the course of the service life of a device for reading the addresses on items of mail, other fonts and printing techniques will probably come to predominate and must nevertheless still be recognised sufficiently well. This aspect also varies when such techniques are used in different countries. In a country with a high level of technology, the fonts, printing equipment and writing equipment which are used will be entirely different from those in a developing country. This requires the sample to be collected in as far-sighted a way as possible and necessitates as wide a basis as possible for the generation of patterns.
In applications for items of mail it is often impossible to find sufficient examples for a specific task definition, for example rarely occurring characters such as “Q” in the German language, or a rare company logo. Categories from the field of postal applications are quickly formulated and corresponding algorithms quickly generated, but it is frequently impossible to check them in a meaningful way because the existing sample stocks either do not contain any examples of the required class at all or do not contain a sufficient number of them.
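Whether an existing labelled sample contains enough examples of each required class can be checked mechanically before any adaptation is attempted. The following is a minimal sketch under the assumption that the reference information is available as one class label per pattern; the function name, the example addresses and the minimum count are hypothetical.

```python
from collections import Counter


def underrepresented_classes(labels, required_classes, min_examples):
    """Return the required classes with fewer than min_examples patterns."""
    counts = Counter(labels)  # missing classes count as zero
    return {c: counts[c] for c in required_classes if counts[c] < min_examples}


# Hypothetical character labels taken from three labelled address words.
labels = list("MUENCHEN") + list("BERLIN") + list("QUEDLINBURG")
print(underrepresented_classes(labels, "EQX", min_examples=2))
```

In this toy sample the class “E” is covered, whereas “Q” occurs only once and “X” not at all, so neither of the latter two could be learnt or tested in a meaningful way.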
Next, the meaning which is assigned to a pattern must be correct. If an adaptive system is too frequently given false class memberships for a pattern, it will increasingly make the wrong decision in the can phase when corresponding patterns are presented. The system is simply adaptive and also learns incorrect things if they are offered to it. The fewer incorrect assignments the learning or test sample contains, the better the efficiency of the developed classification system.
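The effect of a single incorrect piece of reference information can be shown with a minimal sketch. It assumes a nearest neighbour classifier that simply stores the labelled patterns as reference vectors; the one-dimensional feature values and the class names are hypothetical.

```python
def nearest_label(sample, features):
    """Return the label of the stored pattern closest to the given feature set."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(sample, key=lambda item: squared_distance(item[0], features))[1]


# Correctly labelled learning sample.
clean = [((1.0,), "O"), ((1.2,), "O"), ((3.0,), "Q"), ((3.2,), "Q")]
# The same sample with one labeling error: the pattern at 1.2 is marked "Q".
noisy = [((1.0,), "O"), ((1.2,), "Q"), ((3.0,), "Q"), ((3.2,), "Q")]

query = (1.15,)
print(nearest_label(clean, query), nearest_label(noisy, query))
```

With the correctly labelled sample the query is assigned to “O”; with the mislabelled sample the same query is assigned to “Q”, because the classifier reproduces exactly what it was offered.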
A further aspect relates directly to the generation of the feature sets. The feature sets are usually generated with the detection algorithms contained in the existing reading software, since the quantities involved are considerable (for example several thousand examples per character in the case of character recognition) and the features are to be as close to real conditions as possible. However, the existing algorithms do not operate without faults. For example, during character segmentation incorrect segments occur which, instead of containing one character, contain only parts of characters, contain more than one character, or sometimes contain only interference. Such segments are not only irrelevant for an adaptation but also hugely disruptive, since they are very confusing for the classification system.
Furthermore, within a pattern recognition process an entire series of effects occur which are not individually determinable and cannot be perceived visually but rather have to be handled in a summary statistical fashion. These include, for example, quantisation effects as a result of the binarisation process, contrast variations as a result of differently colored paper backgrounds, rounding effects as a result of different resolution and scanning algorithms in scanning and printing equipment, and fluctuations in scanning and printing quality as a result of the age and different maintenance states of the equipment.
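The interaction of binarisation and paper background can be illustrated with a minimal sketch. It assumes a fixed global threshold and invented gray values (0 = black, 255 = white); real readers may use other binarisation methods, so this is an illustration of the effect only.

```python
def binarise(gray_rows, threshold=128):
    """Map gray values to 1 (ink) below the threshold and 0 (background) otherwise."""
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray_rows]


def ink_pixels(binary_rows):
    """A very simple feature: the number of ink pixels in the binary image."""
    return sum(sum(row) for row in binary_rows)


# The same vertical stroke with blurred edges, scanned on two paper backgrounds.
white_paper = [[240, 140, 40, 140, 240] for _ in range(3)]
colored_paper = [[150, 110, 40, 110, 150] for _ in range(3)]

print(ink_pixels(binarise(white_paper)), ink_pixels(binarise(colored_paper)))
```

Although the stroke itself is identical, its blurred edge pixels fall on different sides of the threshold on the two backgrounds, so the stroke appears three times as wide on the colored paper and the derived feature changes accordingly.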
All this explains the previous difficulties and the large amount of effort dedicated to optimizing automatic readers of addresses on items of mail.