1. Field of the Invention
The invention relates in general to processes for recognition of a pattern on an item presented. In particular, it relates to a process involving a continuous pattern adaptation. The invention further relates to equipment to execute such a process.
2. Background Art
The recognition of characters or character patterns on presented items represents a function of ever-growing importance in the modern business world. Particularly in the field of banking and other financial activities, data processing equipment with character recognition units is employed in order to facilitate processing of transfer instructions, pre-printed checks and other vouchers without manual assistance. Such recognition units are also employed in automatic letter sorting.
For such purposes, a specified set of identifiers is used, such as the alphanumeric set of characters including certain special characters. The recognition process then allocates to each character to be recognized the identifier that offers the maximum possible reliability.
To do this, after detailed pre-processing of the subject data, an initial stage of a classifier estimates numerical evaluations, known as credibility factors, for one or several candidate identifiers for the item to be recognized. A subsequent stage uses these factors as the basis for deciding which identifier the item is allocated to.
These allocation decisions are subject to residual errors of two kinds: rejects, and acceptances of incorrect identifiers, which are often called substitutions. The desire for the minimum possible number of rejects and, simultaneously, the minimum number of substitutions places contradictory requirements on the automated recognition process.
In order to improve the reliability of pattern recognition, experiments have been made using what is known as the “multi-voting” process. In this process, the reading results for the same character pattern from several pattern recognition units are passed to what is known as a “narrowing unit”, which compares the results and selects the overall result in accordance with the following rules:
a) If all the recognition units produce the same result, the overall result can be selected from any recognition unit desired;
b) If none of the recognition units is able to provide a reliable result, the overall result is a “reject” (not recognizable);
c) If the results of all the recognition units are the same, but among them there is at least one with a reliability factor which is greater than a previously defined threshold value (e.g. 50%), then the most reliable value is selected as the overall result;
d) If the results of all the recognition units are disparate, the overall result is a “reject” (not recognizable).
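The rules above can be illustrated by the following minimal sketch of a narrowing unit. It is one possible reading only, not an implementation taken from the prior art: it assumes that each recognition unit reports a pair of identifier and reliability factor, that a result counts as “reliable” when its reliability factor exceeds the threshold, and that rule b) is checked before the others. The names `narrow` and `REJECT` are hypothetical.

```python
from typing import List, Optional, Tuple

REJECT = None  # hypothetical marker for the "reject" (not recognizable) outcome


def narrow(results: List[Tuple[str, float]], threshold: float = 0.5) -> Optional[str]:
    """Combine (identifier, reliability) pairs from several recognition units."""
    # Rule b (assumed reading): no unit exceeds the threshold -> reject.
    if not any(reliability > threshold for _, reliability in results):
        return REJECT
    # Rule d: the units report different identifiers -> reject.
    if len({identifier for identifier, _ in results}) != 1:
        return REJECT
    # Rules a and c: the units agree, so take the value of the most reliable unit.
    best_identifier, _ = max(results, key=lambda r: r[1])
    return best_identifier
```

Under these assumptions, two units reporting ("A", 0.9) and ("A", 0.4) yield "A", whereas ("A", 0.9) and ("B", 0.8) yield a reject.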
DE 41 33 590 A1 discloses a process for the classification of signals which in each case represent one of several possible amplitude values. In this process, the following working stages take place in parallel in one or more channels:
i) Samples are formed from several scanned values in each case;
ii) At least one characteristic is extracted from each sample;
iii) The characteristic or characteristics extracted from each sample are used as addresses to read out occurrence probabilities stored in a table.
For further evaluation, a decision value is calculated from the occurrence probabilities of all samples and compared with a prescribed threshold value.
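Stages i) to iii) and the subsequent threshold comparison might be sketched as follows for a single channel. The cited document does not specify the characteristic or the combination rule here, so both are assumptions made purely for illustration: the characteristic is taken to be the spread between the largest and smallest value of a sample, and the occurrence probabilities are combined by summing their logarithms. All function names and the `floor` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def make_samples(values: Sequence[int], size: int) -> List[Tuple[int, ...]]:
    """Stage i: group consecutive scanned values into samples of `size` values."""
    return [tuple(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


def extract_characteristic(sample: Tuple[int, ...]) -> int:
    """Stage ii (assumed characteristic): spread between largest and smallest value."""
    return max(sample) - min(sample)


def decision_value(values: Sequence[int], table: Dict[int, float],
                   size: int = 2, floor: float = 1e-6) -> float:
    """Stage iii plus evaluation: table lookup per sample, then sum of log probabilities."""
    total = 0.0
    for sample in make_samples(values, size):
        # The characteristic serves as the address into the probability table;
        # `floor` is an assumed fallback for addresses missing from the table.
        probability = table.get(extract_characteristic(sample), floor)
        total += math.log(probability)
    return total


def classify(values: Sequence[int], table: Dict[int, float],
             threshold: float, size: int = 2) -> bool:
    """Compare the combined value with the prescribed threshold."""
    return decision_value(values, table, size) >= threshold
```

Summing logarithms is equivalent to multiplying the probabilities while avoiding numerical underflow for long sequences; other combination rules would fit the description equally well.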
DE 21 12 919 B2 sets out a further arrangement for the recognition of character patterns using the multi-voting process. This arrangement contains a first character pattern processing path comprising a collection array, which takes information from the characters; a processing array, which receives the signals from the collection array for processing; and a decision array, which receives the signals from the processing array, recognizes a character, and causes a decision signal to appear at its output. In addition, a second character processing path is used, which consists either of a further collection array and/or a further processing array together with the decision array already present, or of a second collection array, a second processing array and a second decision array. At least one of the three arrays of the second character processing path works in accordance with a different principle from the corresponding array of the first character processing path. At the output of the second character processing path, its decision signal is fed, together with the decision signal from the first character processing path, into a comparison array, which produces a recognition signal if the two match.
The multi-voting process further requires the use of at least two recognition units, each employing a different recognition algorithm. Experience shows, however, that this arrangement only slightly improves the reliability of the recognition process compared with previous processes.
DE 44 07 998 C2 discloses a process for recognition of a pattern on a voucher, in which at least two different pattern recognition units are used to recognize the pattern. Means are also provided to determine a credibility factor, represented by a fuzzy variable, for each pattern recognized by the pattern recognition units, as well as means to evaluate the recognized patterns with the aid of these credibility factors.
DE 44 36 408 C1 describes a pattern recognition process in which, in a training phase, a calibration specification is produced for the valuations that a classifier proposes for possible identifiers and, in the recognition procedure, the valuations estimated by the classifier are replaced by different values using the calibration specification.
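The idea of replacing classifier valuations by calibrated values can be sketched as follows. The cited document does not state the form of the calibration specification here, so this sketch substitutes a simple, commonly used technique, histogram binning, purely as an illustration: in the training phase, the fraction of correct proposals is recorded for each valuation interval, and in the recognition procedure a raw valuation is replaced by the fraction recorded for its interval. Valuations are assumed to lie between 0 and 1; the names `fit_calibration` and `calibrate` are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_calibration(training: List[Tuple[float, bool]], bins: int = 10) -> Dict[int, float]:
    """Training phase: per valuation interval, the fraction of correct proposals."""
    counts = [0] * bins
    hits = [0] * bins
    for valuation, correct in training:
        b = min(int(valuation * bins), bins - 1)  # interval index for this valuation
        counts[b] += 1
        hits[b] += int(correct)
    # Only intervals that actually occurred in training receive an entry.
    return {b: hits[b] / counts[b] for b in range(bins) if counts[b]}


def calibrate(valuation: float, mapping: Dict[int, float], bins: int = 10) -> float:
    """Recognition procedure: replace the raw valuation by its calibrated value."""
    b = min(int(valuation * bins), bins - 1)
    # Assumed fallback: keep the raw valuation for intervals never seen in training.
    return mapping.get(b, valuation)
```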
Finally, JP-A-8235304 describes a character recognition apparatus with a first and a second recognition unit, in which the second recognition unit has a supplementary dictionary as well as a monitoring unit. The extracted characteristic is fed to the dictionary in accordance with a correction character prescribed by the user, and character recognition is performed again. Once the second character recognition is completed, the additional dictionary is initialized.
In most cases, present-day pattern recognition systems contain, in addition to devices for optical image recording and the actual recognition units, a correction station (for postal applications, this may be a video coding system) on which rejects are displayed and manually corrected by specially trained personnel. The corrected data, i.e. the video pictures of the unrecognized or inadequately recognized characters (non-coded information, NCI) together with the relevant manually input correct characters (coded information, CI), are in this case not used further for recognition purposes, but only for correction purposes.
In the development of a pattern recognition system, large quantities of character patterns are collected in a training database. They must be representative of the proposed recognition task. In the next stage, the characters are digitized using an optical scanner (NCI data) and each is allocated to the desired character category, e.g. letters, numbers or special characters. This stage is also called the training process. The quality of character recognition depends to a large extent on this training database. If the data are
not representative
incomplete in terms of character form categories
out of line with reality in the distribution of character form
obsolete, as regards new fonts or trendy handwriting
obtained from a different optical recording system,
this produces unsatisfactory recognition results, although the training process has been performed correctly.
The training processes are expensive, since a number of steps are necessary to undertake them, e.g.
collection of patterns
assessment of their representativeness
scanning in of patterns
manual allocation of characters to a character category
testing of the new classifiers.
Consequently, these training processes are performed as seldom as possible. It is expected that there will be about one year or more between two versions of recognition software. The collection of the necessary data is also a problem, as genuine data is often confidential or personal, as in the case of personal letters.
These and other disadvantages of the known prior art are overcome by the instant invention, which provides a complementary recognition system linked with the primary recognition system. An image that cannot be positively recognized by the primary recognition system is passed on to the complementary recognition system, and any characters not positively recognized by the complementary recognition system are in turn passed on to a correction system. At the correction system, an operator classifies the unrecognized characters, which are then used to teach the complementary recognition system. Thus, the classified data of the correction system provide the training data for a continuous training process which is coupled with the correction system by a pattern adaptation system.
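The flow of an image through the primary recognition system, the complementary recognition system and the correction system may be illustrated by the following minimal sketch. It is not the claimed apparatus: the complementary recognizer here simply memorizes operator-classified images, which stands in for whatever pattern adaptation is actually used, and all names (`ComplementaryRecognizer`, `process_image`, `Image`) are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple

Image = Tuple[int, ...]  # hypothetical stand-in for a character image


class ComplementaryRecognizer:
    """Toy recognizer that remembers operator-classified images verbatim."""

    def __init__(self) -> None:
        self.learned: Dict[Image, str] = {}

    def recognize(self, image: Image) -> Optional[str]:
        # Returns None when the image cannot be positively recognized.
        return self.learned.get(image)

    def teach(self, image: Image, label: str) -> None:
        self.learned[image] = label


def process_image(image: Image,
                  primary: Callable[[Image], Optional[str]],
                  complementary: ComplementaryRecognizer,
                  operator: Callable[[Image], str]) -> str:
    """Try the primary system, then the complementary one, then the operator."""
    label = primary(image)
    if label is not None:
        return label
    label = complementary.recognize(image)
    if label is not None:
        return label
    # Correction system: the operator classifies the character, and the
    # classified pair is fed back as training data for the complementary system.
    label = operator(image)
    complementary.teach(image, label)
    return label
```

In this sketch, an image that reaches the operator once is recognized by the complementary system on its next appearance, which mirrors the continuous training coupled to the correction system.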
Consequently, one advantage of the present invention is a process for recognition of characters which enables an improved rate of recognition to be achieved.
A further advantage of the invention is a recognition process enabling the relevant recognition systems to continuously learn the unique patterns used by the current user without the need for special versions of recognition software to be created for each user.
These and other advantages of the invention which will become apparent to the reader of this document are achieved by the apparatus described in the following specification of preferred embodiments of the invention and claimed in the following claims.