Most known optical character recognition (OCR) methods start with a pre-segmentation stage in which part of a digitized document is first segmented (isolated) into individual symbols, words and/or characters, followed by a character recognition step to translate these symbols, words and/or characters into pre-determined computer-readable entities.
A template-based Optical Character Recognition (OCR) method which does not require pre-segmentation has been suggested by Levin et al. in the International Application No. WO93/18483 published on Sep. 16, 1993, and entitled “Method and Apparatus for Image Recognition”. A drawback of this method is that it has difficulty recognizing patterns because of intra-class variability of the patterns.
As is well known in the art, each character which needs to be recognized is considered to be a different class.
The recognition of a character includes the characterization of its features or patterns. While there are generally different views on the definition of the features of patterns, many studies of character recognition and of pattern recognition have shown that the so-called quasi-topological features of a character or pattern, such as concavity, loops, and connectivity, are key features for recognition. To date, many different methods have been proposed for the purpose of extracting such features. For example, some of these methods analyze the progressive slopes of the black pixels.
On-line handwriting recognition systems have been designed which compute feature vectors as functions of time. An example of such a system is described in T. Starner, J. Makhoul, R. Schwartz and G. Chou, “On-Line Cursive Handwriting Recognition Using Speech Recognition Methods,” IEEE International Conference on Acoustics, Speech, and Signal Processing, Adelaide, Australia, Apr. 19-22, 1994, Vol. V, pp. 125-128. However, on-line handwriting recognition systems are not suitable for OCR applications, since these applications must recognize a whole page of text. This is a two-dimensional problem for which there is no obvious way of defining a feature vector as a function of one independent variable.
U.S. Pat. No. 5,727,130, issued to Hung on Mar. 10, 1998 and entitled “Genetic Algorithm For Constructing And Tuning Logic System” describes the use of a fuzzy logic system for OCR. “Fuzzy Logic” was developed to enable data processors based on binary logic to provide an answer between “yes” and “no.” Fuzzy logic is a logic system which has membership functions with fuzzy boundaries. Membership functions translate subjective expressions, such as “temperature is warm,” into a value which typical data processors can recognize. A label such as “warm” is used to identify a range of input values whose boundaries are not points at which the label is true on one side and false on the other side. Rather, in a system which implements fuzzy logic, the boundaries of the membership functions gradually change and may overlap a boundary of an adjacent membership set. Therefore, a degree of membership is typically assigned to an input value. For example, given two membership functions over a range of temperatures, an input temperature may fall in the overlapping areas of both the functions labelled “cool” and “warm.” Further processing would then be required to determine a degree of membership in each of the membership functions.
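The overlapping membership functions described above can be illustrated with a minimal sketch. The trapezoidal shape and the temperature breakpoints are assumptions chosen for illustration only; they are not taken from the cited patent.

```python
def trapezoid(x, a, b, c, d):
    """Degree of membership in a trapezoidal fuzzy set.

    Membership is 0 outside (a, d), 1 on [b, c], and changes
    linearly on the two ramps in between.
    """
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)   # rising edge
    return (d - x) / (d - c)       # falling edge


# Hypothetical breakpoints in degrees Celsius (illustrative assumption).
def cool(t):
    return trapezoid(t, 5.0, 10.0, 15.0, 25.0)


def warm(t):
    return trapezoid(t, 15.0, 25.0, 30.0, 35.0)


# 20 degrees lies in the overlap of both sets.
print(cool(20.0), warm(20.0))  # 0.5 0.5
```

An input of 20 degrees therefore belongs partly to "cool" and partly to "warm", which is the situation in which a degree of membership must be computed for each function.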
Fuzzy logic control systems have become increasingly popular in practical applications. Traditionally, the design of the knowledge base including membership functions and rules relies on a subjective human “rule-of-thumb” approach for decision-making. In addition, the control system is adapted (tuned) to the desired performance through trial and error. As a result, designing and adapting the fuzzy logic control system becomes a time-consuming task. To overcome this drawback, neural network techniques have been used in assisting designers to generate rules and adapt the fuzzy logic control system automatically.
A fuzzy logic system is inherently well-suited for dealing with imprecise data, such as handwritten characters, and for processing rules in parallel. However, the actual implementation of fuzzy rule-based systems for this type of application often relies on a substantial amount of heuristic observation to express the knowledge of the system. In addition, it is not easy to design an optimal fuzzy system to capture the necessary features of each character.
Typically, one rule is used to recognize one character, and each character is represented as one consequent of a rule. The actual implementation of fuzzy rule-based systems for this type of application often relies on a substantial amount of heuristic observation to express the membership functions for the antecedents of each rule. Each rule consists of several antecedents and consequents depending on the number of inputs and outputs, respectively. Each antecedent in a given rule is defined as an input membership function, and each consequent is defined as an output membership function.
Neural networks consist of highly interconnected processing units that can learn and globally estimate input-output functions in a parallel-distributed framework. Fuzzy logic systems store and process rules that output fuzzy sets associated with input fuzzy sets in parallel. The similar parallelism properties of neural nets and fuzzy logic systems have led to their integration in studies of the behaviour of highly complex systems.
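As a rough sketch of the one-rule-per-character arrangement, the following evaluates two hypothetical rules. The feature names (loop_score, open_ends), the triangular membership functions and the use of the minimum as the fuzzy AND are all assumptions made for illustration; the text does not specify them.

```python
def triangle(x, left, peak, right):
    """Triangular membership function with its maximum at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)


def small(v):
    return triangle(v, -0.5, 0.0, 0.5)


def large(v):
    return triangle(v, 0.5, 1.0, 1.5)


# One rule per character: each antecedent pairs an input variable
# with an input membership function (hypothetical features).
RULES = {
    "0": [("loop_score", large), ("open_ends", small)],
    "1": [("loop_score", small), ("open_ends", large)],
}


def firing_strengths(features):
    """Combine the antecedents of each rule with min (fuzzy AND)."""
    return {
        char: min(mf(features[name]) for name, mf in antecedents)
        for char, antecedents in RULES.items()
    }


strengths = firing_strengths({"loop_score": 0.9, "open_ends": 0.1})
best = max(strengths, key=strengths.get)   # rule with the strongest support
```

The sketch makes the design burden visible: every breakpoint in small and large has to be chosen by hand, which is the heuristic effort the paragraph above refers to.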
The process of designing a fuzzy rule-based system is tedious and critical for the success of the recognition. It must be done as efficiently and accurately as possible if it is to sufficiently address the OCR problem.
However, the output of a neural network depends on the exact sequence in which the knowledge base is “learned”. If the same knowledge base is fed twice to a neural network with only one substitution in the learning sequence, the end result will be different in each case. This can be a major disadvantage for any OCR system.
In U.S. Pat. No. 5,727,130, Hung describes the use of Learning Vector Quantization (“LVQ”). LVQ, which is well-known in the art, accomplishes learning by placing input data in a finite number of known classes. The result is that this method provides the supervised effect of learning and enhances the classification accuracy of input patterns. It is also independent of the learning sequence.
It is desirable to design more robust input membership functions that correspond to a rule. The linguistic term of a rule's antecedent, such as “input 1 is small”, depends upon how accurately the input space is qualified while defining membership functions. LVQ can group similar input data into the same class by adjusting the connection weights between the inputs and their corresponding output. In other words, through supervised learning, the features of each class can be extracted from its associated inputs.
Hence, a learning vector quantization neural network may be used to optimize the features of each handwritten character. Ming-Kuei Hu, in “Visual Pattern Recognition by Moment Invariants,” IEEE Transactions on Information Theory, pp. 179-186, 1962, describes such a system. An LVQ network is also disclosed in Teuvo Kohonen, “The Self-Organizing Map,” Proceedings of the IEEE, Vol. 78, No. 9, pp. 1364-1479, September 1990.
An LVQ learning system can be seen as a two-layered network. The first layer is the input layer; the second is the competitive layer, which is organized as a two-dimensional grid. Every unit of the first layer is connected to every unit of the second layer (a “unit” of the input layer represents one input variable, such as x1, of one input pattern (x1, x2, . . . )). In the OCR example, the units of the second layer are grouped into classes, each of which pertains to one character. For purposes of training, an input pattern consists of the values of each input variable and its corresponding class (i.e. the character that it represents). A quantization unit in the competitive layer has an associated vector comprising the values of each interconnection from all the units in the input layer to itself. This vector implicitly defines an ideal form of character within a given class.
The LVQ learning system determines the class borders using a nearest-neighbour method. This method computes the smallest distance between the input vector X: (x1, x2, . . . xn) and each quantization vector. In known systems, this computation is done in terms of Euclidean distance (straight line distance in multi-dimensional space).
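The nearest-neighbour search described above can be sketched as follows. The codebook layout, a list of (class label, quantization vector) pairs, is an assumption made for illustration.

```python
import math


def euclidean(x, w):
    """Straight-line distance between two vectors of equal length."""
    return math.sqrt(sum((xi - wi) ** 2 for xi, wi in zip(x, w)))


def nearest(x, codebook):
    """Return (index, label, distance) of the closest quantization vector.

    `codebook` is a list of (label, vector) pairs (hypothetical layout).
    """
    index = min(range(len(codebook)), key=lambda i: euclidean(x, codebook[i][1]))
    label, w = codebook[index]
    return index, label, euclidean(x, w)


codebook = [("A", [0.0, 0.0]), ("B", [3.0, 4.0])]
index, label, distance = nearest([3.0, 3.0], codebook)  # closest to "B", distance 1.0
```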
Input vector X belongs to class C(x), and quantization vector w(I) belongs to class C(w). If C(x) and C(w) are different classes, w(I) is pulled away from the class border to increase the classification accuracy. If C(x) and C(w) are the same class, w(I) is moved closer to the center of the class. Each input pattern is then presented sequentially to the input layer over several iterations. The weights of the quantization units in each class are fine-tuned to group around the center of the class. Therefore, the weight vector of the center unit within the class represents the optimum classification for the corresponding class. The result of the LVQ learning process is an optimized vector for each alphanumeric character.
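The update just described resembles the basic LVQ1 rule, which can be sketched as below. The learning rate, the number of epochs and the codebook layout are assumptions for illustration; the text does not give the exact update used in the cited systems.

```python
def lvq1_step(x, label, codebook, lr=0.1):
    """Present one labelled input pattern and update the winning vector.

    `codebook` is a list of (class_label, vector) pairs and is modified
    in place. The winner is the quantization vector nearest to `x`.
    Returns the index of the winner.
    """
    winner = min(
        range(len(codebook)),
        key=lambda k: sum((a - b) ** 2 for a, b in zip(x, codebook[k][1])),
    )
    w_label, w = codebook[winner]
    # Same class: move toward the input. Different class: move away from it.
    sign = 1.0 if w_label == label else -1.0
    codebook[winner] = (w_label, [wj + sign * lr * (xj - wj) for xj, wj in zip(x, w)])
    return winner


def train(samples, codebook, lr=0.1, epochs=10):
    """Present every (vector, label) sample sequentially for several iterations."""
    for _ in range(epochs):
        for x, label in samples:
            lvq1_step(x, label, codebook, lr)
    return codebook
```

In practice the learning rate is often decreased over the iterations so that the vectors settle; that refinement is omitted here to keep the sketch short.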
U.S. Pat. No. 5,832,474, entitled “Document Search And Retrieval System With Partial Match Searching Of User-Drawn Annotations” and issued to Lopresti et al. on Nov. 3, 1998 also describes the use of vector quantization in a document search and retrieval system that does not require the recognition of individual characters.
However, most prior art character recognition systems are based on the concept of seeking to classify as many characters as possible. This means that such systems seek to attribute each character to be recognized to a class even if a certain degree of “guesswork” is necessary. As a result, such systems are far from being sufficiently accurate for many applications.
A specific example of an LVQ learning system is optimal linear separation. It can be summarized as follows:
    each class vector has a large number of dimensions (from 100 to 350 components);
    for each pair of classes, it is possible to find a hyperplane that separates them, so that N classes are separated two by two by N(N−1)/2 hyperplanes; and
    the equation of each hyperplane is simple: Σ(αi xi) = 0, where the sum is taken over the components i.
Therefore, for all members of class A, Σ(αi xi) > 0 and for all members of class B, Σ(αi xi) < 0. By the use of a simple algorithm, the various coefficients αi converge toward the most efficient value. This known system can be useful when used with characters which are very close to those in the database. This is the case, for example, of typed characters.
However, it has drawbacks, the most important of which is the difficulty of finding a hyperplane that separates very complex objects such as hand-printed characters. Because a plane is by definition open-ended, it is difficult to reject characters which are relatively distant from the characters which are sought to be read (commas, exclamation marks, question marks, etc.).
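The pairwise separation outlined above can be sketched as follows. The text only mentions "a simple algorithm" for finding the coefficients; the perceptron update used here is a stand-in chosen for illustration, and the constant last component of each vector (which lets the hyperplane through the origin act as an offset hyperplane) is likewise an assumption.

```python
from itertools import combinations


def score(alpha, x):
    """Left-hand side of the hyperplane equation: the sum of alpha_i * x_i."""
    return sum(a * xi for a, xi in zip(alpha, x))


def train_separator(class_a, class_b, epochs=100):
    """Find alpha with score > 0 for class A and score < 0 for class B.

    Uses the perceptron rule (an assumed stand-in). It settles only if
    the two classes are linearly separable.
    """
    alpha = [0.0] * len(class_a[0])
    labelled = [(x, 1.0) for x in class_a] + [(x, -1.0) for x in class_b]
    for _ in range(epochs):
        mistakes = 0
        for x, target in labelled:
            if target * score(alpha, x) <= 0:          # wrong side or on the plane
                alpha = [a + target * xi for a, xi in zip(alpha, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return alpha


def train_all_pairs(classes):
    """One hyperplane per pair of classes: N(N-1)/2 separators in total."""
    return {
        (a, b): train_separator(classes[a], classes[b])
        for a, b in combinations(sorted(classes), 2)
    }


# Toy 2-D data with a constant third component (illustrative only).
classes = {
    "A": [[2.0, 1.0, 1.0], [3.0, 2.0, 1.0]],
    "B": [[-1.0, -1.0, 1.0], [-2.0, 0.0, 1.0]],
    "C": [[0.0, -5.0, 1.0], [1.0, -6.0, 1.0]],
}
separators = train_all_pairs(classes)   # 3 classes -> 3 hyperplanes
```

Because each half-space is unbounded, a vector far from every training example still receives a definite sign from every separator, which is why rejecting such characters is difficult.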
The multi-layer perceptron is a well-known application of neural networks. This method can be excellent if great care is used in the training phase. However, because no theoretical basis exists to improve the result in a structured way, one must rely on trial and error processes which are extremely costly. In addition, if the multi-layer perceptron system is “taught” the same data twice, two different results will be obtained.
Very often, it is impossible to be 100% certain that the character which was read is in reality the digital representation which was assigned to such character by the recognition method. Therefore, it is advantageous to establish a measure of confidence in the accuracy of the recognition method. The confusion rate is defined as the number of characters which were thought to have been recognized but were in fact wrongly recognized, divided by the total number of characters read. The rejection rate is the number of characters which the recognition method has failed to recognize, divided by the total number of characters read. The read rate is the total number of characters that were accurately read, divided by the total number of characters read. Therefore, the read rate plus the confusion rate plus the rejection rate should equal 100%.
In many applications, it is preferable to consider a character unrecognizable, even if it is one of the ASCII characters that the system is seeking to recognize, rather than to assign a wrong ASCII value to the character which was read. This is especially the case in financial applications.
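The three rates defined above can be computed as in the following sketch. Representing a rejected character by None is an assumption made for illustration.

```python
def recognition_rates(results):
    """Compute read, confusion and rejection rates.

    `results` is a list of (true_character, recognized_character) pairs,
    where recognized_character is None when the character was rejected.
    """
    total = len(results)
    if total == 0:
        raise ValueError("no characters were read")
    rejected = sum(1 for _, predicted in results if predicted is None)
    correct = sum(1 for truth, predicted in results if predicted == truth)
    confused = total - rejected - correct   # recognized, but wrongly
    return {
        "read": correct / total,
        "confusion": confused / total,
        "rejection": rejected / total,
    }


rates = recognition_rates([("A", "A"), ("B", "B"), ("C", "G"), ("D", None)])
# read 0.5, confusion 0.25, rejection 0.25; the three rates sum to 1.0
```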
This being said, the read rate has to be high enough for the recognition system to be worthwhile. Therefore, the ideal system is the one in which the confusion rate is zero and the read rate is as close as possible to perfect. Limiting factors for the read rate include:
    poor quality images, including poor contrast images caused by the use of a low quality writing instrument or a color that is not easy to digitize;
    an image that is poorly recognized by the digitizing sub-system or the presence of a background image;
    a poor definition of the zone in which the characters are to be written; and
    printed characters that extend outside the area reserved for the field, which can include characters that are too large, characters that are patched together or open characters.
Poor class separation performance may also result from the quality or quantity of the vector examples in the vector database or the inability of the recognition engine to generalize. Indeed, hand-printed documents are by definition nearly never identical from one person to the next or from one expression to the next even when written by the same person.