Character recognition is often used to input information that is in a humanly readable form (i.e., machine printed or handwritten form) rather than in electronic digital form. For instance, while many computers have a keyboard for receiving keypunch input, other computer systems have an optical scanner for receiving documentary input. Yet other computer systems have a pen-like stylus and tablet digitizer for receiving handwritten input. Such handwriting input devices may be provided for a number of reasons. For example, many users are more accustomed to inputting data by handwriting than by keypunch. Additionally, a keyboard requires a large amount of space, which cannot be accommodated in a small portable computer such as a personal digital assistant (PDA).
FIG. 1 shows a conventional character recognition system 10. The character recognition system may include a tablet and stylus 18, an optical scanner 16, or both. In the case of the tablet and stylus 18, the user moves the stylus about the tablet surface. The tablet and stylus 18 convert the user's movement of the stylus with respect to the tablet into digital binary data which graphically represents the movement. That is, if the user had used a pen and paper, the movement would have created marks on the paper. The stylus and tablet 18 produce pixelated images of such marks in the form of digital data.
In the case of an optical scanner 16, sheets, on which handwritten or machine printed characters are previously formed, are fed into the optical scanner 16. The optical scanner 16 generates digital binary data which graphically represent the characters on the sheets.
The stylus and tablet 18 or optical scanner 16 transfer the data to an I/O interface 14. The I/O interface 14, in turn, transfers the data onto a bus 12 of the system 10. The character recognition system 10 also includes a processor or CPU 20, a main memory 22, a disk memory 24 and an audio/video output device 26. Each of the devices 20, 22, 24 and 26 is connected to the bus 12 for purposes of transferring data to, and receiving data from, one of the other devices or the I/O interface 14. The audio/video output device 26 is for conveying information to a user in the form of images and sounds. To that end, the audio/video output device 26 may include a cathode ray tube or LCD display and loudspeakers. The main memory 22 and disk memory 24 are for storing data and programs. The processor 20 is for processing data. In particular, the processor 20 executes steps in conjunction with the other devices 12, 14, 16, 18, 22, 24 and 26 for recognizing characters from the inputted data.
FIG. 2 illustrates a conventional handwriting recognition process which may be executed by the character recognition system 10 of FIG. 1. In a first step 32, the inputted handwritten or machine printed characters are received. For instance, using the stylus and tablet 18, the user manually writes one or more characters. The stylus and tablet 18 transfer character data which graphically represents the written characters to the I/O interface 14. Alternatively, the user feeds sheets, on which characters have been previously handwritten or machine printed, into the optical scanner 16. The optical scanner 16, in turn, transfers character data which graphically represents the handwritten or machine printed characters to the I/O interface 14. The I/O interface 14 transfers the character data via the system bus 12 to, for instance, the main memory 22.
Next in step 34, the processor 20 pre-processes the inputted character data stored in the main memory 22. For instance, the processor 20 may remove noise by discarding clusters of connected filled pixels having less than a minimum threshold area. The processor 20 may also smooth the graphical images of the inputted characters. Next, in step 36, the processor 20 optionally forms a skeleton image of each inputted character and then converts the skeleton images to enlarged contour images (i.e., thickens the lines of the skeleton images). Then, in step 38, the processor 20 segments the images of the characters (i.e., divides the images into sub-images or zones) for purposes of extracting feature values from the character images. Herein, "feature" means any quantifiable graphical characteristic of an image which is useful for distinguishing the image of one or more characters from others. An illustrative segmentation technique is described in U.S. patent application Ser. No. 08/313,686 wherein the segmentation depends on the feature values to be extracted from the inputted characters. For instance, suppose the inputted character is the handwritten number "8". The graphical image of the inputted character "8" may be segmented as shown in FIGS. 3, 4 and 5. In FIG. 3, eight zones 321, 322, 323, 324, 341, 342, 343 and 344 are formed as shown. In FIG. 4, eight zones 352, 354, 356, 358, 372, 374, 376 and 378 are formed as shown. In FIG. 5, four zones 332, 334, 336 and 338 are formed as shown.
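The noise-removal portion of the pre-processing in step 34 can be sketched as follows. This is an illustrative implementation, not the patent's own: the 8-connectivity rule and the `min_area` threshold value are assumptions, since the text specifies only that clusters below a minimum threshold area are discarded.

```python
def remove_small_clusters(img, min_area=5):
    """Discard connected clusters of filled pixels having less than
    min_area pixels.  img is a list of rows of 0/1 values; min_area=5
    is an assumed illustrative threshold.  Uses 8-connectivity."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    out = [[0] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] and not seen[sy][sx]:
                # Flood-fill to collect one connected cluster.
                stack, cluster = [(sy, sx)], []
                seen[sy][sx] = True
                while stack:
                    y, x = stack.pop()
                    cluster.append((y, x))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                stack.append((ny, nx))
                # Keep the cluster only if it meets the area threshold.
                if len(cluster) >= min_area:
                    for y, x in cluster:
                        out[y][x] = 1
    return out
```

A 3x3 blob survives this filter while an isolated stray pixel is removed.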
Next, in step 40 (FIG. 2), the processor 20 extracts a vector of feature values for each inputted character. U.S. patent application Ser. No. 08/313,686 provides examples of features which may be extracted from characters segmented as shown in FIGS. 3-5. These illustrative features are briefly described below. Illustratively, the same set of features is extracted for each character to form its feature value vector.
Referring to FIG. 6, the extraction of stroke density function (SDF) feature values is illustrated. In evaluating the SDF, the processor 20 projects a number of inspection lines in each zone in which the SDF is evaluated. The processor 20 then counts the number of times the graphical image of the character crosses an inspection line within the zone. The total number of crossings is divided by the total number of inspection lines to produce the result of the SDF function (which, in turn, is the SDF feature value). Illustratively, the processor 20 does not evaluate the SDF in every zone. Rather, the processor 20 illustratively evaluates the SDF in the eight vertical zones 321, 322, 326, 327, 341, 342, 346 and 347 and in four horizontal zones 332, 334, 336, and 338 to produce 12 feature values.
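The SDF computation for a single zone can be sketched as below. The zone-bound representation, the number of inspection lines, and the use of horizontal lines are illustrative assumptions; the source fixes only the ratio of crossings to inspection lines.

```python
def stroke_density(img, zone, n_lines=4):
    """Stroke Density Function (SDF) for one zone.

    img: list of rows of 0/1 pixels.
    zone: (top, left, bottom, right) bounds, inclusive-exclusive.
    Projects n_lines evenly spaced horizontal inspection lines through
    the zone, counts stroke crossings, and divides by n_lines.
    n_lines=4 is an assumed illustrative value."""
    top, left, bottom, right = zone
    rows = [top + (i + 1) * (bottom - top) // (n_lines + 1)
            for i in range(n_lines)]
    crossings = 0
    for r in rows:
        prev = 0
        for c in range(left, right):
            cur = img[r][c]
            if cur == 1 and prev == 0:  # entering a stroke = one crossing
                crossings += 1
            prev = cur
    return crossings / n_lines
```

For a single vertical bar, every inspection line crosses the stroke exactly once, so the SDF value is 1.0.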
Referring to FIG. 7, the extraction of peripheral background area (PBA) feature values is described. In determining the PBA, the processor 20 evaluates the following function:

PBA_{m'} = (1 / (X'_{m'} · Y'_{m'})) · Σ_{n'=1}^{N'} λ'_{n'}  (1)

where n' is an index of points on either the horizontal (x') or vertical (y') axis which successively takes on each value from 1 to the maximum dimension N' of the character image rectangle on that axis, and λ'_{n'} is the distance in pixels from the n'-th location to a filled pixel of the character image. As shown in FIG. 7, the processor 20 measures λ'_{n'} perpendicularly from the corresponding axis. The variable m' takes on a value which indicates the particular zone for which the PBA function is evaluated. X'_{m'} represents the horizontal width of the m'-th zone and Y'_{m'} represents the vertical height of the m'-th zone.
The processor 20 evaluates the PBA in the vertical direction for each of the zones 321, 322, 326, 327, 341, 342, 346 and 347. The PBA is evaluated in the horizontal direction for the zones 352, 354, 356, 358, 372, 374, 376 and 378. Thus, 16 feature values are extracted.
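A vertical-direction PBA evaluation for one zone can be sketched as follows. The normalization by zone area and the convention that an empty column contributes the full zone height are assumptions made for this sketch; the source defines λ'_{n'} but elides the exact equation.

```python
def peripheral_background_area(img, zone):
    """Peripheral background area (PBA) of one zone, scanned vertically.

    For each column n' of the zone, lambda'_{n'} is the distance in
    pixels from the top edge of the zone down to the first filled pixel;
    a column with no filled pixel contributes the full zone height
    (an assumption).  The sum of distances is normalized by the zone
    area X'_{m'} * Y'_{m'}."""
    top, left, bottom, right = zone
    width, height = right - left, bottom - top
    total = 0
    for c in range(left, right):
        dist = height  # default: no filled pixel found in this column
        for r in range(top, bottom):
            if img[r][c] == 1:
                dist = r - top
                break
        total += dist
    return total / (width * height)
```

An entirely empty zone yields 1.0 (all background), while a zone whose top row is fully filled yields 0.0.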
Referring to FIG. 8, the extraction of the contour line length (CLL) feature values is described. In determining the CLL feature values, the processor 20 evaluates the following formula:

CLL_{m'} = (1 / (X'_{m'} · Y'_{m'})) · Σ_{n'=1}^{N'-1} |λ'_{n'+1} − λ'_{n'}|  (2)

The variables m', n', X'_{m'}, Y'_{m'} and λ'_{n'} are as described above. The processor 20 obtains two CLL feature values, namely CLL.sub.1 and CLL.sub.2, for both of the vertical and horizontal zones 324, 328, 344 and 348. This produces 16 feature values.
Referring to FIG. 9, gradient feature values are extracted. First, the processor 20 assigns a direction code Dir_{i',j'} to each pixel of the character image in the i'-th column and j'-th row of the character image. The variables i' and j' are indexes in the horizontal (x') and vertical (y') directions, respectively. The direction code corresponds to a direction that is normal to a tangent line at the pixel. As shown in FIG. 9, there are eight possible direction codes which can be assigned, each corresponding to a 45° angle direction, i.e., 0 for 337.5° to 22.5°, 1 for 22.5° to 67.5°, 2 for 67.5° to 112.5°, 3 for 112.5° to 157.5°, 4 for 157.5° to 202.5°, 5 for 202.5° to 247.5°, 6 for 247.5° to 292.5° and 7 for 292.5° to 337.5°. Thereafter, the processor 20 generates a vector of lengths Len(Dir_{i',j'}) in each zone using the following formulas:

Len_{i',j'} = √(X'² + Y'²)  (3a)
Dir_{i',j'} = the 45° sector, of the eight above, containing tan⁻¹(Y'/X')  (3b)
Len_{m'}(d) = (1 / Bdd_{m'}(d)) · Σ_{Dir_{i',j'}=d} Len_{i',j'}  (3c)

where Bdd_{m'}(Dir_{i',j'}) represents the boundary width of the m'-th zone in the direction normal to Dir_{i',j'}, and where X' and Y' are values generated using the following 3×3 gradient kernels:

X': [−1 0 1; −2 0 2; −1 0 1]    Y': [−1 −2 −1; 0 0 0; 1 2 1]

The processor 20 applies the kernels to each pixel of the character image prior to determining the length Len in the appropriate direction Dir_{i',j'}. The lengths Len_{i',j'} are then combined as per equation (3c) to produce a single value for each of the eight directions Dir_{i',j'}. Thus, the processor 20 generates eight gradient feature values for each zone. Illustratively, the gradient feature values are extracted from each of the eight zones 352, 354, 356, 358, 372, 374, 376 and 378, thereby generating sixty-four feature values.
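The gradient feature extraction for one zone can be sketched as below. Several details are assumptions of this sketch: the kernels are taken to be the standard Sobel operators (the text says only "kernels"), and the boundary-width normalizer Bdd is approximated by the zone perimeter.

```python
import math

def gradient_features(img, zone):
    """Eight-direction gradient feature values for one zone.

    Applies 3x3 Sobel kernels (an assumption) to obtain X', Y' at each
    interior pixel, quantizes the gradient direction into eight 45-degree
    codes (code 0 spans 337.5..22.5 degrees), and accumulates gradient
    magnitude per code.  Bdd is approximated by the zone perimeter."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # assumed Sobel X' kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # assumed Sobel Y' kernel
    top, left, bottom, right = zone
    lengths = [0.0] * 8
    for r in range(max(top, 1), min(bottom, len(img) - 1)):
        for c in range(max(left, 1), min(right, len(img[0]) - 1)):
            gx = sum(kx[dr + 1][dc + 1] * img[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            gy = sum(ky[dr + 1][dc + 1] * img[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            mag = math.hypot(gx, gy)                 # Len, equation (3a)
            if mag == 0:
                continue
            ang = math.degrees(math.atan2(gy, gx)) % 360.0
            code = int(((ang + 22.5) % 360.0) // 45.0)  # Dir, equation (3b)
            lengths[code] += mag
    perimeter = 2 * ((right - left) + (bottom - top))   # stand-in for Bdd
    return [v / perimeter for v in lengths]             # equation (3c)
```

For a purely vertical edge the gradient points along x', so direction code 0 dominates the resulting vector.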
After extracting a feature value vector for an inputted character, the processor 20 executes step 42 (FIG. 2). In step 42, the processor 20 compares the feature value vector of each inputted character to feature value vectors contained in a database of predetermined feature value vectors. Illustratively, this database may be stored in the disk memory 24 or the main memory 22. The database contains at least one predetermined feature value vector for each model character of a set of model characters that can be recognized by the system 10. For instance, suppose the system 10 can recognize the letters of the English alphabet. In such a case, at least one predetermined feature value vector is maintained in the database for each letter of the alphabet. Based on these comparisons, the processor 20 determines the predetermined feature value vector which best matches the feature value vector of the inputted character. In step 44 (FIG. 2), the processor 20 outputs the model character to which the best matching predetermined feature value vector corresponds. For instance, the processor 20 can output the ASCII code of the model character, a predetermined character image of the model character, etc.
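The comparison of steps 42 and 44 amounts to a nearest-prototype search. The sketch below assumes Euclidean distance; the source does not fix a particular distance function.

```python
import math

def recognize(x, database):
    """Return the model character whose prototype feature value vector
    best matches x, together with the matching distance.

    database: list of (character, prototype_vector) pairs.
    Euclidean distance is an assumption; the comparison metric is not
    specified by the text."""
    best_char, best_dist = None, float("inf")
    for char, proto in database:
        d = math.dist(x, proto)
        if d < best_dist:
            best_char, best_dist = char, d
    return best_char, best_dist
```

In step 44 the returned character would then be emitted as, e.g., its ASCII code.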
Many prior art modifications and enhancements have been proposed for character recognition. See U.S. Pat. Nos. 5,151,950, 5,050,219, 5,034,989, 4,903,312, 4,731,857, 4,718,103, 4,685,142, 4,284,975, and 4,773,099; D. Lee & N. Srihari, Handprinted Digit Recognition: A Comparison of Algorithms, THIRD INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, pp. 153-162 (1993); G. Srikantan, Gradient Representation for Handwritten Character Recognition, THIRD INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, pp. 318-23 (1993); and L. Tu, W. Lin, Y. Chan & I. Shyu, A PC Based Handwritten Chinese Character Recognition System, THIRD INTERNATIONAL WORKSHOP ON FRONTIERS IN HANDWRITING RECOGNITION, pp. 349-54 (1993).
As noted above, a typical character recognition system 10 compares the feature values extracted from inputted characters against a predetermined database of feature value vectors of model characters. Such a database may be organized in a number of ways. For instance, U.S. Pat. No. 5,050,219 (Maury) teaches a character recognition database organized according to a tree structure. Each leaf node of the tree contains a character which can be recognized. Each non-leaf node of the tree contains a particular one of a plurality of predetermined feature comparisons which should be performed on the inputted character feature values. Based on the results of the comparison at such a non-leaf node, the database is traversed to a particular attached child node. In the comparison step, the tree is traversed until a leaf node is reached. The character is then recognized as the character corresponding to the leaf node.
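The tree traversal described for the Maury patent can be sketched as follows. The node structure and the concrete feature test are hypothetical; the patent specifies only that each non-leaf node holds a feature comparison whose result selects a child.

```python
class Node:
    """One node of a tree-structured recognition database.

    A leaf node carries the recognized character in `char`; a non-leaf
    node carries a `test` function mapping a feature value vector to the
    index of the child node to descend into."""
    def __init__(self, char=None, test=None, children=None):
        self.char = char
        self.test = test
        self.children = children or []

def classify(node, features):
    """Traverse the tree, applying each non-leaf node's feature
    comparison, until a leaf node is reached."""
    while node.char is None:
        node = node.children[node.test(features)]
    return node.char
```

For example, a one-level tree whose root tests a single feature against a hypothetical threshold of 0.5 routes inputs to one of two leaf characters.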
Other character recognition databases are flat. Such character recognition databases contain at least one vector of feature values for each model character to be recognized. The inputted character feature values are compared to each vector of feature values. The inputted character is then recognized as the model character corresponding to the vector of feature values which best matches the feature value vector of the inputted character.
A flat character recognition database such as used above is conventionally generated as follows. Multiple training character samples are inputted to the system 10 of FIG. 1 for each model character which can be recognized. Feature values are then extracted for each training character sample. Typically, this results in too many feature value vectors to be practically stored or accessed for purposes of making comparisons. Therefore, the feature value vector database is compacted. To that end, the processor 20 illustratively organizes the characters into classes. For instance, the processor 20 initially forms one or more classes for each model character, and places each training character sample of each model character into a respective class. Thereafter, the processor 20 assigns one or more prototype feature value vectors for representing each class. For instance, the processor may form the mean feature value vector in each class, and assign the respective mean feature value vector to the class as a prototype feature value vector. This prototype feature value vector is said to represent a prototype, or virtual representative, character of the class. (U.S. patent application Ser. No. 08/313,686 proposes an alternative scheme for classifying and selecting prototype feature value vectors for each class.)
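The mean-vector compaction described above can be sketched as below, assuming the simplest case of one class per model character (the cited application describes a more elaborate classification scheme).

```python
def build_prototypes(samples):
    """Compact a training set into one prototype feature value vector
    (the class mean) per class.

    samples: list of (label, feature_vector) pairs.  Assumes one class
    per model character for simplicity."""
    sums, counts = {}, {}
    for label, vec in samples:
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        # Accumulate componentwise sums per class.
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    # The mean vector of each class serves as its prototype.
    return {label: [s / counts[label] for s in sums[label]]
            for label in sums}
```

Each resulting mean vector is the "virtual representative" character of its class.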
In U.S. Pat. No. 4,773,099, Bokser discloses a method for organizing a recognition data base into so-called "ringed clusters". These ringed clusters include "certainty spheres" for character identification with certainty, "confidence spheres" for character identification without certainty but with some confidence level, and "possibility spheres" for classification of unknown characters.
Bokser further delineates the ringed clusters into "coarse", "medium", and "fine" categories, depending on the desired degree of accuracy in classifying the input characters.
Regarding the above described prior art in general, the basis for accepting or rejecting unknown input data by comparison with a feature value vector type of data base may be summarized as follows:
Let x denote the feature value vector of an unknown input pattern/character.
Let r_k denote the feature value vector of a prototype of class k.
Let M denote the nearest class to the feature value vector x, i.e., the class having a prototype feature value vector nearest to the feature value vector x.
Let S denote the second nearest class to the feature value vector x.
Let CR denote a class region threshold for precise recognition.
Let DA denote a dis-ambiguity threshold for decisive classification.
Let D denote a distance function, where the minimum distance criterion is expressed as

D(x, r_M) = min over all k of D(x, r_k)  (4)

Then, for pattern/character recognition,

D(x, r_M) ≤ CR_M  (5)

and, for pattern/character ambiguity,

D(x, r_S) − D(x, r_M) < DA_M  (6)

where r_M denotes the matching prototype feature value vector of the nearest class M, r_k denotes a prototype feature value vector of a class k, r_S denotes the matching prototype feature value vector of the second nearest class S, CR_M denotes the class region threshold for the nearest class M, and DA_M denotes the dis-ambiguity threshold for the nearest class M.
Ideally, a recognition system is expected to be able to detect both ambiguous and non-character patterns. Accordingly, a criterion for rejecting both of them is a combined equation:

x ∈ class M if D(x, r_M) ≤ CR and D(x, r_S) − D(x, r_M) ≥ DA; otherwise reject x  (7)

Briefly stated, equation (7) states that M, the nearest class to x, is the class of the input pattern with feature value vector x, provided that the distance from M to x is no more than the threshold CR and that the distance from S, the second nearest class, to x is at least the threshold DA more than the distance from M to x.
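The combined accept/reject criterion of equation (7) can be sketched directly. Euclidean distance is an assumption of this sketch; the text leaves D generic.

```python
import math

def accept_or_reject(x, prototypes, CR, DA):
    """Equation (7): accept the nearest class M only if
    D(x, r_M) <= CR and D(x, r_S) - D(x, r_M) >= DA; otherwise reject
    x as ambiguous or non-character.

    prototypes: dict mapping class label to prototype feature vector.
    Euclidean distance stands in for the generic distance function D."""
    ranked = sorted(prototypes.items(), key=lambda kv: math.dist(x, kv[1]))
    (M, rM), (S, rS) = ranked[0], ranked[1]
    dM, dS = math.dist(x, rM), math.dist(x, rS)
    if dM <= CR and dS - dM >= DA:
        return M      # recognized as the nearest class
    return None       # rejected
```

Note that a single CR and DA apply to every class here, which is precisely the limitation criticized next.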
There is one major drawback in equation (7). In particular, equation (7) uses only one DA and one CR, so the two thresholds are not sensitive to different classes. This results in either too much rejection or too much misrecognition. For example, if DA is small, ambiguous samples may be misrecognized; on the other hand, if DA is large, samples tend to be rejected even when they are not ambiguous. Likewise, if CR is small, a few samples belonging to a scattered class will be rejected; but if CR is large, non-character patterns close to a compact class will be erroneously recognized as belonging to that class.
FIG. 10 shows a pattern distribution of four classes in feature value space. In FIG. 10, each member of a class k (k ∈ {A, B, C, D}) is denoted by the symbol k. Classes A and B neighbor each other, and class A is much more scattered than class B. Classes C and D overlap each other such that samples of the two classes cannot be fully separated. Assuming that all patterns of both classes A and B are recognized correctly under the current distance measure, some members of class C or class D will be misrecognized if DA is small. However, if DA is large, a few patterns belonging to class A or class B will be unnecessarily rejected. Considering non-character patterns, if CR is small, a few samples of class A will be rejected. However, if CR is large, some non-character patterns close to class B will tend to be misrecognized as class B.
It is an object of the present invention to overcome the disadvantages of the prior art.