1. Field of the Invention
The present invention relates to a character recognition machine for separating every character pattern from a character string image contained in a text image and for recognizing these character patterns. The invention also relates to a character recognition machine utilizing language processing.
2. Related Art of the Invention
FIG. 5 is a diagram showing the structure of the prior art character recognition machine. This machine comprises a character string or array image storage portion 1, a projection profile (or concentration) histogram calculating portion 3, a character interval-estimating portion 101, a Gaussian filter portion 102, a threshold value-processing portion 103, an interval memory 104, a character starting point-detecting portion 105, a character end point-detecting portion 106, a character pattern output portion 10, a feature-extracting portion 107, a reference character feature dictionary 108, a matching portion 109, a word dictionary 110, and a word matching portion 111.
The operation of this prior art character recognition machine is now described by referring to FIG. 5. The character string image storage portion 1 stores a character string image read by the machine. The projection profile histogram calculating portion 3 counts, for each coordinate along the direction of the input character string, the number of black pixels located on the line extending perpendicular to that direction in the input character string image stored in the character string image storage portion 1. The counts obtained at the various coordinates together form a projection profile histogram, which is sent to the character interval-estimating portion (or character squareness degree-estimating portion) 101 and to the Gaussian filter portion 102. The character interval-estimating portion 101 finds the average of those values in the projection profile histogram which lie within a given range determined from the maximum value, and sends this average as a character squareness degree to the character end point-detecting portion 106.
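The projection profile computation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the image is taken to be a list of rows of 0/1 pixels with the character string running horizontally, and the function names and the `lower`/`upper` bounds of the averaging range are hypothetical, since the text says only that the range is based on the maximum value.

```python
def projection_profile(image):
    """Count black pixels (value 1) in each column of a binary image.

    The character string is assumed to run horizontally, so each column
    is a line perpendicular to the string direction.
    """
    if not image:
        return []
    width = len(image[0])
    return [sum(row[x] for row in image) for x in range(width)]


def estimate_character_size(profile, lower=0.5, upper=1.0):
    """Average the profile values lying within [lower*max, upper*max].

    The bounds are illustrative assumptions, not values from the text.
    """
    peak = max(profile)
    selected = [v for v in profile if lower * peak <= v <= upper * peak]
    return sum(selected) / len(selected)
```

For a three-row image whose columns contain 0, 3, 2 and 0 black pixels, `projection_profile` returns `[0, 3, 2, 0]`.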
The Gaussian filter portion 102 convolves the input projection profile histogram with a Gaussian function to thereby suppress high-frequency components of the histogram and accentuate valley portions of the histogram. The output from the Gaussian filter portion 102 is applied to the threshold value-processing portion 103, which then finds the starting point and the end point of each interval in which the input value exceeds a given value and delivers the found values to the interval memory 104. Plural sets of the starting points and the end points calculated by the threshold value-processing portion 103 are successively written to the interval memory 104 and stored there.
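A minimal Python sketch of the smoothing and thresholding steps is given below. The kernel width (`sigma`, `radius`), the zero padding at the borders, and the function names are assumptions made for illustration; the text does not specify them.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized, sampled Gaussian of length 2*radius + 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(profile, sigma=1.0, radius=3):
    """Convolve the profile with a Gaussian; samples outside are treated as zero."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(profile)
    out = []
    for x in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = x + k - radius
            if 0 <= idx < n:
                acc += w * profile[idx]
        out.append(acc)
    return out


def intervals_above(values, threshold):
    """Return (start, end) index pairs of runs where values exceed the threshold.

    Both indices are inclusive.
    """
    intervals = []
    start = None
    for x, v in enumerate(values):
        if v > threshold and start is None:
            start = x
        elif v <= threshold and start is not None:
            intervals.append((start, x - 1))
            start = None
    if start is not None:
        intervals.append((start, len(values) - 1))
    return intervals
```

The list returned by `intervals_above(smooth(profile), threshold)` plays the role of the contents of the interval memory 104 in this sketch.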
The prior art machine determines character positions successively from intervals held in the interval memory 104. The determination of the character positions is described below.
First, the character starting point-detecting portion 105 reads the interval closest to the origin of the coordinate system from the interval memory 104, takes the starting point of this interval as the starting point of the first character, and delivers this starting point to both the character end point-detecting portion 106 and the character pattern output portion 10. Let $X_{\mathrm{start}}$ be the applied starting point, and let $L$ be the interval between characters obtained by the character interval-estimating portion 101. The character end point-detecting portion 106 then selects, from the intervals held in the interval memory 104, the interval whose end point $X_{\mathrm{end}}$ satisfies condition (1), takes $X_{\mathrm{end}}$ as the end point of the first character, and sends this point to both the character pattern output portion 10 and the character starting point-detecting portion 105.
$$(1-\alpha)\cdot L < |X_{\mathrm{end}} - X_{\mathrm{start}}| < (1+\alpha)\cdot L \tag{1}$$
where $\alpha$ is an appropriate positive number.
The character end point-detecting portion 106 continues its detecting operation while changing the value of $\alpha$ until the end point is detected.
When the end point of the first character is applied from the character end point-detecting portion 106, the character starting point-detecting portion 105 reads out the starting point which has a value greater than this end point and is closest to the origin of the coordinate system. In the same way as in the case of the first character, the character starting point-detecting portion 105 takes this starting point as the starting point of the second character and sends it to both the character end point-detecting portion 106 and the character pattern output portion 10. In this manner, the positions of characters are determined successively in the order of the end point of the second character, the starting point of the third character, and the end point of the third character until all the intervals held in the interval memory 104 correspond to the character positions. Information about the character positions is delivered to the character pattern output portion 10.
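The successive determination of character positions can be sketched as follows. This is a simplified illustration, not the prior art machine itself: the initial tolerance, its step, the number of widening steps, the guard that an end point must lie to the right of the starting point, and the function names are all assumptions.

```python
def find_end_point(x_start, intervals, size, alpha=0.2, alpha_step=0.1, max_steps=8):
    """Return an interval end point satisfying condition (1), widening alpha as needed.

    `intervals` is a list of (start, end) pairs; `size` plays the role of L.
    Returns None if no end point qualifies within the allowed steps.
    """
    for step in range(max_steps + 1):
        a = alpha + step * alpha_step
        for _, x_end in intervals:
            width = abs(x_end - x_start)
            # Assumed guard: only end points to the right of the start qualify.
            if x_end > x_start and (1 - a) * size < width < (1 + a) * size:
                return x_end
    return None


def segment(intervals, size):
    """Walk the intervals from the origin and return (start, end) character positions."""
    intervals = sorted(intervals)
    characters = []
    last_end = -1
    while True:
        # Next starting point: the one beyond the last end point and closest to the origin.
        starts = [s for s, _ in intervals if s > last_end]
        if not starts:
            break
        x_start = min(starts)
        x_end = find_end_point(x_start, intervals, size)
        if x_end is None:
            break
        characters.append((x_start, x_end))
        last_end = x_end
    return characters
```

With intervals `[(0, 4), (6, 9), (12, 20), (23, 31)]` and a size of 9, the first two intervals are merged into one character spanning 0 to 9, which shows how a character whose profile splits into two runs can still be extracted as one pattern.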
The character pattern output portion 10 reads an image from the input character string image held in the character string image storage portion 1, the image lying within the range from the starting point to the end point of each input character. The output portion 10 successively sends such images as character patterns to the feature-extracting portion 107.
The feature-extracting portion 107 normalizes the size of each character pattern applied from the character pattern output portion 10, divides the normalized pattern into a certain number of block regions, extracts features of the character contained in each block region (e.g., the densities of black pixels, the densities of horizontal, vertical and oblique components, contours, and the number of end points and intersections), and sends the features to the matching portion 109.
The above-described features of the characters which are found from reference characters have been previously registered in the reference character feature dictionary 108.
The matching portion 109 finds the degrees of approximation between the character features applied from the feature-extracting portion 107 and the features of the reference characters registered in the reference character feature dictionary 108, and outputs these degrees of approximation as character recognition evaluation values. Whenever a character feature is applied, the reference characters closest to the character are determined on the basis of the character recognition evaluation values. A certain number of such reference characters are sent as candidate character categories to the word matching portion 111 together with their character recognition evaluation values. Words have been previously registered in the word dictionary 110.
The word matching portion 111 forms combinations of the candidate character categories applied from the matching portion 109, evaluates how well each combination matches the words registered in the word dictionary 110, searches for the combination with the highest degree of matching, and produces the character categories contained in this combination as the final recognition result.
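A rough Python sketch of this word matching step is shown below. The text does not say how the degree of matching is computed, so the scoring used here (the sum of the character recognition evaluation values of a combination that appears in the dictionary, with larger values meaning a better match) and the function name are assumptions.

```python
from itertools import product


def best_word(candidates, dictionary):
    """Pick the dictionary word with the best total score among all combinations.

    `candidates` holds, for each character position, a list of
    (character, evaluation value) pairs. Returns None if no combination
    forms a registered word.
    """
    best = None
    best_score = float("-inf")
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        if word in dictionary:
            score = sum(value for _, value in combo)
            if score > best_score:
                best, best_score = word, score
    return best
```

The number of combinations grows exponentially with the length of the string, which is one reason a practical implementation would prune the search rather than enumerate every combination.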
In the configuration described above, it is necessary to execute the process step by step. The processing time is increased accordingly. Also, it is generally impossible to evaluate the results of extractions of characters. Once such an extraction produces an error, the results of subsequent extractions are affected greatly. Thus, the processing accuracy deteriorates. Furthermore, those portions which have been incorrectly extracted do not lead to correct recognition results. In addition, a character which is once recognized erroneously will continue to be treated erroneously.
A conventional character pattern recognition machine first roughly classifies the applied character patterns, selects the character category group (i.e., a set of character patterns having similar feature vectors) to which each input character pattern belongs, and then subclassifies the pattern within the selected character category group to recognize it. This machine is described in, for example, the Proceedings of the Electronic Information and Communications Society of Japan, D-II, Vol. J75-D-II, No. 3, pp. 545-553, "Large-Scale Neural Net 'CombNET-II'".
FIG. 6 shows the structure of this known character recognition machine. This machine has an image input portion 6-110, such as an image scanner, which receives a text image to be recognized. A character-extracting portion 6-111 extracts character patterns from each character region of the text image received by the image input portion 6-110. A feature-extracting portion 6-112 extracts feature vectors from each character pattern extracted by the character-extracting portion 6-111, the feature vectors being used to discern character categories. A rough classification portion 6-113 roughly classifies the character patterns into character category groups, using the feature vectors extracted by the feature-extracting portion 6-112. A subclassification portion 6-114 subclassifies the character patterns within each character category group, using the feature vectors. A group-selecting portion 6-115 selects plural character category groups from the output (hereinafter referred to as the goodness of fit) from the rough classification portion 6-113. A subclassification portion input signal selector portion 6-116 selects the subclassification portions 6-114 that are to receive the feature vectors, based on the group selection information obtained by the group-selecting portion 6-115. A discerning portion 6-117 discerns the character patterns from the goodness of fit of the character category groups selected by the group-selecting portion 6-115 and from the output value from the subclassification portion 6-114.
The rough classification portion 6-113 comprises input portions 6-118 receiving the feature vectors of the character patterns extracted by the feature-extracting portion 6-112, and multiple-input, single-output signal processing portions 6-119 (hereinafter abridged as multiple input-output signal processing portions) for calculating the goodness of fit of each character category group to the character patterns.
Each subclassification portion 6-114 comprises input portions 6-120 receiving the feature vectors delivered from the subclassification portion input signal selector portion 6-116, and multiple input-output signal processing portions 6-121. Each multiple input-output signal processing portion 6-121 is connected with the input portions 6-120 or the multiple input-output signal processing portions 6-121 forming the layer below it. It calculates the products of the outputs from those lower-layer portions and weighting coefficients, calculates the sum of these products, transforms the sum with a threshold value function, and delivers the result. The weighting coefficients indicate degrees of connectivity. These multiple input-output signal processing portions 6-121 form a multilayered structure, and there exists no connection inside each layer. The network is so connected that signals are propagated only to upper layers. Thus, the degree of similarity of each character category inside the character category group to the character pattern is found. A maximum value selecting portion 6-122 selects the greatest value from the output values from the multiple input-output signal processing portions in the top layer.
The discerning portion 6-117 comprises similarity degree calculating portions 6-123 and a category discerning portion 6-124. The similarity degree calculating portions 6-123 calculate the degrees of similarity of character categories from the goodness of fit of the character category group selected by the group-selecting portion 6-115 and from the output value from the subclassification portion 6-114 which corresponds to the character category group. The category discerning portion 6-124 finds the maximum value of the degrees of similarity of character categories obtained by the similarity degree calculating portions 6-123 to discern the character category of the applied character pattern.
The operation of the known character recognition machine constructed as described above is described now. The character-extracting portion 6-111 extracts character patterns one by one from the text image applied from the image input portion 6-110. The feature-extracting portion 6-112 finds a feature vector $X$ for the character pattern extracted by the character-extracting portion 6-111. The vector $X$ is composed of $n$ feature data items and is given by
$$X = (x_1, x_2, \ldots, x_n) \tag{1A}$$
The feature data items are found by the concentration mesh method. In this method, an applied character pattern is divided into n small regions. The area (i.e., the number of black pixels contained in each small region) of the character portion in each small region is normalized with the area of the small region. The normalized number is taken as data about a feature.
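The concentration mesh method can be sketched in Python as follows. The pattern is assumed to be a list of rows of 0/1 pixels that has at least as many rows and columns as the mesh, and the regions are assumed to form a `rows` by `cols` grid; the function name is hypothetical.

```python
def mesh_features(pattern, rows, cols):
    """Return the black-pixel density of each region of a rows x cols mesh.

    The result is a feature vector with n = rows * cols items, each equal to
    the number of black pixels in a region divided by the region's area.
    """
    height, width = len(pattern), len(pattern[0])
    features = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            black = sum(pattern[y][x] for y in range(y0, y1) for x in range(x0, x1))
            area = (y1 - y0) * (x1 - x0)
            features.append(black / area)
    return features
```

For a 4 x 4 pattern whose upper-left quadrant is entirely black and whose lower-right quadrant contains one black pixel, a 2 x 2 mesh gives the feature vector `[1.0, 0.0, 0.0, 0.25]`.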
The feature vector $X$ extracted by the feature-extracting portion 6-112 in this way is applied to the input portions 6-118 of the rough classification portion 6-113. The number of the input portions 6-118 is $n$, equal to the number of feature data items of a character pattern. The feature data items $x_i$ are applied to the respective input portions 6-118. The multiple input-output signal processing portions 6-119 of the rough classification portion 6-113 calculate the total sum of the products of the inputs $x_j$ to the input portions 6-118 connected with the processing portions 6-119 and their respective weighting coefficients $v_{ij}$ ($1 \le i \le m_r$, where $m_r$ is the number of character category groups; $1 \le j \le n$), which indicate their degrees of connectivity. The weighting coefficient vector of each multiple input-output signal processing portion 6-119 is given by
$$V_i = (v_{i1}, v_{i2}, \ldots, v_{in}) \tag{2}$$
Then, the rough classification portion 6-113 divides the total sum by the product $|X|\cdot|V_i|$ of the norms of the feature vector $X$ and the weighting coefficient vector $V_i$, and delivers the quotient. That is, the output value $\mathrm{sim}(X, V_i)$ from the multiple input-output signal processing portion 6-119 having the weighting coefficient vector $V_i$ shown in FIG. 6 can be given by
$$\mathrm{sim}(X, V_i) = \frac{X \cdot V_i}{|X|\,|V_i|}$$
where
$$X \cdot V_i = \sum_{j=1}^{n} x_j v_{ij}, \qquad |X| = \sqrt{\sum_{j=1}^{n} x_j^{2}}, \qquad |V_i| = \sqrt{\sum_{j=1}^{n} v_{ij}^{2}}$$
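The output of the rough classification portion is the cosine of the angle between the feature vector and each weighting coefficient vector. A minimal Python sketch, with hypothetical function names and assuming non-zero vectors, is:

```python
import math


def dot(a, b):
    """Sum of the products of corresponding items of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def sim(x, v):
    """Inner product of x and v divided by the product of their norms."""
    return dot(x, v) / (math.sqrt(dot(x, x)) * math.sqrt(dot(v, v)))


def goodness_of_fit(x, weight_vectors):
    """Return sim(x, V_i) for every weighting coefficient vector V_i."""
    return [sim(x, v) for v in weight_vectors]
```

Because of the normalization, the output does not depend on the overall magnitude of the feature vector, only on its direction.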
The weighting coefficient vectors $V_i$ have been previously designed so that some multiple input-output signal processing portion 6-119 produces its maximum output in response to a set of character patterns having similar feature vectors $X$.
These weighting coefficient vectors $V_i$ are designed by the prior art techniques as follows. In the first step, whenever the feature vector $X$ of a character pattern for designing the weighting coefficient vectors is applied, the vector $V_c$ having the greatest value of $\mathrm{sim}(X, V_i)$ is found (at this time, it is said that $X$ is optimally matched to $V_c$), and $V_c$ is made to approach $X$. When the number of character patterns optimally matched to one weighting coefficient vector exceeds a given value, the region assigned to this vector is divided into two. Thus, an additional weighting coefficient vector is created. In the second step, the vectors $V_i$ optimally matched to all character patterns for designing weighting coefficient vectors are found. A check is done to determine whether these $V_i$ differ from the previous ones. If they differ, the $V_i$ are modified. At this time, weighting coefficient vectors are generated in the same way as in the first step. These operations are repeated until neither modification nor creation of weighting coefficient vectors takes place.
By designing weighting coefficient vectors in this way, the weighting coefficient vectors $V_i$ divide and quantize the feature vector space of character patterns. That is, applied character patterns are classified into sets of character patterns having similar feature vectors, i.e., into character category groups, in terms of the weighting coefficient vectors $V_i$. The output value from each multiple input-output signal processing portion 6-119 is delivered to the group-selecting portion 6-115 as the goodness of fit of the corresponding character category group to the character pattern.
The group-selecting portion 6-115 selects an arbitrary number of character category groups in order of decreasing goodness of fit obtained by the rough classification portion 6-113, and produces information indicating which character category groups are selected together with the corresponding goodness of fit.
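The group selection can be sketched as a ranking step. The sketch assumes that the groups wanted are those with the largest goodness of fit, and the function name and the `count` parameter are hypothetical.

```python
def select_groups(fits, count):
    """Return (group index, goodness of fit) pairs for the best-fitting groups.

    `fits` holds one goodness-of-fit value per character category group;
    the `count` groups with the largest values are returned, best first.
    """
    ranked = sorted(enumerate(fits), key=lambda item: item[1], reverse=True)
    return ranked[:count]
```

For example, `select_groups([0.2, 0.9, 0.5], 2)` returns `[(1, 0.9), (2, 0.5)]`, that is, the second and third groups together with their goodness of fit.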
The subclassification portion input signal selector portion 6-116 selects the subclassification portions 6-114 that are to receive the feature vector $X$ of the applied character pattern, according to the information about the selected groups obtained from the group-selecting portion 6-115, and delivers $X$ to these subclassification portions 6-114.
The subclassification portions 6-114 that correspond to the character category groups selected by the group-selecting portion 6-115 receive the feature vector $X$ of the character pattern from the subclassification portion input signal selector portion 6-116 at their input portions 6-120. The number of the input portions 6-120 is $n$, equal to the number of feature data items of each character pattern. The feature data items $x_i$ are applied to their respective input portions 6-120. Each multiple input-output signal processing portion 6-121 of the subclassification portions 6-114 calculates the products of weighting coefficients and the outputs from the input portions 6-120 or the multiple input-output signal processing portions 6-121 in the lower layer connected with it, calculates the total sum of these products, transforms the sum into a corresponding value by a threshold value function, and delivers the resulting value to the upper layer. The weighting coefficients indicate the degrees of connectivity. The multiple input-output signal processing portions 6-121 in the top layer of each subclassification portion 6-114 are equal in number to the character categories of the character patterns contained in the corresponding character category group, and correspond to these character categories. The maximum value selecting portion 6-122 selects the maximum of the output values from the multiple input-output signal processing portions 6-121 in the top layer and delivers the character category corresponding to the multiple input-output signal processing portion 6-121 that produced it, together with the maximum output value.
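The forward computation of one subclassification portion can be sketched as a small layered network. The text does not name the threshold value function, so a logistic sigmoid is assumed here; the function names and the layout of the weight lists are likewise assumptions.

```python
import math


def sigmoid(s):
    """Assumed threshold value function (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-s))


def layer_forward(inputs, weights):
    """Compute one layer: weights[i][j] connects lower-layer output j to unit i."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def subclassify(x, layers, categories):
    """Propagate x upward through the layers and pick the largest top-layer output.

    `layers` is a list of weight matrices from the lowest layer to the top
    layer; `categories` names the character category of each top-layer unit.
    Returns (category, maximum output value).
    """
    signal = x
    for weights in layers:
        signal = layer_forward(signal, weights)
    best = max(range(len(signal)), key=lambda k: signal[k])
    return categories[best], signal[best]
```

Signals flow only from lower layers to upper layers, and no unit is connected to another unit in its own layer, which matches the structure described for the processing portions 6-121.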
Weighting coefficients of the multiple input-output signal processing portions 6-121 have been previously set in such a way that the multiple input-output signal processing portion 6-121 in the top layer corresponding to a character category produces the maximum output in response to the feature vectors $X$ of character patterns of that character category within the character category group. The procedure for setting them is known as a learning method for weighting coefficients.
More specifically, such a learning method for weighting coefficients is carried out by a learning algorithm known as the error back propagation method. This error back propagation method is described, for example, by D. E. Rumelhart, G. E. Hinton, and R. J. Williams, in "Learning Representations by Back-Propagating Errors", Nature, Vol. 323, pp. 533-536, Oct. 9, 1986.
The error back propagation method is hereinafter described briefly. First, feature vectors $X$ of character patterns for learning of weighting coefficients are applied to the input portions 6-120 of the subclassification portions 6-114. As described previously, each multiple input-output signal processing portion 6-121 of the subclassification portions 6-114 calculates the products of weighting coefficients and the outputs from the input portions 6-120 or the multiple input-output signal processing portions 6-121 in the lower layer connected with it, calculates the total sum of these products, transforms the sum into a corresponding value by a threshold value function, and delivers the resulting value to the upper layer. The weighting coefficients indicate the degrees of connectivity. The error $E$, which is the deviation of the outputs $o_k$ of all the multiple input-output signal processing portions 6-121 in the top layer from the desirable outputs $t_k$ (referred to as teacher signals), is given by
$$E = 0.5 \sum_{p} \sum_{k} (t_k - o_k)^{2} \tag{4}$$
where $\sum_p$ denotes the sum over the character patterns for which teacher signals are given. The purpose of the learning is to determine the weighting coefficient values that minimize the error $E$. The amount of modification $\Delta w_{ij}$ of the weighting coefficient of each multiple input-output signal processing portion 6-121 is calculated according to Eq. (5) given by
$$\Delta w_{ij} = -\varepsilon \frac{\partial E}{\partial w_{ij}} \tag{5}$$
where $\varepsilon$ is a positive constant called a learning rate. This modification of the weighting coefficient according to Eq. (5) is repeated whenever the feature vector $X$ of a character pattern for learning is applied. In this way, the error $E$ can be reduced. If the error $E$ becomes sufficiently small, then the learning is ended because we can consider that the output signal has sufficiently approached the desired value.
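The weight update of Eq. (5) can be illustrated with a deliberately reduced Python sketch. It trains a single layer of sigmoid units, so no error is propagated back through hidden layers; the sigmoid threshold function, the learning rate of 0.5, and the function names are assumptions made for illustration only.

```python
import math


def sigmoid(s):
    """Assumed threshold value function (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-s))


def outputs(weights, x):
    """Outputs o_k of a single layer of units; weights[i][j] connects input j to unit i."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x))) for row in weights]


def error(weights, samples):
    """Error of Eq. (4): half the squared deviation summed over patterns and units."""
    return 0.5 * sum((t_k - o_k) ** 2
                     for x, t in samples
                     for t_k, o_k in zip(t, outputs(weights, x)))


def train_step(weights, x, t, epsilon=0.5):
    """Apply Eq. (5) once for one learning pattern, modifying weights in place."""
    o = outputs(weights, x)
    for i, row in enumerate(weights):
        # For a sigmoid unit, dE/dw_ij = -(t_i - o_i) * o_i * (1 - o_i) * x_j.
        delta = (t[i] - o[i]) * o[i] * (1.0 - o[i])
        for j in range(len(row)):
            row[j] += epsilon * delta * x[j]
```

Repeating `train_step` over the learning patterns lowers the value returned by `error`, which corresponds to the stopping criterion that learning ends once the error has become sufficiently small.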
This method of learning weighting coefficients permits the multiple input-output signal processing portion 6-121 in the top layer that corresponds to the character category of an applied character pattern in a character category group to produce the maximum output. Accordingly, the maximum value selecting portion 6-122 selects the multiple input-output signal processing portion 6-121 in the top layer that produces the maximum output. In this way, within each character category group, i.e., within each subclassification portion, the character category of the applied character pattern can be judged.
In the discerning portion 6-117, the similarity degree calculating portions 6-123 first calculate the degree of similarity of each character category obtained by the subclassification portions 6-114, from the goodness of fit of the character category group selected by the group-selecting portion 6-115 and from the output value from the subclassification portion 6-114 corresponding to that character category group, using Eq. (6). These degrees of similarity are output to the category discerning portion 6-124.
$$\text{degree of similarity} = (\text{goodness of fit})^{a} \cdot (\text{output value})^{b} \tag{6}$$
where $a$ and $b$ are real constants.
Finally, the category discerning portion 6-124 compares the degrees of similarity of character categories obtained from the similarity degree calculating portions 6-123, and produces the character category corresponding to the greatest degree of similarity as the result of discerning.
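The final discerning step can be sketched in a few lines of Python. The default values of the constants `a` and `b` and the function names are assumptions; the text states only that they are real constants.

```python
def degree_of_similarity(fit, output, a=1.0, b=1.0):
    """Combine goodness of fit and subclassification output: fit**a * output**b."""
    return (fit ** a) * (output ** b)


def discern(results, a=1.0, b=1.0):
    """Return the category with the greatest degree of similarity.

    `results` holds one (category, goodness of fit, output value) triple per
    selected character category group.
    """
    return max(results, key=lambda r: degree_of_similarity(r[1], r[2], a, b))[0]
```

The exponents control how much weight the rough classification receives relative to the subclassification: with `b` set to zero, for instance, the decision depends on the goodness of fit alone.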
In character recognition, features which are effective in discerning a character pattern are extracted as feature vectors. Generally, it is difficult to recognize characters with sufficient accuracy if only one kind of feature vector is used. Characters can be recognized well by using plural kinds of feature vectors. In particular, a character pattern which would lead to ambiguous or erroneous recognition with a single feature vector may be discerned correctly if a different feature vector is added. At this time, the recognition performance might be improved by attaching great importance to the vector that enables accurate discrimination.
However, if the conventional character recognition machine which hierarchically recognizes character patterns having numerous character categories is equipped with only one feature-extracting portion or recognition portion, and if plural kinds of feature vectors are employed, then it is necessary to apply these vectors to the machine simultaneously for recognition. In this case, indeed, discriminating performance somewhat higher than in the case where a single feature vector is used can be realized but the advantages obtained by using different feature vectors as described above cannot be fully exploited. That is, when plural kinds of feature vectors are used simultaneously, it is possible to enhance the discriminating performance. However, it is very difficult to correctly recognize a character pattern which would be erroneously recognized with a certain feature vector as described above, based on the results of discrimination using a different feature vector.
Where plural kinds of feature vectors are collectively used simultaneously, as the number of dimensions of feature vectors is increased, a longer time is required to recognize characters.
In recent years, as database technologies have evolved, the demand for character recognition machines capable of recognizing characters at a high speed and at a high recognition rate has increased.
Conventionally, knowledge processing has been introduced into character recognition machines to enhance the recognition accuracy. In this knowledge processing, the recognition result for each individual character is corrected to the most probable character by applying a word dictionary and a grammatical dictionary to the recognition results.
Another known character recognition machine is disclosed in Japanese Patent Laid-Open No. 164887/1991. This known machine is shown in FIG. 7 and has an extraction portion 7-11 for extracting an image representing one character from a text image. A character recognition portion 7-12 produces candidate characters (N candidate characters per character) in response to the extracted image. A clause (or phrase) search portion 7-13 finds combinations of characters forming a clause from the set of the candidate characters, using a word dictionary 7-16 and a grammatical dictionary 7-17. A clause evaluating value calculating portion 7-14 finds a clause-evaluating value indicating the lexical and grammatical correctness of each candidate clause. A candidate clause (or phrase) comparator portion 7-31 compares the clause-evaluating values of the candidate clauses. If plural candidate clauses having the maximum clause-evaluating value exist, then a candidate character string comparator portion 7-32 compares the character strings of these candidate clauses character by character. If a disagreement between characters is discovered at a character position, then these characters are rejected, and a signal indicating the position of the erroneously recognized character is produced. In this way, the recognition can be improved, and the modification operation can be made more efficient.
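The comparison of tied candidate clauses can be sketched as follows. The representation of a candidate clause as a (string, evaluating value) pair, the assumption that tied clauses have equal length, and the function name are illustrative choices that do not come from the cited publication.

```python
def reject_positions(candidate_clauses):
    """Return the character positions at which the best-scoring clauses disagree.

    `candidate_clauses` is a list of (clause string, clause-evaluating value)
    pairs. An empty list is returned when a single clause has the maximum value.
    """
    best = max(value for _, value in candidate_clauses)
    tied = [clause for clause, value in candidate_clauses if value == best]
    if len(tied) < 2:
        return []
    # Compare the tied clauses character by character (equal lengths assumed).
    return [i for i, chars in enumerate(zip(*tied)) if len(set(chars)) > 1]
```

With the candidates `("cat", 5)`, `("cot", 5)` and `("dog", 3)`, only the second character position is reported, since the two best clauses differ there and nowhere else.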
In the above-described character recognition machine, when a plurality of candidate clauses exist, the characters in these clauses are rejected. Therefore, if this machine is used for general documents containing numerous words, almost all characters are rejected. Where the character recognition portion recognizes handwritten characters, the recognition rate is low. At this time, the number of candidate characters delivered from the character recognition portion is increased. This also increases the number of candidate clauses, thus increasing the possibility of the presence of plural candidate clauses. Hence, excessive characters are rejected.
A further character recognition machine has been proposed in Japanese Patent Laid-Open No. 214990/1990. As shown in FIG. 8, this machine has a character amendment portion 8-8 which receives N candidate characters per character from a character recognition portion 8-1. An automatic amendment portion 8-61 compares the candidate characters with an amendment rule table 8-63 and amends characters according to amendment rules. The results of the amendments produced from the automatic amendment portion 8-61 are displayed for a human operator, who then amends the characters that were erroneously recognized. A manual amendment control portion 8-62 creates amendment rules from these manual amendments, registers the rules in the table 8-63, and applies the rules to the results of subsequent recognitions to automatically amend incorrect recognitions. In this way, characters can be recognized according to the fonts of the document and according to the amendments made by the operator.
The character recognition machine described immediately above creates rules of amendments according to amendments made by the operator and so it is not possible for the machine to automatically create rules of amendments.