1. Field of the Invention
This invention relates to text recognition of text images using a top-down analysis of an image and possibly corresponding text words. In particular, this invention relates to a piece-wise left-to-right analysis of an image on a line-by-line basis and the possibly corresponding text prefixes and words. The analysis uses a left-to-right piece-wise concatenation of the individual image segments and of the corresponding upper and lower contours of selected text prefixes and/or text words. The possibly corresponding text prefixes and words are selected based on previous comparisons between the text image and the upper and lower image contours of the previous possible text prefixes.
2. Related Art
Traditionally, text recognition techniques, such as those used in optical character recognition, have proceeded from a bottom-up orientation. That is, traditional techniques first identify individual pixels and then join these pixels into connected components or strokes. The connected components or strokes are then mapped onto characters. Higher-level units, such as words, then appear merely as sequences of previously recognized characters. However, these traditional techniques suffer from well-known inaccuracies and inefficiencies, due both to the difficulty of isolating and identifying strokes or connected components and to the ambiguous way in which these uncertainly determined strokes or connected components map onto character sequences.
While the accuracy of these traditional techniques can be improved by filtering the intermediate results of the recognition process against a dictionary of known words, this improved accuracy is achieved only at the cost of further inefficiency in the recognition process, in terms of the time and processing power required.
In contrast to the traditional bottom-up text recognition techniques, a partially top-down technique has recently been proposed. In this partially top-down technique, the recognition process proceeds from an analysis of the overall outline of an individual word. In this approach, the interword spacing is used to first distinguish between individual words. The shape class of a word, as defined by its relative length, its distribution of ascenders and descenders and its distribution of angular versus rounded contours, is analyzed and determined, rather than the details of the particular letter strokes or connected components which form the word. In this way, the text recognition process avoids having to analyze the details of the strokes or connected components, which are often extremely noisy and difficult to correctly isolate.
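By way of illustration only, the notion of a word shape class can be sketched as follows. This sketch is a simplification and is not taken from the known technique itself: it uses only the word length and the distribution of ascenders and descenders, omits the angular versus rounded contour information, and the letter groupings and the function name `shape_class` are assumptions made for the example.

```python
# Hypothetical letter groupings for a lowercase Latin alphabet (an assumption).
ASCENDERS = set("bdfhklt")
DESCENDERS = set("gjpqy")


def shape_class(word):
    """Return a coarse shape class: (length, ascender/descender pattern).

    'A' marks an ascender (or capital or digit), 'D' a descender,
    and 'x' a letter confined to the x-height band.
    """
    pattern = []
    for ch in word:
        if ch.isupper() or ch.isdigit() or ch in ASCENDERS:
            pattern.append("A")
        elif ch in DESCENDERS:
            pattern.append("D")
        else:
            pattern.append("x")
    return (len(word), "".join(pattern))


print(shape_class("dog"))  # (3, 'AxD')
print(shape_class("bay"))  # (3, 'AxD')
print(shape_class("cat"))  # (3, 'xxA')
```

In this simplified sketch, "dog" and "bay" fall into the same shape class, which suggests why a coarse outline alone may not uniquely identify a word.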
This known technique is a partially top-down technique because it continues to use standard bottom-up processes such as locating interword spaces. By including such bottom-up processes, this known technique retains the inefficiencies in bottom-up techniques involved in identifying such interword spaces. That is, the noise associated with identifying components or strokes can also affect the process of identifying interword spaces, which may vary in size. In addition, a small interword space may be recognized as an intraword space, while a large intraword space may be recognized as an interword space. Thus, by trying to determine the interword spaces in a bottom-up, a priori manner, the known inefficiencies and inaccuracies in bottom-up techniques are reintroduced in this known, partially top-down technique.
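The difficulty of determining interword spaces a priori can be illustrated with a minimal sketch. The gap widths, the ground-truth labels, and the function name `classify_gaps` are all invented for this example and do not come from any measured image; the fixed-threshold rule is one assumed form of bottom-up space detection.

```python
def classify_gaps(gap_widths, threshold):
    """Label each gap 'inter' (between words) or 'intra' (within a word)
    using a single fixed width threshold, in pixels."""
    return ["inter" if w >= threshold else "intra" for w in gap_widths]


# Invented gap widths in pixels. The second gap is a large intraword space
# and the fourth is a small interword space.
gaps = [2, 7, 3, 5]
truth = ["intra", "intra", "intra", "inter"]

# No single threshold reproduces the true labels for these gaps.
workable = [t for t in range(1, 11) if classify_gaps(gaps, t) == truth]
print(workable)  # []
```

Because the large intraword space (7) is wider than the small interword space (5), every threshold mislabels at least one of them.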
In this partially top-down technique, the shape contour of each word in a dictionary is matched against a shape contour generated from a bitmap image of the text word to be recognized. That is, the recognition process matches the generated image word shape separately against the text word shape of each text word contained in the dictionary. Unfortunately, this approach itself introduces several inefficiencies into the recognition process.
First, the amount of computation grows linearly with the size of the dictionary. Thus, the computational cost is prohibitive for any reasonably sized dictionary (for example, 100,000 words).
Second, the amount of memory necessary to store the word contours for each word in even a reasonably sized dictionary is itself extremely large.
Finally, because the shape of each text word as a whole is compared against the shape of each image word as a whole, systematic deviations of the image word shapes from the model shapes of the text words in the dictionary, such as stretching, shrinking or tilting, significantly interfere with the reliability of the matching process. Accordingly, while this partially top-down approach provides a more robust text recognition process relative to the known problems of noisy and difficult to isolate strokes or connected components, the top-down approach itself has areas lacking robustness. Further, while the partially top-down approach is able to avoid the inefficiencies arising from analyzing strokes or connected components, it magnifies the inefficiencies arising out of the use of dictionaries of known words. Finally, the memory requirements of this known partially top-down approach raise the cost of a top-down system to prohibitive levels.
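The linear computational cost and the sensitivity to systematic shape deviations noted above can be illustrated with a minimal sketch of whole-word matching. The contour representation (one upper-contour height per character position), the distance measure, the three-word dictionary, and the names `contour_distance` and `match_whole_word` are assumptions chosen for the example; the known technique is not described at this level of detail.

```python
def contour_distance(a, b):
    """Sum of absolute height differences, zero-padding the shorter contour."""
    n = max(len(a), len(b))
    a = list(a) + [0] * (n - len(a))
    b = list(b) + [0] * (n - len(b))
    return sum(abs(x - y) for x, y in zip(a, b))


def match_whole_word(image_contour, dictionary_contours):
    """Compare the image contour against every stored word contour.

    Returns (best_word, comparisons). The comparison count always equals
    the dictionary size, so the work grows linearly with the dictionary.
    """
    best_word, best_dist, comparisons = None, float("inf"), 0
    for word, contour in dictionary_contours.items():
        comparisons += 1
        dist = contour_distance(image_contour, contour)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, comparisons


# Hypothetical upper contours: 2 = ascender height, 1 = x-height.
dictionary = {"dog": (2, 1, 1), "cat": (1, 1, 2), "all": (1, 2, 2)}

print(match_whole_word((1, 1, 2), dictionary))           # ('cat', 3)
# The same word stretched to twice its width no longer matches "cat".
print(match_whole_word((1, 1, 1, 1, 2, 2), dictionary))  # ('dog', 3)
```

Every stored contour is both held in memory and visited on each query, and in this sketch a uniformly stretched image of "cat" is matched to "dog", consistent with the reliability problem described above.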