1. Field of Invention
This invention relates generally to an apparatus for recognizing handwritten text and alphanumeric symbols, and more particularly to a method and system for recognizing handwritten text and alphanumeric symbols that includes a pen and digitizing tablet for real-time entry of handwritten alphanumeric symbols by a user and, in certain implementations, to a system that includes a document scanner for generating scanned images of a previously created document containing handwritten alphanumeric symbols.
2. Description of the Related Technology
Computer vision encompasses a wide range of markets, applications, and customer needs. According to industry analysts, market demand is gravitating towards a system line whose production process is re-examined to achieve high cost-efficiency.
The ability to recognize handwritten text and alphanumeric symbols is very important in many applications, such as pen-based computer systems, automated mail routing systems, bank check recognition, and automatic data and text entry from business forms. Handwriting recognizers transform text in bit map representation to a high-level (i.e., ASCII alphanumeric) coded representation. Pen-based computer systems translate pen motions generated by a user into a sequence of X and Y points indicating the locations of the pen on the tablet. In offline handwriting recognition systems, text on a printed surface such as a sheet of paper is typically scanned by an optical scanner, which creates a bit map of the pixels (or points) belonging to the image. The recognized alphanumeric symbols may be used for analysis, editing, or other forms of processing via application software running on a computer.
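By way of illustration only, the two input representations described above may be sketched as follows. The sketch assumes that pen data arrives as lists of integer (x, y) samples grouped by stroke; the names `Stroke` and `rasterize` are hypothetical and are not taken from any particular system. Only the sampled points are marked, with no interpolation between samples.

```python
from typing import List, Tuple

Point = Tuple[int, int]   # (x, y) tablet coordinates (assumed integer samples)
Stroke = List[Point]      # points sampled between pen-down and pen-up


def rasterize(strokes: List[Stroke], width: int, height: int) -> List[List[int]]:
    """Convert on-line pen data into an off-line style bit map (1 = ink)."""
    bitmap = [[0] * width for _ in range(height)]
    for stroke in strokes:
        for x, y in stroke:
            # Ignore samples that fall outside the bit map.
            if 0 <= x < width and 0 <= y < height:
                bitmap[y][x] = 1
    return bitmap


# A single vertical stroke, such as a handwritten "1".
strokes = [[(2, 0), (2, 1), (2, 2), (2, 3)]]
bitmap = rasterize(strokes, width=5, height=4)
```

The same bit map structure could equally be filled from a scanned page, which is why a recognizer operating on bit maps can in principle serve both the pen-based and the scanner-based cases.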
Computer-aided handwriting recognition is a technology that is continually evolving. A variety of writing styles, combined with poor penmanship, continue to stymie researchers' attempts to design a robust system that can decode all forms of handwriting. Currently, texts produced by state-of-the-art handwriting recognizers contain an unacceptable frequency of errors. This prevents the technology from being efficiently used for large-volume information transfer. Today's most advanced commercial systems are at their best when reading legibly handwritten letters and numbers in predefined forms. The reported accuracy results can only be achieved with careful writing by cooperative users.
The rapid and robust identification of alphanumeric symbols that lack standardized characteristics constitutes a development challenge for handwriting recognition systems. More particularly, the shape, size, spacing and orientation of alphanumeric symbols vary widely from user to user, with the result that distinct alphanumeric symbols may exhibit similar shapes. For example, “g” and “9” or “D” and “O” may appear with similar shapes. This problem is compounded when alphanumeric symbols are grouped together in a sequence and appear to form a new alphanumeric symbol. For example, a “1”-shaped number followed relatively closely by a “3”-shaped number may be identified as “B”.
Many methods for handwriting recognition have been proposed in the prior art. Rejean Plamondon et al. present a comprehensive survey of on-line and off-line handwriting recognition. Most of the techniques that have been developed for handwriting recognition can be broadly classified into statistical, structural, and neural network approaches, as described below:
The statistical approach is based on a similarity measure that in turn is expressed in terms of a distance measure or a discriminant function, and involves the following three groups: explicit, implicit, and Markov modeling methods. In this context, a shape is described by a fixed number of features defining a multi-dimensional representation space, whereby different classes are described with multi-dimensional probability distributions around a class centroid. Examples of discriminant functions include linear discriminant and polynomial functions, minimum distance, nearest neighbor, and Bayes classifiers. A problem associated with this approach is that the discriminant function can be quite complex and may involve adjustments to the parameters under a learning scheme. Another problem identified with the statistical approach is that relationships between pattern elements are not preserved.
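As a minimal sketch of one of the discriminant functions named above, a minimum distance classifier may assign an unknown shape to the class whose centroid lies closest in feature space. The two features used here (aspect ratio and loop count) and the training values are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence


def centroid(samples: List[Sequence[float]]) -> List[float]:
    """Mean feature vector of one class."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def train(training: Dict[str, List[Sequence[float]]]) -> Dict[str, List[float]]:
    """Compute one centroid per class label."""
    return {label: centroid(samples) for label, samples in training.items()}


def classify(features: Sequence[float], centroids: Dict[str, List[float]]) -> str:
    """Return the label whose centroid has the smallest Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


# Hypothetical features: (aspect ratio, number of closed loops).
training = {
    "O": [[1.0, 1.0], [0.9, 1.0]],
    "1": [[0.2, 0.0], [0.3, 0.0]],
}
centroids = train(training)
label = classify([0.8, 1.0], centroids)  # closest to the "O" centroid
```

Note that each shape is reduced to a fixed-length vector, so any information about how its parts relate to one another is lost, which is the drawback noted above.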
Fuzzy set theory has played an important role in both statistical and syntactical approaches. In the neural net approach, the amount of built-in prior knowledge of the alphanumeric recognition problem may seriously affect generalization performance. An advantage of neural nets is that they provide the degree of membership of the unknown object in each of the known classes. Moreover, they avoid a long and costly conventional development process.
In the structural approaches, the recognition process is primarily based on the idea that an alphanumeric shape can be described in an abstract fashion. Unlike the statistical approach, syntactical and structural approaches overcome the problem of preserving relationships by storing the image as a tree or graph of pattern elements and their relationships. A difficulty in implementing these approaches is defining the pattern elements or features, and the relationships between them. In addition, each class or type of image should be separately analyzed and described.
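The following sketch illustrates, under simplifying assumptions, how a structural description might be stored and matched. Each symbol is represented as a set of (element, relation, element) triples, which is one simple way to encode a graph. The element names, relation names, and the two class models are hypothetical, and exact matching is used only to keep the example short.

```python
from typing import Dict, FrozenSet, Optional, Tuple

# One edge of the graph: (pattern element, relationship, pattern element).
Relation = Tuple[str, str, str]


def describe(*relations: Relation) -> FrozenSet[Relation]:
    """Build a structural description from its relationships."""
    return frozenset(relations)


# Hypothetical hand-written models; each class must be described separately.
MODELS: Dict[str, FrozenSet[Relation]] = {
    "9": describe(("loop", "above", "tail")),
    "6": describe(("loop", "below", "tail")),
}


def match(description: FrozenSet[Relation],
          models: Dict[str, FrozenSet[Relation]] = MODELS) -> Optional[str]:
    """Return the label whose model equals the description, or None."""
    for label, model in models.items():
        if model == description:
            return label
    return None


result = match(describe(("loop", "above", "tail")))
```

The two models contain the same pattern elements and differ only in the relationship between them, which shows why preserving relationships matters. The need to write each model by hand reflects the difficulty noted above.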
Neural network models use a weight matrix to store information gained from the representation of known images. Ideally, as more instances and types of images are added, the performance of the system should improve. However, the performance of neural nets may deteriorate after a certain level of learning.
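For illustration, a single-layer network may be sketched as a weight matrix applied to a feature vector, followed by a softmax that yields a degree of membership for each known class. The weights below are hand-picked, hypothetical values rather than the result of any training procedure.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize scores into positive values that sum to 1."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(weights: Sequence[Sequence[float]], features: Sequence[float]) -> List[float]:
    """One row of weights per class; returns per-class membership degrees."""
    scores = [sum(w * x for w, x in zip(row, features)) for row in weights]
    return softmax(scores)


# Hypothetical weight matrix for two classes and two input features.
WEIGHTS = [
    [2.0, -1.0],   # class 0
    [-1.0, 2.0],   # class 1
]
memberships = forward(WEIGHTS, [1.0, 0.0])
```

In a trained network the entries of the weight matrix would be adjusted from known images, and everything the network has learned is held in those entries.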
Other methods include (i) global features (e.g., template matching, transformations), (ii) distribution of points (e.g., zoning, moments, distances), and (iii) geometrical and features. However, each of these techniques has its own drawback: global features are highly sensitive to distortion and style variation, distribution of points is highly affected by the dynamic size and shape variations of hand-printed characters, and geometrical and features are complex and sensitive to local features.
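As an example of a distribution-of-points method, zoning may be sketched as dividing the bit map into a grid and recording the fraction of ink pixels in each zone. The function name `zoning_features` is hypothetical, and the sketch assumes that the bit map dimensions are exact multiples of the grid dimensions.

```python
from typing import List


def zoning_features(bitmap: List[List[int]], rows: int, cols: int) -> List[float]:
    """Ink density per zone, listed row by row (assumes evenly divisible sizes)."""
    height, width = len(bitmap), len(bitmap[0])
    zone_h, zone_w = height // rows, width // cols
    features = []
    for r in range(rows):
        for c in range(cols):
            ink = sum(
                bitmap[y][x]
                for y in range(r * zone_h, (r + 1) * zone_h)
                for x in range(c * zone_w, (c + 1) * zone_w)
            )
            features.append(ink / (zone_h * zone_w))
    return features


# A 4x4 bit map with a vertical stroke in the leftmost column.
bitmap = [
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
features = zoning_features(bitmap, rows=2, cols=2)
```

If the same stroke were written two columns to the right, all of its ink would fall into the right-hand zones and the feature vector would change completely, which illustrates the sensitivity to size and shape variations noted above.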
The techniques described above are narrowly focused on a particular type of recognition approach and, more importantly, do not conform to the mechanisms underlying alphanumeric formation. Furthermore, these methods have not solved the signal-to-symbol transition problem and thus rely on computations performed on information of a low semantic level derived from images, which leaves them unable to contain the variability problem. Moreover, one of the tenets of vision is that the choice of representation is crucial in recognition. Representations must be chosen that make relevant information explicit and allow domain constraints to emerge. In the techniques adopted, very little use is made of a priori information in images. Finally, the complexity of the task, which arises from intrinsic and extrinsic variations present in the image, from the development timeline, and from inherently ill-defined concepts that in turn yield invalid assessments, has not been dealt with.