1. Field of the Invention
This invention relates to the field of pattern recognition, and specifically to the recognition of mathematical expressions.
2. Description of Related Art
Prior art recognizers, such as speech recognizers or recognizers for handwritten text, have profitably used the linguistic structure of the language they attempt to recognize. The linguistic information consists in the first place of a list of the known words, so that the recognizer does not report nonsense words. Beyond this basic step, the most common formulation of linguistic knowledge for natural language recognizers is the N-gram model, which assigns probabilities to single words, word pairs, word triples, and so on. Such models are based on the observation that certain word sequences are more likely than others; for example, having recognized “Humpty” the system can make a good guess about the next word. N-gram models can also be applied at the character level for handwriting. Other types of linguistic models attempt to capture some of the structure of the language, for example by using context-free grammars.
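The word-pair case of the N-gram idea described above can be illustrated with a minimal sketch. It is not taken from any particular prior art recognizer; the function name `train_bigrams` and the two-sentence corpus are hypothetical and chosen only to mirror the “Humpty” example.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Estimate P(next word | previous word) from raw word-pair counts."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        # Count each adjacent pair (previous word, next word).
        for prev, nxt in zip(words, words[1:]):
            pair_counts[prev][nxt] += 1

    model = {}
    for prev, counter in pair_counts.items():
        total = sum(counter.values())
        # Normalize the counts into conditional probabilities.
        model[prev] = {word: count / total for word, count in counter.items()}
    return model


# Hypothetical toy corpus, for illustration only.
corpus = [
    "humpty dumpty sat on a wall",
    "humpty dumpty had a great fall",
]
model = train_bigrams(corpus)

# The most likely word to follow "humpty" under this toy model.
best_next = max(model["humpty"], key=model["humpty"].get)
```

In this sketch the estimate is a plain relative frequency with no smoothing, so any word pair absent from the corpus has no entry at all; a practical model would need some treatment of unseen sequences.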
In using linguistic information, the natural language recognizer decides among a number of possibilities that receive relatively good scores based on the speech signal, handwriting signal or other input. The recognizer can find the probability assigned to each such possibility by the linguistic model, and boost those possibilities that have high linguistic scores. This enables the recognizer to distinguish between phrases like “I must put on a coat” and “I must put on a goat”, which the input signal alone may not separate reliably.
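The selection step described above can be sketched as follows, assuming that both the signal score and the linguistic score are expressed as log-probabilities and simply added. The function `rescore`, the `weight` parameter, the lookup-table linguistic model `toy_lm`, and every numeric value are hypothetical and serve only to illustrate the principle.

```python
import math


def rescore(candidates, language_model, weight=1.0):
    """Return the phrase with the best combined signal and linguistic score.

    candidates: list of (phrase, signal_log_probability) pairs.
    language_model: function mapping a phrase to a log-probability.
    weight: how strongly the linguistic score influences the decision.
    """
    def combined(item):
        phrase, signal_logp = item
        return signal_logp + weight * language_model(phrase)

    return max(candidates, key=combined)[0]


# Hypothetical linguistic scores; unseen phrases get a small floor value.
PHRASE_LOGP = {
    "i must put on a coat": math.log(1e-6),
    "i must put on a goat": math.log(1e-9),
}


def toy_lm(phrase):
    return PHRASE_LOGP.get(phrase, math.log(1e-12))


# Hypothetical signal scores: the input slightly favors "goat".
candidates = [
    ("i must put on a goat", math.log(0.55)),
    ("i must put on a coat", math.log(0.45)),
]

best = rescore(candidates, toy_lm)
signal_only = rescore(candidates, toy_lm, weight=0.0)
```

With these assumed numbers the signal alone prefers “goat”, while the combined score prefers “coat”, because the linguistic model rates the second phrase as far more probable.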
Mathematical expression recognition is the process of finding a mathematical expression described by ambiguous signals. It has applications in a number of areas. For example, U.S. Pat. Nos. 5,428,805 and 5,544,262 describe a calculator on which the user can write a mathematical expression; the calculator then attempts to recognize the expression, and once recognized computes and displays an answer. In this case the ambiguous signals are the user's pen strokes.
The publications Fateman and Tokuyasu, “Progress in recognizing typeset mathematics,” Proc. SPIE 2660, pp. 37-50, 1996, and Fateman, Tokuyasu, Berman and Mitchell, “Optical character recognition and parsing of typeset mathematics,” J. Visual Commun. Image Represent. 7, pp. 2-15, 1996, describe a system which scans pages of old technical journals and attempts to recognize the equations as well as the text. In this case the ambiguous signals are the bit-mapped images of the equations.
The input data for a mathematical recognizer, such as a handwritten or scanned mathematical expression, is ambiguous in many ways, and in order for the output of the recognizer to be useful to the ultimate application this ambiguity must be reduced or eliminated. For example, an application that wants to perform a calculation must know the symbols exactly, and also know the mathematical structure of the expression. A text formatting application primarily needs the identity and position of the symbols, but even in this case it must understand at least something of the semantics of the expression: “sin” will typically be formatted differently depending on whether it is a trigonometric function or a product of three variables.
Whatever the application, ambiguity is a constant companion in recognizing mathematical notation. There is ambiguity in the characters: is that ink stroke a 2 or a z? There is ambiguity in the placement of the characters. There is also syntactic ambiguity: is f(x) a function application or a multiplication? Because of the ambiguity involved with mathematical notation, a recognizer of mathematical expressions can rarely report a single clear answer; rather, it must choose the best among a number of possible answers based on whatever information it has available. Thus, there is a long-standing need to improve the accuracy of mathematical recognition.