1. Field of the Invention
The present invention relates to the field of pattern recognition, in particular to the recognition of a provided pattern as a valid pattern in a predefined language.
2. Description of the Related Art
Pattern recognition is a field in which patterns, typically visual expressions such as a mathematical expression, a flow diagram, or an organization chart, are identified as valid. Typically, an image of a pattern will be provided by optically scanning a medium to create a bit mapped image of the medium. Once the bit mapped image is created, individual symbols or characters are identified through a process known as character recognition. Such character recognition techniques include matching of a bit mapped representation of the character with templates of known characters, analysis of polygon representations of the character, and statistical analysis of the bit mapped representation of the character.
Pattern recognition is an important component of an emerging technology for computer systems known as pen based systems (or notepad computers). In a pen based system, a user inputs data to the computer by "writing" onto an input pad. This is contrasted with traditional methods of entering data on a keyboard or moving a cursor on a display to invoke some action. As the data being input may be anything written onto the input pad, for any particular application it must be determined that the data being input is of the proper type and syntax appropriate for the particular application.
It is known that the symbols or characters comprising a class of patterns (language) have relationships that can be defined in terms of a grammar (also known as syntactic methods). A grammar is a set of syntactic rules and symbols which define a language (or domain). Different grammars may have different types of rules and/or symbols. The language may be a spoken language, e.g. English or French, or it could be a computer language, e.g. Fortran or PASCAL. The syntactic rules define a method by which an expression may be identified as a valid or invalid expression in a language. In Computer Science, grammars have also been used to describe the logic of computing. Basic courses in automata theory use grammars to define finite and non-finite computing models.
With respect to pattern recognition, it is desirable to use syntactic methods when it is convenient to represent a pattern as a collection of one or more subpatterns. A subpattern is defined as a symbol or a group of related symbols. For a written language a symbol may be a letter of an alphabet or a mathematical operator or a graphical symbol. Syntactic methods are also desirable when the validity of a pattern depends on the relationships of its subpatterns. For example, when recognizing the validity of a mathematical equation, it may not be valid to have two adjacent mathematical operators, even though on their own, each operator is a valid symbol in the grammar.
While a grammar defines the valid symbols and syntactic rules, a particular pattern is analyzed using a parsing process. Conventional methods of parsing are linear, i.e. top-down or bottom-up. A parsing process will determine whether a pattern is valid or invalid. The parsing process will determine validity via a lexical analysis (i.e. checking the validity of the individual subpatterns) and by determining that the syntactic rules are followed. If the pattern is valid, the parsing process will return a parsed representation of the pattern according to the syntactic rules. If the pattern is invalid, the parsing process may terminate and provide information as to why the pattern is invalid.
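The two stages of such a parsing process can be illustrated with a minimal Python sketch. It is not taken from any cited method: the symbol set, the function name `parse`, and the single syntactic rule (operands and operators must alternate) are assumptions chosen only to show a lexical check followed by a syntactic check, with either a parsed representation or a reason for failure being returned.

```python
# Hypothetical toy grammar: operands are digit strings, operators are + - * /.
OPERATORS = {"+", "-", "*", "/"}


def parse(symbols):
    """Return (True, parsed_tree) for a valid pattern, else (False, reason)."""
    # Lexical analysis: every individual subpattern must be a known symbol.
    for position, symbol in enumerate(symbols):
        if not (symbol.isdigit() or symbol in OPERATORS):
            return False, f"unknown symbol {symbol!r} at position {position}"

    # Syntactic analysis: operand (operator operand)*, grouped left to right.
    # Operator precedence is deliberately ignored in this sketch.
    if not symbols or not symbols[0].isdigit():
        return False, "pattern must start with an operand"
    tree = symbols[0]
    i = 1
    while i < len(symbols):
        operator = symbols[i]
        if operator not in OPERATORS:
            return False, f"expected an operator at position {i}"
        if i + 1 >= len(symbols) or not symbols[i + 1].isdigit():
            return False, f"operator {operator!r} at position {i} is not followed by an operand"
        tree = (operator, tree, symbols[i + 1])
        i += 2
    return True, tree
```

Under these assumptions, `parse(["1", "+", "2"])` yields a nested tuple as the parsed representation, while `parse(["1", "+", "*", "2"])` is rejected because two operators are adjacent, even though each operator is lexically valid on its own.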
As noted above, in the art of pattern recognition, syntactic methods have been utilized. One such syntactic method is known as an attribute grammar. Attribute grammars are discussed in an article entitled "Attributed Grammar--A Tool for Combining Syntactic and Statistical Approaches to Pattern Recognition", Wen-Hsiang Tsai and King-Sun Fu, published in IEEE Transactions on Systems, Man, and Cybernetics, pgs. 873-885, Vol. SMC-10, No. 12, December 1980. In an attribute grammar, semantic information of the patterns is combined with the syntactic rules to create a production rule. The syntactic rules establish a relationship among subpatterns. The semantic information is used to compute attributes of a pattern using the attributes of the related subpatterns. The attributes of a related subpattern may also be used to indicate the applicability of a production rule.
An attribute grammar is defined as a five-tuple
$$G = (V_T, V_N, A, P, S)$$
where $V_T$ is a set of terminal symbols, $V_N$ is a set of nonterminal symbols, $A$ is a set of attributes, $P$ is a set of production rules, and $S$ is the start symbol. A symbol is merely a representation of a pattern or subpattern within the grammar. A terminal symbol represents subpatterns that cannot be further divided (e.g. a letter in an alphabet), whereas a nonterminal symbol represents a subpattern that may be further divided. For each $X \in V_T \cup V_N$, the expression $A(X)$ denotes the attribute values of $X$, although some of the attribute values may be undefined for a given symbol. Each production rule in $P$ has two parts: the first part of the rule specifies a syntactic restriction among the symbols and the second part of the rule specifies a semantic restriction among the symbols. The syntactic part is of context-free form. The semantic part describes how the attribute values of the left-hand side symbol of the syntactic rule are computed in terms of the attribute values of the symbols on the right-hand side. Alternatively, the semantic part can indicate under which conditions the syntactic rule applies. Formally, the syntactic part of a rule is:
$$B \rightarrow B_1 B_2 \ldots B_n$$
where
$$B_i \in V_N \cup V_T, \quad \text{for } 1 \le i \le n.$$
The semantic part of the rule is a set of mappings. There are as many mappings as the number of attributes for the nonterminal $B$. Each mapping computes the corresponding attribute value of $B$ from the attribute values of $B_1 B_2 \ldots B_n$.
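As an illustration of a production rule with a syntactic part and a semantic part, the following Python sketch may help. It is only a sketch under stated assumptions: the class names `Node` and `Production`, the attribute name `value`, and the example rule `Sum -> Term + Term` are all hypothetical and do not come from the cited article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, float]


@dataclass
class Node:
    symbol: str             # a terminal or nonterminal symbol
    attributes: Attributes  # attribute values; may be empty (undefined)


@dataclass
class Production:
    lhs: str                # syntactic part: left-hand side symbol B
    rhs: Tuple[str, ...]    # syntactic part: right-hand side B1 B2 ... Bn
    # Semantic part: one mapping per attribute of the left-hand side symbol.
    mappings: Dict[str, Callable[[List[Attributes]], float]]

    def apply(self, children: List[Node]) -> Node:
        # The syntactic restriction must hold before attributes are computed.
        if tuple(child.symbol for child in children) != self.rhs:
            raise ValueError(f"children do not match {self.lhs} -> {' '.join(self.rhs)}")
        child_attributes = [child.attributes for child in children]
        computed = {name: mapping(child_attributes) for name, mapping in self.mappings.items()}
        return Node(self.lhs, computed)


# Hypothetical rule: Sum -> Term + Term, with value(Sum) = value(Term1) + value(Term2).
SUM_RULE = Production(
    lhs="Sum",
    rhs=("Term", "+", "Term"),
    mappings={"value": lambda attrs: attrs[0]["value"] + attrs[2]["value"]},
)

result = SUM_RULE.apply([Node("Term", {"value": 3}), Node("+", {}), Node("Term", {"value": 4})])
```

Here the terminal `+` carries no attributes, which corresponds to attribute values being undefined for a given symbol, and the single mapping computes the one attribute of the left-hand side from the attributes of the right-hand side symbols.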
Known attribute grammars are limited to defining one-dimensional relationships between subpatterns. It has been recognized that for some applications it is desirable to define multi-dimensional relationships between the subpatterns. This may occur for example in the analysis of mathematical expressions or fractions. In a fraction, one integer value is above a fraction line while a second integer value is below the fraction line. The same can be true for sub-expressions of a mathematical expression. Here, the relationships between the symbols are both horizontal and vertical (i.e. two-dimensional).
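The two-dimensional relationships in a fraction can be made concrete with a short Python sketch. The bounding-box representation, the names `Box`, `is_above` and `is_fraction`, and the convention that the vertical coordinate grows downward (as in a bit mapped image) are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a subpattern; y grows downward."""
    left: float
    top: float
    right: float
    bottom: float


def overlaps_horizontally(a: Box, b: Box) -> bool:
    # Horizontal relationship: the two boxes share some range of x values.
    return a.left < b.right and b.left < a.right


def is_above(a: Box, b: Box) -> bool:
    # Vertical relationship: a ends before b begins, within a shared x range.
    return a.bottom <= b.top and overlaps_horizontally(a, b)


def is_fraction(numerator: Box, line: Box, denominator: Box) -> bool:
    # A fraction needs the numerator above the line and the line above the denominator.
    return is_above(numerator, line) and is_above(line, denominator)


numerator = Box(left=2, top=0, right=4, bottom=3)
fraction_line = Box(left=0, top=4, right=6, bottom=5)
denominator = Box(left=2, top=6, right=4, bottom=9)
```

With these example boxes, `is_fraction(numerator, fraction_line, denominator)` holds, whereas swapping the numerator and denominator, or moving the numerator to the right of the line, does not satisfy the relationship. A one-dimensional rule that only records the left-to-right order of symbols cannot express this distinction.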
A method for describing and analyzing patterns with 2-D relationships is described in a paper entitled "Syntax-Directed Recognition of Hand-Printed Two-Dimensional Mathematics", Robert H. Anderson, Interactive Systems for Experimental Applied Mathematics, pgs. 436-459, New York, Academic Press, 1981. The Anderson paper describes syntax rules for driving a parsing process. The syntax rules have corresponding conditions for positively identifying particular input. However, the method described requires extensive computing resources to perform a pattern recognition analysis. This is because the associated parsing requires the examination of an overly broad set of permutations of the subpatterns which comprise the pattern. In particular, the described method does not provide for the use of keywords or heuristic information within the parsing process.
Therefore, it is an object of the present invention to provide a pattern recognition method where patterns having multi-dimensional relationships are defined and recognized with accuracy and computational efficiency.