Formal grammars have long been used in linguistics and computer science, where, for example, basic parsing algorithms have been developed. A formal grammar is the set of rules that defines the allowable formats (syntax) that statements in a given language may take. Grammars may be characterized as unrestricted, context-sensitive, context-free, or regular. The formal definitions of these four classifications are discussed in the detailed description of the invention. Parsing is the partitioning of a grammatical statement into its constituent parts to aid the automatic interpretation of the statement by a computer or other means. For example, the Cocke-Younger-Kasami parsing algorithm, described in INTRODUCTION TO AUTOMATA THEORY, LANGUAGES, AND COMPUTATION by Hopcroft and Ullman; Addison-Wesley; 1979, has become a standard parsing tool. The notion of grammars with attributes is developed in the paper SEMANTICS OF CONTEXT-FREE LANGUAGES by Knuth; Journal of Mathematical System Theory; 2:127-146; 1968. High-dimensional grammars, of which two-dimensional grammars are a special case, have been studied for use in pattern recognition in general. An example of such a study is SYNTACTIC PATTERN RECOGNITION AND APPLICATIONS; K. S. Fu; Prentice-Hall; 1982. Stochastic regular grammars as used in speech recognition may be attributed to J. K. Baker; THE DRAGON SYSTEM; IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING; ASSP 23:24-29; 1975 and STOCHASTIC MODELING AS A MEANS OF AUTOMATIC SPEECH RECOGNITION; PhD Thesis; Carnegie-Mellon University; 1975. Stochastic, as used here, refers to the association of probabilities with the possible parsings of a statement in order to deal with real-world situations characterized by noise, distortion and uncertainty.
Stochastic context-free grammars for speech recognition have also been proposed by Baker in his paper TRAINABLE GRAMMARS FOR SPEECH RECOGNITION, Speech Communication Papers of the 97th Meeting of ASA, MIT, Cambridge, Mass., Jun. 12-16, 1979, although in practice little has been done in this area.
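By way of illustration only, the combination of a Cocke-Younger-Kasami style chart parser with a stochastic context-free grammar might be sketched as follows. The toy grammar, its probabilities, and the names BINARY, LEXICAL and best_parse_probability are hypothetical and are not taken from any of the cited works; the sketch assumes a grammar in Chomsky normal form and returns the probability of the single most likely parse.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form. The rules and their
# probabilities are illustrative assumptions only.
BINARY = {  # (B, C) -> list of (A, P(A -> B C))
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("Det", "N"): [("NP", 0.6)],
}
LEXICAL = {  # word -> list of (A, P(A -> word))
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)],
    "cat": [("N", 0.5)],
    "she": [("NP", 0.4)],
    "sees": [("V", 1.0)],
}


def best_parse_probability(words, start="S"):
    """Return the probability of the most likely parse of `words`,
    or 0.0 if the toy grammar cannot derive them from `start`."""
    n = len(words)
    if n == 0:
        return 0.0
    # chart[i][j] maps a nonterminal to the best probability with which
    # it derives the sub-sequence words[i:j].
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    # Spans of length 1: apply the lexical rules.
    for i, word in enumerate(words):
        for a, p in LEXICAL.get(word, []):
            chart[i][i + 1][a] = max(chart[i][i + 1][a], p)

    # Longer spans: try every split point k and every binary rule.
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for b, pb in list(chart[i][k].items()):
                    for c, pc in list(chart[k][j].items()):
                        for a, p in BINARY.get((b, c), []):
                            candidate = p * pb * pc
                            if candidate > chart[i][j][a]:
                                chart[i][j][a] = candidate

    return chart[0][n].get(start, 0.0)


if __name__ == "__main__":
    print(best_parse_probability("she sees the dog".split()))
    print(best_parse_probability("dog sees".split()))
```

In this sketch a nonstochastic recognizer would record only whether each nonterminal can derive each span; keeping the maximum probability instead is what allows competing parses of a noisy or ambiguous input to be ranked.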
Some of the principles developed in the pattern recognition field have been suggested for printed object recognition. For example, the use of two-dimensional nonstochastic context-free grammars for recognizing printed equations is suggested by R. H. Anderson in Chapter 7 of K. S. Fu, ed. SYNTACTIC PATTERN RECOGNITION; pp 147-177; Springer-Verlag; Berlin, 1977. W. W. Stallings has also suggested, in Chapter 5 of the same book, the use of similar two-dimensional techniques for the recognition of printed Chinese characters.
Commercial optical character recognition systems are also available for scanning documents containing graphics characters and for converting the scanned objects into the underlying text. Such systems, as far as is known, are primarily intended to produce a string of text without any understanding of the meaning of the text. Instead, such known systems appear to be arranged to select the most probable character or object, on an object-by-object basis. It is believed that some of these systems use stochastic principles in some fashion to assist the selection of the most likely objects. However, the systems do not distinguish titles, abstracts, footnotes, body text, etc. in a document. If intelligent recognizers can be made that are able to distinguish the underlying structures of documents, then much more intelligent object recognition systems will become possible. Such an improved system would be able to parse the underlying graphical material and then, for example, search the material on a context basis. For example, an intelligent recognizer might parse a document from an image of the document, locate an abstract of the document and search the abstract for a given phrase. As further examples, graphical equations and tables could be parsed and appropriate text automatically generated for subsequent formatting and printing in other documents under the control of document formatting systems such as TeX and NROFF.