A parsing device converts a linear symbol sequence into an organized structure. The diagrammed sentence encountered in grade-school grammar is one kind of parse. This organized structure can then be displayed for instructional purposes, or other devices can use it to initiate commands, store information, or make logical inferences. A parsing device is a prerequisite for using natural language to communicate with computers and, moreover, a simple parsing device is needed if natural language is to be used for controlling household devices such as kitchen appliances, videocassette recorders, lighting, and heating or air conditioning. The central problem in parsing is to achieve both accuracy and speed. For synthetic languages, such as computer languages, many effective solutions are available. However, the complexity and ambiguity of natural languages have made them resistant to efficient parsing.
Automated sentence parsing devices--usually general purpose digital computers--have operated by matching parts of the input sentence with a very large number of stored rules (Tennant et al., 1989, U.S. Pat. No. 4,829,423). If the first rule does not apply, the next is chosen for examination. Three broad classes of devices have been used; these employ increasingly deep levels of sentence analysis. The most common approach is Syntactic Parsing, in which the rules driving the parsing device describe the organization of words in a particular language. In practice, achieving a parse also requires resolving the possible senses of ambiguous words. This is done by a semantic interpreter, which uses information about word meaning. Semantic Parsing aims at the level of meaning directly, driving the parser with rules relating ordered words to the underlying meaning of the sentence. Conceptual Analysis also aims directly for the level of meaning, but does not attempt to preserve word order. Conceptual analysis parsers identify key words as tokens for the underlying objects or actions, and the organization sought is that of the objects being discussed rather than that of the words used to discuss them. The processor reconstructs this organization by using "encyclopedia information" about the objects, rather than "dictionary information" about the words.
These three approaches to sentence parsing will now be explained in more detail. In each approach, the stored rules can be either template-patterns, which must be matched exactly, or general rules, which are applied to parts of the sentence in succession to build a tree-like parsed structure of increasingly fine detail, from a sentence to its phrases to words. The successive-rule strategy, termed "generative," is combinatorial since rules can be used more than once in various combinations. Rules can be chosen so as to divide a sentence into smaller units ("top-down parsing") or, less frequently, to assemble words into higher-level units ("bottom-up parsing"). By storing the result of each successful pattern-matching or rule application, the parse accumulates aspects of the sentence's organization.
Syntactic Parsing
A Pattern-Matching parsing device accepts a sentence if each word matches part of a stored template of allowed words or parts of speech ("lexical categories," such as "verb"). To cover a language, a large number of template patterns are stored in advance and each compared to the input sentence. The result is a flat structure whose sentence organization is simply the before/after relation between words or lexical categories (Winograd, 1983, pp. 36-37, 46). More complex pattern-matching strategies build hierarchical structures using syntactic categories (e.g. "noun phrase"). These "head and modifier" structures include the classic Diagrammed Sentence, composed of a subject-verb pattern to which modifiers are attached, and Dependency Grammar, in which even the subject is represented as a modifier of the verb. However, the prior art does not emphasize identification of rules that would enable automated generation of such trees (Winograd, 1983, pp. 73-75). A practical pattern-matching device has been described that tags a sentence's words with symbols denoting the word's possible syntactic features, and then matches the pattern of tags to stored template-patterns of tags (Kucera and Carus, 1989, U.S. Pat. No. 4,864,502).
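By way of illustration only, the tag-and-template strategy described above can be sketched as follows. The lexicon, the tag names, and the two stored templates are hypothetical and are not taken from any of the cited devices; the sketch assumes that a sentence is accepted when some stored template has the same length as the sentence and each template tag is among the possible tags of the corresponding word.

```python
# Hypothetical toy lexicon: each word maps to its possible lexical categories.
LEXICON = {
    "the": {"DET"},
    "dog": {"NOUN"},
    "barks": {"VERB", "NOUN"},
    "loudly": {"ADV"},
}

# Hypothetical stored template-patterns of tags.
TEMPLATES = [
    ("DET", "NOUN", "VERB"),
    ("DET", "NOUN", "VERB", "ADV"),
]


def match_templates(sentence):
    """Return every stored template that the tagged sentence fits exactly."""
    words = sentence.lower().split()
    # Tag each word with all of its possible categories (empty set if unknown).
    tags = [LEXICON.get(word, set()) for word in words]
    matches = []
    for template in TEMPLATES:
        same_length = len(template) == len(tags)
        if same_length and all(t in options for t, options in zip(template, tags)):
            matches.append(template)
    return matches
```

Because each template must be matched exactly, covering a new sentence shape in this sketch means storing another template, which is the scaling limitation noted above.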
The Generative paradigm replaces the long list of template-patterns with a shorter list of rules involving mini-patterns; these rules are used combinatorially and repetitively. Flat structures are generated by Transition-Networks, in which rules about the succession of words or lexical categories are embodied as transitions allowed between states of the parser (Winograd, 1983, pp. 54-55). If the input word matches a lexical category allowed for the current parser state (e.g., "adjective"), the parser switches to the next state. There it has a new set of allowed categories and the next word must match one of these. The resulting parse is flat, indicating only which lexical categories follow each other in the sentence.
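A transition network of the kind just described can be sketched as a small state table. The state names, the categories, and the particular transitions below are assumptions chosen for illustration; the input is taken to be a sequence of lexical categories that has already been assigned to the words.

```python
# Hypothetical transition table: (current state, lexical category) -> next state.
TRANSITIONS = {
    ("S0", "DET"): "S1",
    ("S1", "ADJ"): "S1",   # any number of adjectives may follow the determiner
    ("S1", "NOUN"): "S2",
    ("S2", "VERB"): "S3",
}
ACCEPTING = {"S3"}


def accepts(categories):
    """Return True if the category sequence drives the network to an accepting state."""
    state = "S0"
    for category in categories:
        key = (state, category)
        if key not in TRANSITIONS:
            return False  # no transition allowed for this category in this state
        state = TRANSITIONS[key]
    return state in ACCEPTING
```

The function reports only whether the categories may follow one another in the given order, which corresponds to the flat parse described above.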
Hierarchical structures can be constructed generatively, however. Recursive Transition Networks allow transition networks to accomplish a hierarchical parse by allowing parser transitions to be triggered not only by lexical categories like "adjective," but also by syntactic categories such as "noun phrase" (Winograd, 1983, pp. 196-198). Deciding whether the input sentence contains a noun phrase, however, requires first activating additional transition networks for describing the constituents of noun phrases. Immediate-Constituent Grammar parsers build the tree-like structure by using a list of rules that relate two levels of structure (Winograd, 1983, pp. 75-76). The list includes rules linking sentences to noun-, verb-, and prepositional phrases; these to lexical categories such as noun, adjective, and determiner; and these to specific words. An example is: sentence → noun phrase + verb phrase; noun phrase → determiner + adjective + noun; verb phrase → verb + adverb + direct object. Successive application of the rules produces a tree of increasing detail: parsing a sentence is achieved by successively checking each rule in the list for fit to the input sequence of words, and choosing from the rule list those that match. One list of rules, one grammar, is capable of building many different trees. The particular tree that results depends on the subset of rules that matched the particular sentence's words.
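The immediate-constituent strategy can be illustrated with a minimal top-down sketch that tries each rule in turn and backtracks when a rule does not fit. The grammar below loosely follows the example rules above, with a few alternative rules added as assumptions; the lexicon, the category abbreviations, and the function names are likewise hypothetical.

```python
# Hypothetical immediate-constituent rules; each category lists alternative expansions.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "ADJ", "NOUN"], ["DET", "NOUN"]],
    "VP": [["VERB", "ADV", "NP"], ["VERB", "NP"], ["VERB"]],
}
# Hypothetical lexicon linking words to a single lexical category.
LEXICON = {
    "the": "DET", "a": "DET", "small": "ADJ", "dog": "NOUN",
    "ball": "NOUN", "chased": "VERB", "quickly": "ADV",
}


def parse(symbol, words, pos):
    """Yield (tree, next_pos) for every way `symbol` can start at words[pos]."""
    if symbol not in GRAMMAR:  # lexical category: compare with the next word
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield (symbol, words[pos]), pos + 1
        return
    for rule in GRAMMAR[symbol]:  # try each rule for this category in order
        yield from expand(rule, words, pos, [symbol])


def expand(rule, words, pos, partial):
    """Match the remaining constituents of one rule, collecting subtrees."""
    if not rule:
        yield tuple(partial), pos
        return
    for subtree, next_pos in parse(rule[0], words, pos):
        yield from expand(rule[1:], words, next_pos, partial + [subtree])


def parse_sentence(sentence):
    """Return all trees for the sentence that consume every word."""
    words = sentence.lower().split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]
```

For the input "the small dog chased a ball", the sketch tries and abandons the rule containing an adverb before the rule verb + noun phrase succeeds, a small instance of the fruitless paths and backtracking that arise as the rule list grows.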
In Role Structures parsers, the syntactic categories are themselves in effect constituents of a pattern of syntactic roles (such as the pattern "subject-object") (Winograd, 1983, pp. 79-80). Word tags can also be processed generatively (Hutchins, 1991, U.S. Pat. No. 4,994,966). A related strategy is to copy the dictionary entries of the words of a sentence directly into memory, and then use the grammar rules to directly remove properties from, or add to, these dictionary entries (Zamora et al., 1989, U.S. Pat. No. 4,887,212).
A disadvantage of the above devices is that they use context-free grammars and are thus inadequate for sentences that have long-distance dependencies. Such dependencies arise from number/case agreement, passive-voice constructions, and questions. Transformational Grammars use a series of context-sensitive rewrite rules to transform a context-free syntactic tree structure into another tree that matches the word order found in, for example, the passive voice (Winograd, 1983, pp. 139-143). However, these processes have not been widely incorporated into devices (Winograd, 1983, p. 162).
Phrase Structure Grammar uses immediate-constituent rules, but adds "derived constituents" of the form "verb phrase but missing x." For example, a derived constituent like VP/NP carries the information that a noun phrase is missing from the verb phrase. Long-distance dependencies are resolved by using semantic knowledge to find an NP that fills in this hole (Winograd, 1983, pp. 345-346). The strategy most frequently used in practical parsers is the Augmented Transition Network (Loatman et al., 1990, U.S. Pat. No. 4,914,590). Here, the recursive transition network's set of allowed transitions between parser states is supplemented with registers that record a word's features, such as number or gender, and a word's role, such as subject, object, or determiner. Rules governing the allowed transitions are made dependent on the condition of these registers (Winograd, 1983, pp. 204-210). In this way, long-distance dependencies are incorporated. Charts are kept of the intermediate states of the parse to improve efficiency.
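The role of registers in an Augmented Transition Network can be sketched as follows. This is a deliberately reduced illustration: it uses a single non-recursive network, a hypothetical five-word lexicon, and one feature (number), whereas the devices described above use recursive networks and many registers. The sketch assumes that the verb transition is allowed only when the verb's number agrees with the number recorded in a register when the subject noun was read.

```python
# Hypothetical lexicon with a lexical category and, where relevant, a number feature.
LEXICON = {
    "the": {"cat": "DET"},
    "dog": {"cat": "NOUN", "number": "sg"},
    "dogs": {"cat": "NOUN", "number": "pl"},
    "barks": {"cat": "VERB", "number": "sg"},
    "bark": {"cat": "VERB", "number": "pl"},
}


def atn_accepts(sentence):
    """Return (accepted, registers) for a determiner-noun-verb sentence."""
    registers = {}
    state = "START"
    for word in sentence.lower().split():
        entry = LEXICON.get(word)
        if entry is None:
            return False, registers
        cat = entry["cat"]
        if state == "START" and cat == "DET":
            state = "NP"
        elif state == "NP" and cat == "NOUN":
            # Record the word's role and feature in registers.
            registers["subject"] = word
            registers["number"] = entry["number"]
            state = "SUBJ"
        elif state == "SUBJ" and cat == "VERB" and entry["number"] == registers["number"]:
            # The transition is conditional on the register set earlier.
            registers["verb"] = word
            state = "DONE"
        else:
            return False, registers
    return state == "DONE", registers
```

Because the condition consults a register rather than the immediately preceding word, the same test would still apply if other material intervened between the subject and the verb.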
A completely different generative approach is Tree-Adjoining Grammar, which combines not rules but trees (Joshi, 1985). Each initial tree, actually a "mini-tree," describes a sentence fragment. These are combined to build a tree for the complete sentence. By collecting several rules into one mini-tree, a long-distance dependency can be incorporated from the outset as a mini-tree having links between two of its branches. However, since each mini-tree represents just one of the combinations of several rules, the initial stock of mini-trees is very large. The number of tree-choice decisions is accordingly large and the parsing time is proportional to the fourth power of sentence length. A "molecular parsing" device has been described in which each word's dictionary entry contains both phrase structure rules and template-patterns for the word's possible successors (Hu, 1994, U.S. Pat. No. 5,297,040).
Implementing these syntactic parsing approaches has not been straightforward, for two reasons. First, a correct parse requires a single grammar rule to be chosen at each word. But the correctness of a rule is usually not evident until several words later. Therefore, the large number of rules required to capture real sentences means that the device will pursue many fruitless parsing paths, followed by backtracking when the parse becomes clearly incorrect. This rule-choice problem has fostered strategies to minimize the time spent on fruitless parsing paths, including pursuing multiple parses in parallel or retaining information about partially-successful parses (Winograd, 1983, pp. 368-369).
Some parsers prune the parallel parsing paths by using syntactic rules to assess the likelihood that the parse thus far is correct (van Vliembergen, 1991, U.S. Pat. No. 5,068,789). The Deterministic Parsing conjecture asserts that, for sentences that can be understood without difficulty, a parser need not backtrack or construct the alternative parses that arise from rule-order choice (Marcus, 1980, pp. 2-6). The parsing is deterministic in that, once built, nothing can be unbuilt or ignored. The deterministic parser achieves this result by viewing not just one word at a time, but a 3- to 5-constituent window constructed from the incoming sentence (Marcus, 1980, pp. 17-21). Incoming words and already-built constituents are stored in a first-in first-out buffer, the first 3-5 positions of which are available to the parser. New constituents are built from these in a separate push-down stack. The syntactic tree is built using syntactic pattern-matching rules and semantic case rules that involve both the stack and the buffer. Few reports on this method have appeared since the original description.
The second problem in syntactic parsing has been that words in natural languages often have multiple meanings; yet, the parsing strategies described above require that the correct meaning be identified before the parse can be completed. One strategy has again been to try each of the alternative possible parses, most again fruitless. Alternatively, parsers have reduced ambiguity by using co-occurrence relations--statistical information on the frequency with which words are found together (Kucera et al., 1989, U.S. Pat. No. 4,868,750).
Semantic Interpretation of a Syntactic Parse
More direct reduction of word ambiguity, and rule choice, has been sought by associating word-meaning information with each element of the syntactic parse tree after the syntactic tree is constructed. Semantic interpretation then usually uses a Pattern-Matching strategy against stored semantic template-patterns. These stored patterns can be of several types. Selection Restriction Rules specify the objects with which a given word may have a relation; these are embodied as arguments to the word's entry in the dictionary, or "lexicon." An example is: "green" + (physical object) → green color (Allen, 1995, pp. 295-296). Rather than denoting a meaning, these rules lead to a choice among previously-cataloged meanings. Semantic Network patterns are hierarchies of object-types or action-types, based on semantic primitives, which the designer constructs in advance. If the semantic network also includes "semantic functions," such as "agent," "instrument," or "result," it can be used to limit the possible senses of a word (Allen, 1995, pp. 305-307). In actual devices, such hierarchies are implemented as a nested list.
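A selection restriction rule of the kind exemplified above can be sketched as a lookup against a small type hierarchy. Only the rule relating "green" and a physical object to the sense "green color" comes from the text; the second sense of "green," the nouns, and the hierarchy itself are hypothetical additions made so that the choice among cataloged senses is visible.

```python
# Hypothetical type hierarchy: each object-type points to its parent type.
IS_A = {
    "apple": "physical object",
    "recruit": "person",
    "person": "physical object",
    "idea": "abstract object",
}

# Cataloged senses, each with the object-type its argument must have.
SENSES = {
    "green": [
        ("person", "inexperienced"),           # hypothetical second sense
        ("physical object", "green color"),    # the rule given in the text
    ],
}


def has_type(noun, wanted):
    """Walk up the hierarchy to see whether `noun` is a kind of `wanted`."""
    current = IS_A.get(noun)
    while current is not None:
        if current == wanted:
            return True
        current = IS_A.get(current)
    return False


def select_senses(adjective, noun):
    """Return the cataloged senses whose restriction the noun satisfies."""
    return [sense for required, sense in SENSES.get(adjective, [])
            if has_type(noun, required)]
```

In this sketch the rule does not construct a meaning; it only filters the senses already stored, so "green apple" yields one sense, "green recruit" remains ambiguous between two, and "green idea" yields none.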
Case Grammar is related more to the fixed patterns in which meanings tend to occur ("semantic case" patterns) than to the meanings themselves. Examples are agent, instrument, and theme. The semantic case of a word or phrase depends on its position with respect to the verb, just as the familiar syntactic cases like subject and object do. Since verbs are restricted in the semantic cases they allow, the parser can identify semantic case patterns using syntactic patterns found in the syntactic parse (Winograd, 1983, pp. 314-317). Alternatively, semantic case patterns can be identified independently of specific verbs by having the parser derive them from syntactic case patterns; for example, the identification of a direct object in the syntactic parse implies the presence of an agent and a theme in the semantic interpretation (Beardon et al., 1992, pp. 133-134).
Other strategies are less often used in practical parsing devices. Systemic Grammar uses hierarchies of template-patterns based not only on semantic functions, but also on word and pragmatic functions. The functions are arranged into systems organized as a tree of choices: mood (question, statement, command); transitivity (actor, action, goal; agent, instrument, result); theme (focus of attention); and information (already-given, new) (Allen, 1995, p. 95; Beardon et al., 1992, pp. 187-188). These patterns are reflected in the sentence's word order and lexical-category patterns. Thus, a degree of semantic information emerges once the correct lexical category pattern is matched.
Functional Grammar uses hierarchical template-patterns that combine, in a single notation, syntactic elements such as voice, syntactic categories such as "noun phrase," and semantic elements such as "actor." Such patterns are stored in advance for the various types of sentence fragments. It differs from systemic grammar in emphasizing matching the pattern, rather than deriving the hierarchical pattern from word patterns (Winograd, 1983, pp. 328-330). Parsing by any of these pattern-matching procedures assembles overlapping sentence-fragment template patterns. Such pattern-matching strategies have had limited application in practical devices, as they do not readily parse complex sentences or languages with large vocabularies.
Several parsing devices use Generative methods, building semantic structures that correspond to the syntactic parse tree. The semantic rules are incorporated within the syntactic rules that the syntactic parser uses. Of these, Definite Clause Grammar supplements the immediate-constituent grammar rules with further information about word features, possible word roles in the sentence, and semantic functions (Winograd, 1983, p. 349). These rules are expressed in predicate calculus; parsing then uses a theorem-prover algorithm.
Lexical-Functional Grammar supplements a syntactic tree with a semantics-like tree (Winograd, 1983, pp. 334-337; Tokuume et al., 1992, U.S. Pat. No. 5,101,349). To achieve this, the designer first identifies rules governing the inheritance of word features, word roles, and semantic functions down a context-free constituent-structure tree. The designer then assigns to each constituent in an immediate-constituent rule an equation that describes the relation between the features, roles, and functions of the parent node and those of the child node. Thus, the parser will build a feature/role/function tree as it builds the syntactic tree. In addition, the parser uses a detailed dictionary that includes feature/role information about each word and its possible arguments. This word information will then percolate up through the feature/role/function tree. The rules for the syntactic tree can be simple; though they would allow many incorrect sentences, the constraint resulting from solving the set of feature/role/function equations results in a correct parse. The parser solves this set of simultaneous equations instead of pursuing all of the parsing paths made possible by an ambiguous word.
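The way such equations constrain an otherwise permissive syntactic rule can be suggested by a greatly simplified sketch. It assumes flat feature sets rather than the nested feature/role/function structures described above, and it assumes that every word passes all of its features up to a single sentence node; the lexicon, the feature names, and the function names are hypothetical.

```python
# Hypothetical dictionary entries carrying feature information for each word.
LEXICON = {
    "this": {"number": "sg"},
    "these": {"number": "pl"},
    "dog": {"number": "sg"},
    "dogs": {"number": "pl"},
    "barks": {"number": "sg", "tense": "present"},
}


def unify(parent, child):
    """Merge two flat feature sets, or return None if any feature conflicts."""
    merged = dict(parent)
    for feature, value in child.items():
        if merged.get(feature, value) != value:
            return None  # the equations have no solution
        merged[feature] = value
    return merged


def sentence_features(words):
    """Percolate each word's features up to one sentence-level feature set."""
    features = {}
    for word in words:
        features = unify(features, LEXICON[word])
        if features is None:
            return None
    return features
```

A syntactic rule that admits any determiner, noun, and verb would accept both "this dog barks" and "these dog barks"; in the sketch only the first yields a consistent feature set, so the second is rejected without any additional syntactic rule.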
The large number of rules in the various strategies described above has had the consequence that many natural language understanding systems are limited to a domain-specific knowledge base, such as a database, medical reports, or inventory control.
Two additional strategies for semantic interpretation do not seem to have led to parsing devices. Generative Semantics takes the point of view that syntax and semantics should have the same rules. The semantic structure is a logical proposition whose predicates and arguments are arranged as immediate-constituents forming a tree (Winograd, 1983, pp. 560-562). Semantic rules are of the form: proposition → predicate + arguments; argument → another proposition or an "index"; index → a proper noun or a phrase denoting a variable. The main concern of generative semantics has been generating the syntactic tree from the proposition, rather than from a natural language text.
Montague Semantics asserts that both the syntax and semantics of natural languages obey formal laws (Dowty, 1981, pp. 180-183). Each compositional syntactic rule is associated with a parallel compositional semantic rule. The syntactic analysis uses a phrase-structure grammar's derived categories (see above) to construct a syntactic tree, but with a very large hierarchy of syntactic constituent-types that corresponds to the hierarchy of logical types found in the predicate calculus of the semantics. In the semantics portion, the meaning of a word is defined as being the word's referent--specified by membership in sets of individuals or sets of properties. Montague semantics focusses on building precise specifications of set-membership. In principle, the syntactic analysis would automatically generate the semantic representation. But focussing on set membership does not address word ambiguity, and the approach appears to have been used only on small subsets of English.
All of the parsing methods described thus far are based on syntactic information: knowing which word types--nouns, verbs, adjectives, auxiliaries, singulars, plurals, nominative forms or objective, active voice constructions or passive, various tense forms--can follow each other in a sentence.
Semantic Parsing
Strategies have also been devised for parsing sentences directly into a semantic structure, without a syntactic tree. A Pattern-Matching strategy is Semantic-Directed Parsing, in which the parser matches the input words against patterns of semantic selection-restriction rules (Allen, 1995, pp. 341-342). These rules use word-word patterns to choose among stored object-object patterns reflecting the real world. Words are matched to selection restriction rules either directly or by using a lexicon that portrays each word as a type hierarchy. Since the patterns being matched are based on individual words, each word sense requires a different stored pattern. Linguistic generalizations are thus precluded, and the method requires either a large number of patterns or, as is usually the case in practice, a small vocabulary.
A Generative approach to semantic parsing is Semantic Grammar, in which immediate-constituent rules use semantic-like categories or special words (Allen, 1995, pp. 332-334; Beardon et al., 1992, pp. 129-130). A set of rules for airline reservations might be: reservation-phrase → reservation + reservation-modifiers; reservation → reservation-verb + flight#; reservation-modifiers → "for" + name. Such rules are domain-specific. Also, the parser must have a separate rule for each syntactic variant, such as passive voice.
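The three airline-reservation rules given above can be sketched with one function per rule. The list of reservation verbs, the assumption that a flight number is a string of digits, and the dictionary returned as the semantic structure are illustrative choices and do not come from the cited sources.

```python
# Hypothetical members of the semantic category "reservation-verb".
RESERVATION_VERBS = {"book", "reserve"}


def parse_reservation(words, pos):
    """reservation -> reservation-verb + flight#"""
    if (pos + 1 < len(words)
            and words[pos].lower() in RESERVATION_VERBS
            and words[pos + 1].isdigit()):
        return {"action": words[pos].lower(), "flight": words[pos + 1]}, pos + 2
    return None


def parse_modifiers(words, pos):
    """reservation-modifiers -> "for" + name"""
    if pos + 1 < len(words) and words[pos].lower() == "for":
        return {"passenger": words[pos + 1]}, pos + 2
    return None


def parse_reservation_phrase(sentence):
    """reservation-phrase -> reservation + reservation-modifiers"""
    words = sentence.split()
    reservation = parse_reservation(words, 0)
    if reservation is None:
        return None
    frame, pos = reservation
    modifiers = parse_modifiers(words, pos)
    if modifiers is None:
        return None
    modifier_frame, pos = modifiers
    if pos != len(words):
        return None  # unparsed words remain
    return {**frame, **modifier_frame}
```

The sketch parses "Book 417 for Smith" directly into a semantic structure, but it rejects the passive variant "Flight 417 was booked for Smith" because no rule has been written for that word order.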
Conceptual Analysis
More ambitious is the attempt to parse a sentence according to the concepts it represents. This is done by viewing words as tokens for the underlying objects or concepts. The processor assembles this information by using "encyclopedia information" about the objects, rather than "dictionary information" about the words. Because words are of course the input for such parsing, even though it is their referents that are being analyzed, conceptual analysis can be difficult to distinguish from semantic parsing. A useful diagnostic is that conceptual analysis does not attempt to preserve word order. Consequently, conceptual analysis devices are not true natural language parsers.
A conceptual Pattern-Matching strategy is Preference Semantics, in which the parser matches conceptual structures for each input word against a set of templates for the conceptual structures of commonly encountered messages (Wilks, 1976, pp. 158-169). Each word's conceptual structure is represented in its lexicon entry as a tree of semantic primitives based on object-types and actions. Each sense of a word has its own tree, each tree also noting that sense's semantic case. The highest level of the tree is a broad category, such as "person," and is used for finding an appropriate slot in a template.
Templates are of the general form: agent/action/object; particular templates are more specific, such as "person/be/attribute." The parser substitutes the tree of each sense of each word into each of the stored templates, seeking a conceptual match. Adjectives, adverbs, and determiners are skipped until later. All matches are retained. The "preference" aspect of the parser arises in the next step, in which the multiple matches are winnowed. The semantic cases in the words' conceptual trees are used to identify preferred relations between lexicon entries; for example, some actions prefer a human agent. Similar preference rules are then used to place the adjectives, adverbs, and determiners. Thus, assembly of the conceptual structure is relatively independent of word order.
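The two steps just described, retaining every template match and then winnowing the matches by preference, can be suggested by a much-reduced sketch. It assumes that each word sense is represented only by the broad category at the top of its tree, that the input has already been reduced to three content words in agent/action/object order, and that a single preference (an action preferring a human agent) is scored. The lexicon entries, glosses, and templates are hypothetical.

```python
from itertools import product

# Hypothetical lexicon: each sense is reduced to its top-level category ("head").
SENSES = {
    "crook": [
        {"head": "person", "gloss": "criminal"},
        {"head": "thing", "gloss": "shepherd's staff"},
    ],
    "grasped": [
        {"head": "act", "gloss": "seized", "prefers_agent": "person"},
    ],
    "wallet": [
        {"head": "thing", "gloss": "wallet"},
    ],
}

# Hypothetical stored templates of the form agent/action/object.
TEMPLATES = [("person", "act", "thing"), ("thing", "act", "thing")]


def interpret(words):
    """Return the sense combinations that match a template with the most preferences met."""
    candidates = []
    for combo in product(*(SENSES[word] for word in words)):
        heads = tuple(sense["head"] for sense in combo)
        if heads not in TEMPLATES:
            continue  # this combination of senses fits no stored template
        agent, action, _ = combo
        # Count satisfied preferences; here only the action's preferred agent type.
        score = 1 if action.get("prefers_agent") == agent["head"] else 0
        candidates.append((score, [sense["gloss"] for sense in combo]))
    best = max((score for score, _ in candidates), default=None)
    return [glosses for score, glosses in candidates if score == best]
```

Both senses of the first word match some template and are initially retained; the reading with a human agent is kept in the winnowing step because it satisfies the action's preference.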
In practice, these assembly procedures are applied to fragments generated by subdividing the sentence, primarily at prepositions. Afterward, each conceptually-assembled fragment is tied to the next sentence-fragment by a "case tie," identified by matching each template against stored super-templates ("paraplates") for common case patterns. Each paraplate involves one sense of the preposition that starts the second template, and has the structure: (agent/action/object)1//case tie//(agent/action/object)2. Fragment-tying thus appears to be syntactic or semantic rather than conceptual. The resultant parse is an organized structure of semantic primitives, rather than a structure built of the original words.
A Generative approach to conceptual analysis is Conceptual Dependency. In this approach, each word's underlying concept becomes joined to the others not because they each match part of a stored message pattern, but because each concept in the lexicon is accompanied by expectations about its potential partners (Schank, 1972, pp. 556-560; Schank, 1975, pp. 22-25, 37-39, 41-43). In effect, the analyzer uses conceptual selection-restriction rules to assemble a structure.
First, syntactic procedures identify the main verb, the main noun, and the words and phrases that depend on these syntactically (Wilks, 1976, pp. 169-173). However, a syntactic tree is not produced. Instead, the lexicon is used to replace the verb with a conceptual dependency structure based on conceptual primitives; this structure is annotated with conceptual case information, similar to semantic case. The conceptual cases predict other conceptual items expected in the sentence, such as agents and recipients, for which the parser then searches. These are found by comparing unused sentence elements to syntactic patterns that have been prestored.
A related device identifies the sentence's semantic arguments for syntactic elements, such as the verbs, by using as an input an extensively-annotated syntactic parse (Jensen, 1992, U.S. Pat. No. 5,146,406). It then assembles the argument structure of the sentence without preserving word order. In both conceptual-dependency and preference-semantics parsers, the prior syntactic manipulation required is a major impediment to a purely conceptual analysis.
The disadvantages of present parsing methods are thus complexity, large required processor size, limited vocabulary, and slow speed. These disadvantages stem from the large number of syntactic rules used and the incorporation of semantic information to resolve ambiguities.
What is desired, therefore, is a parsing method and device which would utilize a small number of rules, minimal semantic information, and minimal computing power. Such parsing method and device would require minimal disambiguation of words having multiple meanings and would not rely on specific knowledge bases, word co-occurrence probability data, selection-restrictions, frames, or expectations about sentence content. Such parsing method and device would also process symbol strings in a time proportional to sentence length.