This invention relates generally to computational linguistics, and more specifically provides an algorithm for generating with Lexical Functional Grammars which uses construction and analysis of generation guides to determine internal facts and eliminate incomplete edges prior to constructing a generation chart.
One of the major concerns of computational linguistics is relating strings of words to abstract representations of meanings given a grammar of a particular language. The process of going from a string of words to an abstract representation of meaning is called “parsing”. The process of going from an abstract representation of meaning to a string of words is called “generation”. Parsing is useful for information retrieval, text understanding, dialog management, and translation. Generation is useful for dialog management, user interface output, and translation.
In the literature, generation has two different meanings. Generation can mean the process of figuring out what to say. This is sometimes called “planning”. We will refer to this herein as planning generation. Generation can also mean the process of figuring out how to say something, given that you know what to say. This is sometimes called “realization” or “tactical generation”. Although the latter seems easy by comparison with the former, it can be tricky to implement efficiently. This patent application is about a means for doing tactical generation. When we use the term generation in the rest of the patent application, we will always mean tactical generation.
Both parsing and generation assume a grammar of some sort. In our terminology, a grammar is a declarative representation of the relationship between strings of words and their meanings. We are particularly interested in Lexical Functional Grammars, which provide a very expressive notation for describing languages. Lexical Functional Grammars (LFGs) are made up of phrase structure rules that are annotated with feature structure constraints. For instance, here is an LFG rule:
    S→NP: (^ SUBJ)=!; VP: ^=!.
This says that an S (a sentence) is made up of an NP (a noun phrase) and a VP (a verb phrase). Furthermore, the feature structure constraint “(^ SUBJ)=!” indicates that the feature structure associated with the NP (denoted by “!”) is the SUBJ (the subject) of the feature structure associated with the S (denoted by “^”). Also, the constraint “^=!” indicates that the feature structure associated with the VP (denoted by “!”) is the same as the feature structure associated with the S (denoted by “^”). The symbols “^” and “!” are called meta-variables and they can be instantiated to different feature structures with each application of this rule.
Lexical Functional Grammars also have lexical entries which associate categories and feature structure constraints with particular words. For instance, we might have the following lexical entries:
    John NP (^ PRED)=‘John’.
    slept VP (^ PRED)=‘sleep<(^ SUBJ)>’
            (^ TENSE)=PAST.
The first entry says that “John” can be an NP with the constraint (^ PRED)=‘John’. This constraint says that the PRED (the predicate) of the feature structure associated with “John” is ‘John’. The single quotes around John indicate that it is semantic: a predicate with no arguments that denotes the person named “John”. Similarly, “slept” can be a VP with the constraints (^ PRED)=‘sleep<(^ SUBJ)>’ and (^ TENSE)=PAST. The first constraint says that the feature structure associated with “slept” has a predicate named ‘sleep’ that takes one argument, which is the SUBJ (the subject) of the feature structure associated with “slept”. The second indicates that “slept” is a past tense verb.
If we use this information to parse the sentence “John slept”, we learn that “John” is an NP and “slept” is a VP, and that the NP and the VP can combine into an S. Furthermore, the constraints for “John” are instantiated to (f1 PRED)=‘John’ and the constraints for “slept” are instantiated to (f2 PRED)=‘sleep<(f2 SUBJ)>’ and (f2 TENSE)=PAST, where f1 and f2 are new feature structure variables. Using the constraints on the S rule, we learn that (f2 SUBJ) is equal to f1. Thus we end up with the following constraints for “John slept”:
    (f2 PRED)=‘sleep<f1>’
    (f2 TENSE)=PAST
    (f2 SUBJ)=f1
    (f1 PRED)=‘John’
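The instantiation step described above can be sketched in a few lines. This is only an illustration under stated assumptions: constraints are encoded as plain strings, the two meta-variables are written with the ASCII symbols “^” and “!”, and the helper name `instantiate` is hypothetical rather than part of any LFG system.

```python
def instantiate(template, up, down=None):
    """Replace the meta-variables in a constraint template.

    "^" stands for the mother's feature structure and "!" for the
    daughter's (an ASCII encoding assumed for this sketch).
    """
    result = template.replace("^", up)
    if down is not None:
        result = result.replace("!", down)
    return result


# Lexical constraints: each word gets a fresh feature structure variable.
john = instantiate("(^ PRED)='John'", up="f1")
slept = [
    instantiate(t, up="f2")
    for t in ("(^ PRED)='sleep<(^ SUBJ)>'", "(^ TENSE)=PAST")
]

# S rule annotation on the NP: the NP's structure (f1) is the SUBJ of the S's.
# The VP annotation "^=!" identifies the S's structure with the VP's (f2).
subj = instantiate("(^ SUBJ)=!", up="f2", down="f1")
```

Substituting f1 for (f2 SUBJ) inside the predicate then gives the ‘sleep<f1>’ form shown in the constraint list.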
These constraints describe a feature structure that gives an abstract representation of the meaning of “John slept”. In particular, it gives the predicate-argument structure and the tense for the sentence. Using the same grammatical information, we can generate from these constraints. We start by noting that the ^ in the S rule must match f2. The constraint (^ SUBJ)=! says that the feature structure associated with the NP (i.e. !) is the SUBJ of the feature structure associated with the S. Using the input constraints, we see that the ! must match f1. We then look in the lexical entries of the grammar for an NP that has constraints that match (f1 PRED)=‘John’. This gives us “John”. Similarly, the VP constraint ^=! tells us that the feature structure for the VP must be f2. We then look in the lexical entries for a VP whose constraints are consistent with (f2 PRED)=‘sleep<f1>’ and (f2 TENSE)=PAST. This gives us “slept”. We are now done generating the following tree:

Since the feature structure constraints associated with this tree are the same as the feature structure constraints given in the input (except perhaps for the order in which they appear), this is a valid generation tree. If the feature structure constraints associated with the tree had more or fewer constraints than the input, then this would not be a valid generation tree. If we just take the leaves of this tree we get “John slept” as the output of the generator.
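The validity test just described amounts to an order-insensitive comparison of two constraint sets. A minimal sketch follows, assuming constraints are encoded as plain strings that already share variable names; the function name `is_valid_generation` is hypothetical, and a real implementation would also have to handle renamed variables and logically equivalent constraints.

```python
# Input to the generator, encoded as a set so that order is irrelevant.
input_constraints = {
    "(f2 PRED)='sleep<f1>'",
    "(f2 TENSE)=PAST",
    "(f2 SUBJ)=f1",
    "(f1 PRED)='John'",
}


def is_valid_generation(tree_constraints, input_constraints):
    """A tree is valid only if it has neither more nor fewer constraints."""
    return set(tree_constraints) == set(input_constraints)


# Constraints collected from the tree for "John slept", in a different order.
tree_constraints = [
    "(f1 PRED)='John'",
    "(f2 SUBJ)=f1",
    "(f2 PRED)='sleep<f1>'",
    "(f2 TENSE)=PAST",
]

valid = is_valid_generation(tree_constraints, input_constraints)
# Dropping the tense constraint leaves the tree with too few constraints.
missing_tense = is_valid_generation(tree_constraints[:3], input_constraints)
```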
In general, generation is the inverse of parsing. If parsing a particular string of words produces a particular abstract representation of meaning, then generating with that meaning and the same grammar should produce the same string of words. However, the relationship is not one-to-one. For instance, parsing “John saw the girl with the telescope” may produce two abstract representations:
    (f1 PRED)=‘see<f2, f3>’
    (f1 TENSE)=PAST
    (f1 SUBJ)=f2
    (f2 PRED)=‘John’
    (f1 OBJ)=f3
    (f3 PRED)=‘girl’
    (f3 SPEC)=the
    f4 $ (f3 MODIFIERS)
    (f4 PRED)=‘with<f5>’
    (f4 OBJ)=f5
    (f5 PRED)=‘telescope’
    (f5 SPEC)=the
and
    (f1 PRED)=‘see<f2, f3>’
    (f1 TENSE)=PAST
    (f1 SUBJ)=f2
    (f2 PRED)=‘John’
    (f1 OBJ)=f3
    (f3 PRED)=‘girl’
    (f3 SPEC)=the
    f4 $ (f1 MODIFIERS)
    (f4 PRED)=‘with<f5>’
    (f4 OBJ)=f5
    (f5 PRED)=‘telescope’
    (f5 SPEC)=the
These are identical except that the first has f4 $ (f3 MODIFIERS) and the second has f4 $ (f1 MODIFIERS). The $ notation in f4 $ (f3 MODIFIERS) says that f4 is a member of the set denoted by (f3 MODIFIERS). This notation allows a sentence to have an unbounded number of modifiers. f4 $ (f3 MODIFIERS) means that “with the telescope” modifies “the girl”. f4 $ (f1 MODIFIERS) means that “with the telescope” modifies “saw”.
If we take the second representation and generate from it, we get “John saw the girl with the telescope”. We may also get “With the telescope, John saw the girl” and other sentences with similar meanings. Whether or not we get other sentences depends on the details of the grammar. For instance, if the grammar has a feature that indicates that “with the telescope” comes before the verb, then the generator will not produce “With the telescope, John saw the girl” since the feature structure for this sentence will include a feature that is not in the input.
Martin Kay's Generation Chart
In 1996, Martin Kay proposed to take the notion of “chart” that was popular in parsing and apply it to generation as disclosed in Kay, Martin, 1996, “Chart Generation”, 34th Annual Meeting of the Association for Computational Linguistics, Santa Cruz, Calif., pp. 200–204. In parsing, a chart is a data structure that caches the results of certain parsing operations. A chart consists of a set of data structures called edges and subtrees. An “edge” represents a substring of the string of words being parsed. It consists of a category (such as NP or VP), the position where the substring begins, and the position where the substring ends. The category indicates that this substring can be analyzed as the given category according to the given grammar. A “subtree” is a record of how an edge is constructed. It consists of the daughter edges that were used to construct the given edge. The subtrees for an edge are usually stored in the edge. We will use the notation CAT[i,j] for an edge, where CAT is the edge's category, i is the position of the beginning of the substring that the edge covers, and j is the position of the end of the substring that the edge covers.
To give an example of how a chart works in parsing, consider the sentence “John slept”. Putting identifiers between the words produces “1 John 2 slept 3”. When we discover that “John” can be analyzed as an NP, we add NP[1,2]→John to the chart, where NP[1,2] is an edge and John is a subtree. The 1 and 2 in NP[1,2] indicate that this edge covers the substring from 1 to 2. When we discover that “slept” can be analyzed as a VP, we add VP[2,3]→slept to the chart. Then we notice that since NP[1,2] ends with the same identifier that VP[2,3] begins with, we can add S[1,3]→NP[1,2] VP[2,3] to the chart. It is standard to index edges by the left and right identifiers so that deductions like this can be made quickly.
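The chart construction in this example can be sketched as follows. This is a simplified illustration that assumes a toy grammar with only binary rules; the names `RULES`, `LEXICON`, and `parse_chart` are hypothetical. Edges are written as (CAT, i, j) tuples, and categories are indexed by their left and right identifiers so that adjacent edges can be found quickly.

```python
from collections import defaultdict

# Hypothetical toy grammar: one binary rule and a two-word lexicon.
RULES = {("NP", "VP"): "S"}
LEXICON = {"John": "NP", "slept": "VP"}


def parse_chart(words):
    """Return a dict mapping each edge (cat, i, j) to its list of subtrees."""
    chart = {}
    by_span = defaultdict(set)  # (i, j) -> categories found for that span

    def add(cat, i, j, subtree):
        chart.setdefault((cat, i, j), []).append(subtree)
        by_span[(i, j)].add(cat)

    # Lexical edges; positions are numbered from 1 as in "1 John 2 slept 3".
    for k, word in enumerate(words, start=1):
        add(LEXICON[word], k, k + 1, (word,))

    # Combine a left edge ending at `mid` with a right edge starting at `mid`.
    n = len(words)
    for width in range(2, n + 1):
        for i in range(1, n - width + 2):
            j = i + width
            for mid in range(i + 1, j):
                for left in sorted(by_span[(i, mid)]):
                    for right in sorted(by_span[(mid, j)]):
                        parent = RULES.get((left, right))
                        if parent is not None:
                            add(parent, i, j,
                                ((left, i, mid), (right, mid, j)))
    return chart


chart = parse_chart(["John", "slept"])
```

Here `chart` holds the three edges NP[1,2], VP[2,3], and S[1,3], with the S edge recording its two daughter edges as its only subtree.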
If a grammar is highly ambiguous, then using a parsing chart can make a huge difference in speed since it avoids reanalyzing substrings over and over again. In fact, it has been shown that for simple phrase structure grammars, the time taken to parse a sentence is a cubic function of the length of the sentence in the worst case. The time taken to parse a sentence without using a chart or its equivalent can be an exponential function in the length of the sentence in the worst case.
Martin Kay's idea was to use a chart during generation. However, instead of having an edge indicate which of the words it covers, Kay proposed that an edge would indicate which of the semantic facts in the abstract meaning it covered, plus the feature structure variable that the edge corresponded to. We will use the notation CAT[var]{fact1 . . . factN} for such an edge, where CAT is the category (like NP or VP), “var” is the feature structure variable, and fact1 to factN are the semantic facts that the edge covers (i.e. includes in itself or in one of its descendants).
Martin Kay's notion of semantic fact is a fact that is used in the semantics that cannot be duplicated. There can only be one instance of each semantic fact in the output, because duplicating semantic facts changes the meaning of the sentence. If the input only has one instance of a semantic fact, then the output of the generator should only include one instance of that semantic fact. In Martin Kay's algorithm, two edges cannot be combined into another edge if they share any semantic facts. The fact that semantic facts cannot be duplicated is very important, and will allow us to make a significant optimization later on.
Martin Kay's notion of semantic fact corresponds to constraints like (f2 PRED)=‘sleep<f1>’ in LFG. To make the examples easier to read, we will use the predicate name of a semantic fact (e.g. “sleep”) to represent it in all of the examples below, and we will never have an example with more than one semantic fact with the same predicate name.
If we wanted to use Martin Kay's algorithm to generate from the input:
    (f2 PRED)=‘sleep<f1>’
    (f2 TENSE)=PAST
    (f2 SUBJ)=f1
    (f1 PRED)=‘John’
then we might first add NP[f1]{John}→John to the generation chart. (The “John” in the curly brackets represents the semantic fact (f1 PRED)=‘John’, as discussed above.) Then we would add VP[f2]{sleep}→slept. Then we would notice that we could combine NP[f1]{John} and VP[f2]{sleep} to get S[f2]{John,sleep}→NP[f1]{John} VP[f2]{sleep}. Since this last edge covers all of the semantic facts and is consistent with the input, it is a well-formed generation according to Martin Kay's algorithm.
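The edge notation CAT[var]{facts} and the rule that two edges sharing a semantic fact cannot be combined can be sketched as below. The `Edge` type and the `combine` helper are hypothetical names chosen for this illustration; the check that the daughters' variables fit the grammar rule is assumed to have been done by the caller.

```python
from typing import FrozenSet, NamedTuple, Optional


class Edge(NamedTuple):
    cat: str               # category, e.g. "NP"
    var: str               # feature structure variable, e.g. "f1"
    facts: FrozenSet[str]  # semantic facts covered, by predicate name


def combine(parent_cat: str, parent_var: str,
            left: Edge, right: Edge) -> Optional[Edge]:
    """Build a mother edge, or return None if the daughters share a fact."""
    if left.facts & right.facts:
        return None  # a semantic fact would be expressed twice
    return Edge(parent_cat, parent_var, left.facts | right.facts)


np_john = Edge("NP", "f1", frozenset({"John"}))
vp_sleep = Edge("VP", "f2", frozenset({"sleep"}))

# S[f2]{John,sleep} -> NP[f1]{John} VP[f2]{sleep}
s_edge = combine("S", "f2", np_john, vp_sleep)

# The edge is a candidate output only if it covers every semantic fact.
all_facts = frozenset({"John", "sleep"})
is_complete = s_edge is not None and s_edge.facts == all_facts
```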
This definition of generation isn't quite right for Lexical Functional Grammars, since it doesn't guarantee that non-semantic facts like (f2 TENSE)=PAST or (f2 SUBJ)=f1 are expressed by the output of the generator. One might be tempted to treat these as semantic facts in Martin Kay's algorithm, but then his algorithm would give the wrong results because it assumes that semantic facts can only be expressed once, whereas these facts can be expressed many times in Lexical Functional Grammars without changing the meaning of the sentence. For now we will ignore the problem, but later we will describe techniques for dealing with non-semantic facts.
The advantage of a generation chart is that it avoids computing the same information over and over again. However, it is not as efficient as a parsing chart. This is because the number of edges in a generation chart can be an exponential function of the size of the input, whereas the number of edges in a parsing chart is at most a quadratic function of the size of the input. The difference is that in a parsing chart, the words that are covered by an edge are contiguous. Since there are only a quadratic number of different substrings in a string, the number of edges is a quadratic function in the length of the string. However, there is no requirement that the semantic facts in the input to generation be contiguous. An edge could cover any subset of the semantic facts in the input. Since there can be an exponential number of subsets of a set, there can be an exponential number of edges in a generation chart.
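The difference in growth rates can be made concrete with a short calculation. This sketch counts only the spans or fact sets that an edge could cover and ignores categories; the function names are hypothetical.

```python
def contiguous_spans(n):
    """Number of non-empty contiguous substrings of an n-word string."""
    return n * (n + 1) // 2


def fact_subsets(n):
    """Number of non-empty subsets of n semantic facts."""
    return 2 ** n - 1


# For 10 items: 55 possible spans versus 1023 possible fact subsets.
comparison = [(n, contiguous_spans(n), fact_subsets(n)) for n in (5, 10, 20)]
```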
For example, consider the following LFG rules:
    S→NP: (^ SUBJ)=!; VP: ^=!.
    VP→V: ^=!; (NP: (^ OBJ)=!).
    NP→{N: ^=!|
        A: ! $ (^ MODIFIERS); NP: ^=!}.
These rules are a little more complicated than the LFG rule that we looked at before. First of all, the VP rule says that the object NP and its constraints are optional by enclosing it in parentheses. This is to allow for both transitive sentences (such as “John kicked the ball”) and intransitive sentences (such as “John slept”). Second, the NP rule says that there are two ways to build an NP. The two different ways are enclosed in curly brackets and separated by a vertical bar. The first way is to have a single N. The second way is to have an A (an adjective) followed by an NP.
Now suppose that we had the following lexical entries:
    black A (^ PRED)=‘black’.
    dogs N (^ PRED)=‘dog’
            (^ NUM)=PL.
    chase V (^ PRED)=‘chase<(^ SUBJ)(^ OBJ)>’
            (^ TENSE)=PRES.
    white A (^ PRED)=‘white’.
    cats N (^ PRED)=‘cat’
            (^ NUM)=PL.
and we wanted to generate from the following input:
    (f1 PRED)=‘chase<f2,f4>’
    (f1 SUBJ)=f2
    (f2 PRED)=‘dog’
    (f2 NUM)=PL
    f3 $ (f2 MODIFIERS)
    (f3 PRED)=‘black’
    (f1 OBJ)=f4
    (f4 PRED)=‘cat’
    (f4 NUM)=PL
    f5 $ (f4 MODIFIERS)
    (f5 PRED)=‘white’
If we make all of the lexical entries be edges in the generation chart and start combining them according to the rules given, we get the following added to the chart:
    A[f5]{white}→white
    N[f4]{cat}→cats
    NP[f4]{cat}→N[f4]{cat}
    NP[f4]{white,cat}→A[f5]{white} NP[f4]{cat}
    V[f1]{chase}→chase
    VP[f1]{chase}→V[f1]{chase}
    VP[f1]{chase,cat}→V[f1]{chase} NP[f4]{cat}
    VP[f1]{chase,white,cat}→V[f1]{chase} NP[f4]{white,cat}
    A[f3]{black}→black
    N[f2]{dog}→dogs
    NP[f2]{dog}→N[f2]{dog}
    NP[f2]{black,dog}→A[f3]{black} NP[f2]{dog}
    S[f1]{dog,chase}→NP[f2]{dog} VP[f1]{chase}
    S[f1]{black,dog,chase}→NP[f2]{black,dog} VP[f1]{chase}
    S[f1]{dog,chase,cat}→NP[f2]{dog} VP[f1]{chase,cat}
    S[f1]{dog,chase,white,cat}→NP[f2]{dog} VP[f1]{chase,white,cat}
    S[f1]{black,dog,chase,cat}→NP[f2]{black,dog} VP[f1]{chase,cat}
    S[f1]{black,dog,chase,white,cat}→NP[f2]{black,dog} VP[f1]{chase,white,cat}
The last edge generates “black dogs chase white cats”. However, in the process of producing this sentence, the generator also builds top-level edges for “black dogs chase cats”, “dogs chase white cats”, “dogs chase cats”, “black dogs chase”, and “dogs chase”. These are all ruled incomplete at the top since they are missing facts in the input. However, they add a considerable amount of time to the generation process. The problem gets much worse as you add more modifiers. (Consider all of the incomplete generations that would be produced in the process of generating something like “Big mean ugly black dogs chase little cute white cats”.)
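The overgeneration described here can be reproduced with a small agenda-based sketch. It is not Martin Kay's algorithm as published, only an illustration under simplifying assumptions: the three grammar rules are hard-coded, the input relations are stored in hypothetical lookup tables (`SUBJ`, `OBJ`, `MODIFIERS`), and non-semantic facts such as NUM and TENSE are ignored.

```python
# Input relations between feature structure variables (hypothetical encoding).
SUBJ = {"f1": "f2"}
OBJ = {"f1": "f4"}
MODIFIERS = {"f2": {"f3"}, "f4": {"f5"}}

# Lexical edges as (category, variable, covered semantic facts).
LEXICAL = [
    ("A", "f5", frozenset({"white"})),
    ("N", "f4", frozenset({"cat"})),
    ("V", "f1", frozenset({"chase"})),
    ("A", "f3", frozenset({"black"})),
    ("N", "f2", frozenset({"dog"})),
]
ALL_FACTS = frozenset({"black", "dog", "chase", "white", "cat"})


def binary(left, right):
    """Mother edges licensed by a pair of adjacent daughter edges."""
    (lc, lv, lf), (rc, rv, rf) = left, right
    if lf & rf:
        return []  # daughters may not share a semantic fact
    out = []
    if lc == "A" and rc == "NP" and lv in MODIFIERS.get(rv, ()):
        out.append(("NP", rv, lf | rf))   # NP -> A NP
    if lc == "V" and rc == "NP" and OBJ.get(lv) == rv:
        out.append(("VP", lv, lf | rf))   # VP -> V NP
    if lc == "NP" and rc == "VP" and SUBJ.get(rv) == lv:
        out.append(("S", rv, lf | rf))    # S -> NP VP
    return out


def build_chart():
    chart, agenda = set(), list(LEXICAL)
    while agenda:
        edge = agenda.pop()
        if edge in chart:
            continue
        chart.add(edge)
        cat, var, facts = edge
        if cat == "N":
            agenda.append(("NP", var, facts))  # NP -> N
        if cat == "V":
            agenda.append(("VP", var, facts))  # VP -> V
        for other in list(chart):
            agenda.extend(binary(edge, other))
            agenda.extend(binary(other, edge))
    return chart


chart = build_chart()
s_edges = [e for e in chart if e[0] == "S"]
complete = [e for e in s_edges if e[2] == ALL_FACTS]
```

Under these assumptions the chart ends up with 18 edges, 6 of them S edges, of which only one covers all five semantic facts.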
Martin Kay and Internal Indices
Martin Kay solved this problem by distinguishing between internal and external indices. In the grammatical formalism that Martin Kay used, categories are annotated with the semantic indices that are accessible. For instance, here is the rule that says that a VP can have an NP object:    vp(x,y)→v(x,y,z) np(z).
This rule says that a vp category consisting of two semantic indices named x and y can be composed of a v category consisting of three semantic indices named x, y, and z followed by an np category consisting of one semantic index named z. Note that the v category and the np category share the semantic index named z. This index makes the np the object of the v. Note further that the vp does not have the z index in its category. Martin Kay observed that since the z index is not accessible in the vp(x,y) category, it will never be accessible to any higher categories. This means that no new facts can be added that refer to the z index. So, the vp(x,y) had better have all of the facts in the input that refer to the z index. If the vp(x,y) category is missing a fact that refers to the z index, then all categories built upon it will be missing the fact, too. This means that the root category will be missing the fact, and it will be discarded as being incomplete. Therefore we can safely discard any vp(x,y) category that is missing facts that refer to the z index.
To see how this works, consider how one might generate “black dogs chase white cats” using a grammar in Martin Kay's grammatical formalism. Suppose that we had the following rules and lexical entries:
    s(x)→np(y) vp(x,y)
    vp(x,y)→v(x,y,z) np(z)
    np(n)→adj(n) np(n)

    Word     Category    Semantics
    black    adj(d)      black(d)
    dogs     np(d)       dogs(d)
    chase    v(x,d,c)    chase(x,d,c)
    white    adj(c)      white(c)
    cats     np(c)       cats(c)
If we add these lexical entries to the chart and start combining edges we get the following additions to the generation chart:
    adj(c){white}→white
    np(c){cats}→cats
    np(c){white,cats}→adj(c){white} np(c){cats}
    v(x,d,c){chase}→chase
    vp(x,d){chase,white,cats}→v(x,d,c){chase} np(c){white,cats}
    vp(x,d){chase,cats}→v(x,d,c){chase} np(c){cats} INCOMPLETE!
    adj(d){black}→black
    np(d){dogs}→dogs
    np(d){black,dogs}→adj(d){black} np(d){dogs}
    s(x){black,dogs,chase,white,cats}→np(d){black,dogs}
            vp(x,d){chase,white,cats}
    s(x){dogs,chase,white,cats}→np(d){dogs}
            vp(x,d){chase,white,cats} INCOMPLETE!
Note that two edges are eliminated due to incomplete internal indices: vp(x,d){chase,cats} and s(x){dogs,chase,white,cats}. Although this may not seem like much in a short sentence like this, this technique can make a huge difference for long sentences. For this type of grammatical formalism, this technique reduces the number of edges from being typically exponential in the size of the input to being typically linear in the size of the input.
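The internal-index test used in this example can be sketched as follows. It assumes that each input fact is stored with the set of indices it mentions; the table `FACT_INDICES` and the function `is_incomplete` are hypothetical names for this illustration, not part of Martin Kay's published formulation.

```python
# Input facts from the example, keyed by name, with the indices they mention.
FACT_INDICES = {
    "black": {"d"},
    "dogs": {"d"},
    "chase": {"x", "d", "c"},
    "white": {"c"},
    "cats": {"c"},
}


def is_incomplete(external, daughter_indices, covered):
    """True if an internal index has an input fact that the edge lacks.

    external         -- indices visible on the mother category
    daughter_indices -- all indices appearing on the daughter categories
    covered          -- semantic facts covered by the edge
    """
    internal = set(daughter_indices) - set(external)
    return any(
        bool(indices & internal) and fact not in covered
        for fact, indices in FACT_INDICES.items()
    )


# vp(x,d) -> v(x,d,c) np(c): the index c becomes internal.
vp_short = is_incomplete({"x", "d"}, {"x", "d", "c"}, {"chase", "cats"})
vp_full = is_incomplete({"x", "d"}, {"x", "d", "c"},
                        {"chase", "white", "cats"})

# s(x) -> np(d) vp(x,d): the index d becomes internal.
s_short = is_incomplete({"x"}, {"x", "d"},
                        {"dogs", "chase", "white", "cats"})
s_full = is_incomplete({"x"}, {"x", "d"},
                       {"black", "dogs", "chase", "white", "cats"})
```

The two edges marked INCOMPLETE in the chart are exactly the ones for which this test returns true: the first lacks white(c) once c is internal, and the second lacks black(d) once d is internal.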
Unfortunately, this technique does not work for grammar formalisms that do not explicitly indicate which indices are internal and which are external. Arturo Trujillo proposed an algorithm for deriving this information from a grammar in Trujillo, Arturo, 1997, “Determining internal and external indices for chart generation”, Proc. of the 7th International Conference on Theoretical and Methodological Issues in Machine Translation (TMI-97), but this can be difficult for expressive grammar formalisms such as Lexical Functional Grammars and suffers from another problem which is described in the next section. John Carroll, Ann Copestake, Dan Flickinger and Victor Poznanski propose an improvement to Martin Kay's algorithm which treats intersective modifiers in a second pass in John Carroll, Ann Copestake, Dan Flickinger, and Victor Poznanski, 1999, “An efficient chart generator for (semi-)lexicalist grammars”, Proceedings of the 7th European Workshop on Natural Language Generation (EWNLG'99), pages 86–95, Toulouse. It uses Martin Kay's algorithm for the first pass, and so assumes that internal indices can be determined locally. This means that it would also be inefficient for Lexical Functional Grammars.