As shown in FIG. 1, numeral 100, a semantic representation (110) is generated from a natural-language utterance (101) by the following three-step process. The natural-language utterance is perceived by a speech recognizer (102), which is coupled to a lexicon (103) containing a plurality of words of the natural language. The speech recognizer outputs a word graph (104), containing at least one word from the lexicon that is hypothesized to correspond to the natural-language utterance. The word graph is input into a parser (105), which is coupled to a grammar (106) for the natural language. Using the grammar, the parser constructs a parse forest (107). The parse forest is input into a semantic interpreter (108), which is coupled to a knowledge base (109). The semantic interpreter processes the parse forest according to a predetermined semantics and outputs the semantic representation corresponding to the natural-language utterance.
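The dataflow of FIG. 1 can be sketched in a few lines. The function bodies below are illustrative stand-ins, not the disclosed implementations (a real recognizer, parser, and interpreter are far more elaborate); all names and data layouts are assumptions, and only the three-stage dataflow itself comes from the figure.

```python
# Minimal sketch of the three-step pipeline of FIG. 1. Each stage is a
# hypothetical stand-in that preserves only the dataflow:
# utterance -> word graph -> parse forest -> semantic representation.

def speech_recognizer(utterance, lexicon):
    """(102) Map an utterance to a word graph of (word, start, end) arcs.
    Here we fake recognition by emitting one arc per in-lexicon word."""
    words = utterance.lower().split()
    return [(w, i, i + 1) for i, w in enumerate(words) if w in lexicon]

def parser(word_graph, grammar):
    """(105) Build a parse forest; here, reduced to the yield of one path."""
    return [word for word, _, _ in word_graph]

def semantic_interpreter(parse_forest, knowledge_base):
    """(108) Map a parse forest to a semantic representation; here, a
    trivial predicate-argument structure."""
    return {"predicate": parse_forest[0], "args": parse_forest[1:]}

lexicon = {"show", "me", "all", "flights", "to", "chicago"}
word_graph = speech_recognizer("Show me all flights to Chicago", lexicon)
parse_forest = parser(word_graph, grammar=None)
rep = semantic_interpreter(parse_forest, knowledge_base=None)
```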
As shown in FIG. 2, numeral 200, a word graph (201) is an acyclic directed graph that contains a plurality of vertices connected by arcs. Each vertex has a unique label that permits a strict ordering of the vertices from a start vertex (202), labeled t_0, to an end vertex (203), labeled t_7. The labels on the vertices typically correspond to frame identifiers generated during the processing of a speech signal. Each arc in the word graph is labeled by a word from the lexicon; for example, the arc (204) is labeled by the word "show", hypothesized as occurring between the vertex labeled t_0 and the vertex labeled t_1, while the arc (205) is also labeled by the word "show", hypothesized as occurring between the vertex labeled t_0 and the vertex labeled t_2. As is known in the art, the use of the word graph improves the robustness of the speech recognizer by allowing it to hypothesize alternative words occurring during the same segment of the speech signal. However, the number of paths through the word graph is typically very large; in order to constrain the resulting search space, the natural language to be recognized is formally defined by a context-free (CF) grammar (301). As shown in FIG. 3, numeral 300, a CF grammar G for a language L is formally defined as G_L = <V_N, V_T, S, P>, where:
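The word-graph structure, and the path-count concern it raises, can be illustrated concretely. Only the two "show" arcs (204, 205) come from FIG. 2; the remaining arcs below are hypothetical fill-in chosen to complete a small graph, and the counting function is a sketch, not part of the disclosure.

```python
# A word graph as in FIG. 2: arcs are (word, from_vertex, to_vertex)
# over strictly ordered vertices t_0..t_7 (here plain integers 0..7).

from collections import defaultdict

arcs = [
    ("show", 0, 1),     # arc 204 (from FIG. 2)
    ("show", 0, 2),     # arc 205 (from FIG. 2)
    ("me", 1, 3),       # hypothetical
    ("me", 2, 3),       # hypothetical
    ("all", 3, 4),      # hypothetical
    ("flights", 4, 5),  # hypothetical
    ("to", 5, 6),       # hypothetical
    ("chicago", 6, 7),  # hypothetical
]

def count_paths(arcs, start, end):
    """Count distinct start-to-end paths. Because every arc runs from a
    lower-numbered vertex to a higher-numbered one, the graph is acyclic
    and a single pass in source-vertex order suffices."""
    paths = defaultdict(int)
    paths[start] = 1
    for _word, u, v in sorted(arcs, key=lambda a: a[1]):
        paths[v] += paths[u]
    return paths[end]
```

Each additional alternative hypothesis multiplies the number of paths, which is why the grammar is needed to constrain the search space.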
1. V_N is a finite non-empty set of nonterminal symbols (302).
2. V_T is a finite non-empty set of terminal symbols (303), corresponding to the words in the lexicon of the language L.
3. V_N and V_T are disjoint sets (V_N ∩ V_T = ∅).
4. S is a finite non-empty set of start symbols (304); S ⊆ V_N.
5. Σ is the alphabet consisting of all nonterminal and terminal symbols in the grammar G_L (Σ = V_N ∪ V_T).
6. A string Z ∈ Σ* is a sequence of zero or more symbols from the alphabet Σ.
7. P is a finite non-empty set of productions (305) mapping V_N to Σ*.
A production (306) encodes a relationship of the form "X → Y*" ("X rewrites as Y*"), where X ∈ V_N and each Y ∈ Σ. Each production in P may have additional information associated with it, such as a probability of occurrence for the production, a semantic representation of the production, etc., as is known in the art. A string Σ*Y*Σ* is "directly derived" from a string Σ*XΣ* (written as Σ*XΣ* ⇒ Σ*Y*Σ*) iff ("iff" means "if and only if") the production "X → Y*" is a member of P. A string Σ*Y*Σ* is "derived" from a string Σ*XΣ* (written as Σ*XΣ* ⇒* Σ*Y*Σ*) iff an ordered set of strings D = {D_0, . . . , D_n} exists such that D_0 = Σ*XΣ*, D_n = Σ*Y*Σ*, and each D_{i+1} is directly derived from D_i. A string U ∈ V_T* is a sentence of L iff U is derived from one start symbol X ∈ S.
As shown in FIG. 4, numeral 400, a parse forest (401) contains at least one phrase-structure (PS) tree (402, 403). A PS tree contains at least one nonterminal node (404) and at least one terminal node (405). A nonterminal node is labeled by one nonterminal symbol from the grammar and must dominate at least one other node; for example, the nonterminal node (404) is labeled "S" and dominates the nonterminal node (406), which is labeled "VP". Each terminal node in the PS tree is labeled by a terminal symbol from the grammar and corresponds to one arc in the word graph. The topmost node of the PS tree is labeled by one start symbol of the grammar, and the ordered set of terminal node labels of the PS tree is the "yield" of the PS tree. The yield of the PS tree corresponds to one path through the word graph that connects the start vertex with the end vertex. Thus the terminal node (405), labeled with the word "show", in the PS tree (402) corresponds to the arc (204) in the word graph (201), rather than the arc (205), because only the arc (204) lies on the path through the word graph that corresponds to the yield of the PS tree (402). Conversely, the terminal node (411), labeled with the word "show", in the PS tree (403) corresponds to the arc (205) in the word graph (201), rather than the arc (204), because only the arc (205) lies on the path through the word graph that corresponds to the yield of the PS tree (403). The natural-language utterance "Show me all flights to Chicago" is analyzed as six terminal nodes in the PS tree (402), with each terminal node labeled by a single word. The remainder of the PS tree is constructed by use of the productions in the grammar; for example, the prepositional phrase "to Chicago" (corresponding to the nonterminal node (407), labeled "PP") is constructed by the use of four productions, as follows:
1. "prep → to" (309), resulting in the nonterminal node (408).
2. "prop-noun → Chicago" (310), resulting in the nonterminal node (410).
3. "NP → prop-noun" (308), resulting in the nonterminal node (409).
4. "PP → prep NP" (307), resulting in the nonterminal node (407).
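The four productions above can be applied mechanically to build and read off the "to Chicago" subtree. The following is a minimal sketch, with a PS tree encoded as (label, children) tuples; the encoding and function names are illustrative assumptions, while the productions and their reference numerals come from FIG. 3.

```python
# The four productions used for "to Chicago" (FIG. 3), keyed by their
# left-hand nonterminal; a symbol with no production is a terminal.

productions = {
    "PP": [["prep", "NP"]],      # production 307
    "NP": [["prop-noun"]],       # production 308
    "prep": [["to"]],            # production 309
    "prop-noun": [["Chicago"]],  # production 310
}

def build(symbol):
    """Derive one PS (sub)tree for `symbol`, expanding each nonterminal
    via its first production; terminals become leaf nodes."""
    if symbol not in productions:
        return (symbol, [])  # terminal node
    rhs = productions[symbol][0]
    return (symbol, [build(y) for y in rhs])

def yield_of(tree):
    """Ordered terminal labels of a PS tree (its 'yield')."""
    label, children = tree
    if not children:
        return [label]
    return [w for child in children for w in yield_of(child)]

pp = build("PP")  # the subtree rooted at the "PP" node
```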
As shown in FIG. 5, numeral 500, both the word graph and the parse forest may be simultaneously stored in a chart (501) containing a plurality of vertices connected by edges. The use of the chart permits the simultaneous representation of mutually exclusive hypotheses, as is known in the art. The chart in FIG. 5 contains seven vertices, respectively labeled "0" (502), "1" (503), "2" (504), "3" (505), "4" (506), "5" (507), and "6" (508). Each edge in the chart is labeled with a word from the word graph or a nonterminal symbol from the grammar. For example, the two instances of the word "show" in the graph are encoded as "[0 show 1]" (509) and "[0 show 2]" (510). The natural-language utterance "Show me all flights to Chicago" is represented in the chart by the edges 509, 511, 512, 514, 516, and 520. Additional edges in the chart may be labeled by words other than those actually spoken; these edges represent hypothesized words generated by the speech recognizer (510, 513, 515, 517, 518, 519, and 521). The PS tree corresponding to a syntactic analysis of the natural-language utterance is represented in the chart by a set of nonterminal edges. Each nonterminal edge in the set of nonterminal edges corresponds to one nonterminal node in the PS tree and is labeled with the corresponding nonterminal symbol. The edge corresponding to the topmost node in the PS tree is called the root edge. Each nonterminal edge also contains a reference to at least one edge that is dominated by the nonterminal edge. For example, in FIG. 5 the prepositional phrase "to Chicago" is encoded in the chart by the nonterminal edge (522). An example of the encoding of edge references in nonterminal edges is presented in edge reference detail (523). The nonterminal edge (533), labeled with the nonterminal symbol "PP", references both the nonterminal edge (530), labeled with the nonterminal symbol "prep", and the nonterminal edge (532), labeled with the nonterminal symbol "NP". 
Similarly, the nonterminal edge (530) references the lexical edge (528), labeled with the word "to", and the nonterminal edge (532) references the nonterminal edge (531), labeled with the nonterminal symbol "prop-noun". Finally, the nonterminal edge (531) references the lexical edge (529), labeled with the word "Chicago".
Each edge in the chart encodes the instantiation of a production in one stage of completion. If the production is fully instantiated, as in the above example, then the edge is "complete". If the production is partially instantiated, then the edge is "active". An example of an active edge is presented in active edge detail (534). The active edge (543), labeled with the nonterminal symbol "PP", references the complete edge (540), labeled with the nonterminal symbol "prep", and is "right-active" at the chart vertex (538), labeled "5", with a right remainder (544) consisting of the nonterminal symbol "NP". The active edge (543) therefore encodes the partial instantiation of the production "PP → prep NP" in the chart, at the point during parsing when the nonterminal edge (540), labeled with the nonterminal symbol "prep", has been located, and the parser requires an additional edge, labeled "NP" and starting at the chart vertex (538), in order to complete the active edge (543). A chart parser is "bidirectional" if it can simultaneously process both right-active and left-active edges; i.e., active edges that can extend toward either the end vertex or the start vertex of the chart.
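Completion of a right-active edge can be sketched as a single step. The tuple layout below is an illustrative assumption; a bidirectional parser would pair this with a symmetric left-extension step that matches a complete edge ending at the active edge's start vertex.

```python
# Sketch of right-active edge completion for "PP -> prep NP": an active
# edge carries the constituents found so far plus a right remainder,
# and extends when a complete edge of the right label starts exactly at
# its right end.

def extend_right(active, complete):
    """Try to extend a right-active edge with a complete edge.

    active:   (label, start, end, found, remainder)
    complete: (label, start, end)
    Returns the extended edge (complete when its remainder is empty),
    or None if the complete edge does not fit.
    """
    label, start, end, found, remainder = active
    c_label, c_start, c_end = complete
    if not remainder or c_label != remainder[0] or c_start != end:
        return None
    return (label, start, c_end, found + [c_label], remainder[1:])

# Active edge 543: "prep" found over vertices 4-5, right remainder
# ["NP"] at vertex 5; an "NP" edge over 5-6 completes it.
active = ("PP", 4, 5, ["prep"], ["NP"])
done = extend_right(active, ("NP", 5, 6))
```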
The parser component of an automatic speech recognition (ASR) system determines the structure of an utterance as a crucial step toward determining the utterance's meaning. The parser's usefulness may be increased by combining bidirectionality with "island-driving", which allows the parser to consider words and nonterminal edges in the chart independently of their linear order, a technique that has been shown to be superior to strictly left-to-right processing. However, this improvement has the undesirable side effect of introducing duplicate edges into the chart, since a production can generate an edge from any of its constituent symbols and the resulting edge can expand bidirectionally. A technique is therefore needed to eliminate duplicate edges to avoid a combinatorial explosion of these edges during parser operation. Hence, there is a need for a method, device and system for an efficient generalized bidirectional island-driven chart parser.