This invention relates to techniques and apparatus for chart parsing that make direct use of compactly encoded grammars. The invention has application to automatic speech recognition with natural language input.
Natural language interfaces play an increasingly important role in the use of small handheld devices, such as cell phones and personal digital assistants (PDAs). Natural language interfaces are also becoming important in a range of other applications, including automotive accessory control and home-appliance control. In all of these applications, there are benefits in making the natural language interface as efficient as possible so as to minimize cost, size and power consumption.
Natural language processing systems that make use of context free grammars must load these grammars from a textual format into internal memory. The grammars may be written in a compact format, such as the Backus-Naur Form (BNF) described in “The Syntax And Semantics Of The Proposed International Algebraic Language Of The Zurich ACM-GAMM Conference”, by J. Backus, published in Information Processing: Proceedings of the International Conference on Information Processing, Paris, pp 125-132, UNESCO, 1959. If such a compact form is used, the rules of the grammar must typically first be expanded in order for a chart parser to make use of them. Many algorithms exist for parsing natural language using context free grammars. These algorithms use numerous techniques to improve performance, the most important being the use of a chart to avoid re-computation of previous results and the incorporation of filtering techniques to avoid computation of irrelevant results.
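The cost of the expansion step mentioned above can be illustrated with a minimal sketch. The nested-tuple encoding of a compact right-hand side ("seq", "alt" and "opt" nodes), the function name expand, and the example rule are all assumptions made for illustration only; they are not taken from any particular grammar format or parser.

```python
from itertools import product

def expand(node):
    """Return every flat symbol sequence abbreviated by a compact right-hand side.

    Hypothetical encoding (an assumption for this sketch):
      "word"            a single terminal or non-terminal symbol
      ("seq", [a, b])   a followed by b
      ("alt", [a, b])   either a or b
      ("opt", a)        a or nothing
    """
    if isinstance(node, str):
        return [(node,)]
    kind, body = node
    if kind == "opt":
        # An optional element contributes the empty sequence plus its own expansions.
        return [()] + expand(body)
    if kind == "alt":
        # Alternatives are expanded independently and collected.
        return [seq for choice in body for seq in expand(choice)]
    if kind == "seq":
        # A sequence is the cross product of the expansions of its parts.
        parts = [expand(item) for item in body]
        return [sum(combo, ()) for combo in product(*parts)]
    raise ValueError(f"unknown node kind: {kind!r}")

# Illustrative compact rule:
#   command ::= [please] (turn | switch) (on | off) [the] (radio | heater)
rule = ("seq", [
    ("opt", "please"),
    ("alt", ["turn", "switch"]),
    ("alt", ["on", "off"]),
    ("opt", "the"),
    ("alt", ["radio", "heater"]),
])

expanded = expand(rule)
print(len(expanded), "expanded rules from one compact rule")
```

In this sketch a single compact rule with five binary choices abbreviates 2 x 2 x 2 x 2 x 2 = 32 fully expanded rules, and each further optional or alternative element doubles that count or more. This multiplication is why storing the expanded form can be costly on a memory-constrained device.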
Until recently, relatively little attention has been given to direct parsing with context free grammars written in a compact form, such as BNF. Mostly for theoretical reasons, some approaches deal with particular types of compacted grammar notations. For example, “An efficient context-free parsing algorithm”, J. Earley, Communications of the ACM, 6(8), 451-455, 1970, shows how a chart parser can be extended to deal with express repetition. “Direct Parsing of ID/LP Grammars”, S. Shieber, Linguistics and Philosophy, 7:135-154, 1984, discusses the extension of a chart parser for direct processing of Immediate Dominance/Linear Precedence (ID/LP) grammars. The abbreviated notation in ID/LP grammars is designed especially for abbreviating grammars of natural languages that exhibit relatively free word order. However, none of these approaches takes advantage of the compact BNF representation for context-free grammars that is often used by the author of a grammar during development.
A related chart parsing algorithm is proposed in “SOUP: A Parser For Real-World Spontaneous Speech”, M. Gavalda, International Workshop on Parsing Technologies, 2000. This algorithm processes expressions in a top-down fashion, using recursive transition networks automatically derived from a grammar in the Java Speech Grammar Format. A top-down parsing approach is conjectured to be less efficient than a bottom-up approach when it comes to processing fragmentary input resulting from speech recognition errors and/or ungrammatical utterances.
Existing parsers are unable to make direct use of a grammar represented in an abbreviated or compact form, such as the Backus-Naur Form. Consequently, significant memory and processing resources are required to expand and store the rules of an abbreviated grammar. There is an unmet need for a parser that can make direct use of a grammar represented in an abbreviated or compact form.