The present invention relates to natural language interfaces, and more specifically to a natural language processor which predictively parses an input stream of words.
In traditional natural language interfaces, the user frequently overshoots the capabilities of the system. This occurs because traditional interfaces do not yet cover all of English grammar and the user is unaware of the subset of natural language which the interface does cover. Users enter entire sentences only to discover that the system cannot understand these sentences. Furthermore, once the user discovers that the interface does not cover certain linguistic ground, the user may undershoot the capabilities of the system; there are capabilities in the system that the user does not know about and fails to discover.
In summary, there is no straightforward route to matching the covered sublanguage of the traditional text entry natural language interfaces with the sublanguage the user naturally uses. These problems do not exist in NLMenu, an interfacing methodology developed at Texas Instruments. In that product, menus guide the user to select only sentences which the system understands. The NLMenu system offers a more comfortable environment and encourages the user to explore the system's capabilities interactively. Selection of lexical items from windows also eliminates spelling errors.
With guided query composition, grammars and lexicons become much smaller. This is in contrast to traditional natural language interfaces, where a considerable amount of effort must be expended to develop a large grammar in an attempt to cover all possible linguistically correct inputs within a given domain.
NLMenu uses a semantic grammar as its underlying formalism to cover the grammar which the interface supports. A semantic grammar is a context-free grammar where the choice of terminals and nonterminals is governed by semantic as well as syntactic function. (Semantic grammars are described in the book Artificial Intelligence by Elaine Rich, in Chapter 9.) Therefore, only semantically meaningful sentences are formed. This is not necessarily true for a system which performs syntactic parsing first, followed by a check on semantic constraints and compositionality. Semantic grammars can also be written to accept sentences that are not strictly syntactically correct and still produce semantically meaningful results (e.g., "What is weight of car?"). LADDER was one of the largest systems developed using a semantic grammar. (LADDER is described in "Developing a Natural Language Interface to Complex Data", Hendrix, et al., ACM Transactions on Database Systems, Volume 3, 1978.)
NLMenu provides a semantic grammar with one degree of domain independence without changing the grammar: interfaces to databases may be generated with minimal effort. It accomplishes this by instantiating a set of grammar and lexicon rules at interface creation time with appropriate items from the database scheme. The uninstantiated grammar and lexicon are called the "core grammar" or "generic grammar."
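The instantiation step described above can be illustrated with a minimal sketch. It is not NLMenu's actual rule format: the scheme, the rule templates, the `{rel}`/`{attr}` placeholders, and the `instantiate` function are all hypothetical names chosen for illustration. The sketch only shows the general idea of filling a small set of generic rules with items from a database scheme.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, List[str]]  # (left-hand side, right-hand side symbols)

# Hypothetical database scheme: relation name -> attribute names.
SCHEME: Dict[str, List[str]] = {
    "car": ["weight", "color"],
    "supplier": ["city"],
}

# Hypothetical core (uninstantiated) grammar rules with placeholders.
CORE_RULES: List[Rule] = [
    ("{rel}-NP", ["{rel}-NOUN", "{rel}-MOD"]),
    ("{rel}-MOD", ["whose", "{rel}-{attr}-ATTR", "is", "{rel}-{attr}-VALUE"]),
]


def instantiate(core_rules: List[Rule], scheme: Dict[str, List[str]]) -> List[Rule]:
    """Fill each core rule once per relation, or once per
    (relation, attribute) pair when the rule mentions an attribute."""
    rules: List[Rule] = []
    for lhs, rhs in core_rules:
        uses_attr = any("{attr}" in symbol for symbol in [lhs] + rhs)
        for rel, attrs in scheme.items():
            for attr in (attrs if uses_attr else [""]):
                rules.append((
                    lhs.format(rel=rel, attr=attr),
                    [symbol.format(rel=rel, attr=attr) for symbol in rhs],
                ))
    return rules


instantiated = instantiate(CORE_RULES, SCHEME)
for lhs, rhs in instantiated:
    print(lhs, "->", " ".join(rhs))
```

In this toy case two core rules yield five instantiated rules, and every added relation or attribute adds more, which hints at how the instantiated rule count depends on the size of the scheme.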
Unfortunately, NLMenu still suffers the ills of other semantic grammar based interfaces. Even though domain independence has been achieved within the database realm, it is not especially easy to transport the syntactic knowledge to a new, non-relational domain. This becomes clear upon examination of the grammar; it knows about modifying phrases for relations within the database scheme, but it does not know about modifying phrases in general. Additionally, while the number of uninstantiated grammar and lexicon rules is small, the number of instantiated rules, generated at compile time, grows quite rapidly with the domain size. There are two fundamental reasons for this. First, as discussed above, syntactic generalizations are missed. Second, the number of grammar rules which force semantic agreement grows quickly as the number of semantic values grows.
Thus, NLMenu has several major drawbacks. One is that its grammar and lexicon must be recompiled with a new scheme whenever the application domain changes. Also, the menu driven input paradigm becomes unwieldy as the grammar and lexicon grow. Further, complex domains result in a very large number of grammar rules, so that, in an application having complex queries, the size of the interface becomes extremely large, and its speed slow. NLMenu uses a predictive parser, but this parser only works for context-free grammars.
Therefore, in order to provide a better predictive parser, a parsing method according to the present invention accommodates a grammar and lexicon defined in terms of linguistic principles as well as metasemantic grammars (or core grammars). The grammar and lexicon, taken together, define a set of valid natural language sentence structures, with the precise words defined by the lexicon. The defined structures include information regarding features of the words. When words are read by the parser, these features become fixed within the structure.
The predictive parser preferably generates a set of valid next elements, and each incoming word is compared against the set. Words not in this set are rejected as invalid.
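The idea of comparing each incoming word against a set of valid next elements can be sketched as follows. This is a deliberately simplified, context-free illustration and not the parser of the invention: it does not model word features, and the toy grammar, the `PredictiveParser` class, and its method names are assumptions made for the example. The sketch also assumes the grammar has no left recursion.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy grammar: keys are nonterminals, all other symbols are words.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["find", "NP"]],
    "NP": [["cars"], ["cars", "MOD"]],
    "MOD": [["whose", "ATTR", "is", "VALUE"]],
    "ATTR": [["color"], ["weight"]],
    "VALUE": [["red"], ["heavy"]],
}

Stack = Tuple[str, ...]  # symbols still expected, leftmost first


def expand(stacks: Set[Stack]) -> Set[Stack]:
    """Rewrite leading nonterminals until each stack is empty or starts with a word."""
    done: Set[Stack] = set()
    todo = list(stacks)
    while todo:
        stack = todo.pop()
        if stack and stack[0] in GRAMMAR:
            for production in GRAMMAR[stack[0]]:
                todo.append(tuple(production) + stack[1:])
        else:
            done.add(stack)
    return done


class PredictiveParser:
    def __init__(self, start: str = "S") -> None:
        self.stacks: Set[Stack] = expand({(start,)})

    def valid_next(self) -> Set[str]:
        """Words that may legally come next."""
        return {stack[0] for stack in self.stacks if stack}

    def is_complete(self) -> bool:
        """True if the words read so far already form a whole sentence."""
        return () in self.stacks

    def accept(self, word: str) -> bool:
        """Consume the word if it is predicted; otherwise reject it and keep state."""
        if word not in self.valid_next():
            return False
        self.stacks = expand({s[1:] for s in self.stacks if s and s[0] == word})
        return True


parser = PredictiveParser()
for word in ["find", "trucks", "cars", "whose"]:
    print(word, "accepted" if parser.accept(word) else "rejected",
          "-> next:", sorted(parser.valid_next()))
```

A rejected word leaves the parser state unchanged, so the user can be shown the same set of valid continuations again.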
The novel features which characterize the present invention are defined by the appended claims. The foregoing and other objects and advantages of the present invention will hereafter appear, and for purposes of illustration, but not of limitation, a preferred embodiment is shown in the accompanying drawings.