1. Field of the Invention
This invention relates to the automated acquisition of grammar fragments for recognizing and understanding spoken language.
2. Introduction
In speech-understanding systems, the language models for recognition and understanding are traditionally designed separately. Furthermore, while there is a large amount of literature on automatically learning language models for recognition, most understanding models are designed manually and involve a significant amount of expertise in development.
In general, a spoken language understanding task can have a very complex semantic representation. A useful example is a call-routing scenario, where the machine action transfers a caller to a person or machine that can address and solve problems based on the user's response to an open-ended prompt, such as “How may I help you?” These spoken language understanding tasks associated with call-routing are addressed in U.S. patent application Ser. No. 08/528,577, “Automated Phrase Generation”, and U.S. Pat. No. 5,675,707 “Automated Call Routing System”, both filed on Sep. 15, 1995, which are incorporated herein by reference in their entireties. Furthermore, such methods can be embedded within more complex tasks, as disclosed in U.S. patent application Ser. No. 08/943,944, filed Oct. 3, 1997, which is also hereby incorporated by reference in its entirety.
While there is a vast amount of literature on syntactic structure and parsing, much of that work involves a complete analysis of a sentence. It is well known that most natural language utterances cannot be completely analyzed by these methods due to lack of coverage. Thus, many approaches use grammar fragments in order to provide a localized analysis on portions of the utterance where possible, and to treat the remainder of the utterance as background. Typically, these grammar fragments are defined manually and involve a large amount of expertise.
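The localized analysis described above can be illustrated with a minimal sketch. It assumes fragments are plain word sequences matched exactly, which is a simplification for illustration and not the method of any application cited herein; the function name `spot_fragments` and the labels `FRAGMENT` and `BACKGROUND` are hypothetical.

```python
def spot_fragments(utterance, fragments):
    """Label matched fragments in an utterance; everything else is background.

    Assumption: a fragment is a fixed word sequence and matching is exact.
    """
    words = utterance.split()
    # Try longer fragments first so the longest match wins at each position.
    candidates = sorted(
        (f.split() for f in fragments if f.split()), key=len, reverse=True
    )
    labeled = []
    i = 0
    while i < len(words):
        for frag in candidates:
            if words[i:i + len(frag)] == frag:
                labeled.append(("FRAGMENT", " ".join(frag)))
                i += len(frag)
                break
        else:
            # No fragment starts here: treat the word as background.
            labeled.append(("BACKGROUND", words[i]))
            i += 1
    return labeled


result = spot_fragments(
    "i want to charge this call to my home phone",
    ["charge this call to"],
)
```

In this sketch only the span “charge this call to” receives an analysis; the remaining words pass through as background rather than causing the whole utterance to fail.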
In an attempt to solve some of these problems, U.S. patent application Ser. Nos. 08/960,289 and 08/960,291, both filed Oct. 29, 1997 and hereby incorporated by reference in their entireties, disclose how to advantageously and automatically acquire sequences of words, or “superwords”, and exploit them for both recognition and understanding. This is advantageous because longer units (e.g., area codes) are both easier to recognize and have sharper semantics.
While superwords (or phrases) have been shown to be very useful, many acquired phrases are merely mild variations of each other (e.g., “charge this call to” and “bill this to”). To address this, U.S. patent application Ser. No. 08/893,888, filed Jul. 8, 1997 and incorporated herein by reference in its entirety, discloses how to automatically cluster such phrases by combining phrases with similar wordings and semantic associations. These meaningful phrase clusters were then represented as grammar fragments via traditional finite state machines. This clustering of phrases is advantageous for two reasons: first, statistics of similar phrases can be pooled, thereby providing more robust estimation; and second, the clusters provide robustness to non-salient recognition errors, such as “dialed a wrong number” versus “dialed the wrong number”.
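The pooling benefit can be illustrated with a minimal sketch. The counts, the call-type labels `CREDIT` and `OTHER`, the cluster name `WRONG_NUMBER`, and the function names are all hypothetical values chosen for illustration; they are not taken from the cited applications.

```python
from collections import Counter


def pool_cluster_counts(phrase_counts, clusters):
    """Sum per-phrase call-type counts over each cluster of similar phrases."""
    pooled = {}
    for name, phrases in clusters.items():
        total = Counter()
        for phrase in phrases:
            # Counter.update adds counts rather than replacing them.
            total.update(phrase_counts.get(phrase, {}))
        pooled[name] = total
    return pooled


def relative_frequencies(counts):
    """Estimate P(call type | phrase or cluster) as relative frequencies."""
    n = sum(counts.values())
    return {label: c / n for label, c in counts.items()} if n else {}


# Made-up observation counts: how often each phrase occurred with each call type.
phrase_counts = {
    "dialed a wrong number": {"CREDIT": 9, "OTHER": 1},
    "dialed the wrong number": {"CREDIT": 4, "OTHER": 1},
}
clusters = {
    "WRONG_NUMBER": ["dialed a wrong number", "dialed the wrong number"],
}

pooled = pool_cluster_counts(phrase_counts, clusters)
estimate = relative_frequencies(pooled["WRONG_NUMBER"])
```

Under these assumed counts, the cluster estimate rests on 15 observations instead of 10 and 5 for the individual phrases, and a recognizer confusing “a” with “the” still yields the same cluster.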
However, in order to utilize these grammar fragments in language models for both speech recognition and understanding, they must be both syntactically and semantically coherent. To achieve this goal, an enhanced clustering mechanism exploiting both syntactic and semantic associations of phrases is required.