The use of portable electronic devices and mobile communication devices has increased dramatically in recent years. Mobile communication devices now offer more features, such as speech recognition, pictures, music, audio, and video. Such features make it easier for humans to interact with mobile devices. In particular, the speech communication interface between humans and mobile devices becomes more natural as the mobile devices attempt to learn from their environment and from the people within that environment who use the devices.
For example, Natural Language Dialog systems can include speech recognition, which allows a user to speak to a mobile device to communicate a command or a query. Techniques for accomplishing speech recognition are well known in the art. Among known speech recognition techniques are those that use grammars. A grammar is a representation of the language or phrases expected to be used or spoken in a given context. Grammars typically constrain the speech recognizer to a vocabulary that is a subset of the universe of potentially spoken words, and grammars may include sub-grammars. A grammar rule can then be used to represent the set of “phrases” or combinations of words from one or more grammars or sub-grammars that may be expected in a given context. “Grammar” may also refer generally to a statistical language model (where a model represents phrases), such as those used in language understanding systems.
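By way of illustration only, the relationship between a grammar, its sub-grammars, and the set of phrases a grammar rule represents can be sketched in Python as follows. The rule names (`<command>`, `<contact>`, `<media>`), the vocabulary, and the helper functions `expand` and `phrases` are hypothetical and chosen for this example; the sketch assumes a small, non-recursive grammar and does not represent any particular grammar format or toolkit.

```python
from itertools import product

# Hypothetical toy grammar. Each rule maps a name to a list of alternatives;
# each alternative is a sequence of literal words or <sub-grammar> references.
GRAMMAR = {
    "<command>": [["call", "<contact>"], ["play", "<media>"]],
    "<contact>": [["alice"], ["bob"]],   # sub-grammar
    "<media>": [["music"], ["video"]],   # sub-grammar
}


def expand(symbol, grammar):
    """Return every word sequence (as a tuple) that `symbol` can produce.

    Assumes the grammar is non-recursive; a cyclic rule would never terminate.
    """
    if symbol not in grammar:
        # A literal word expands to itself.
        return [(symbol,)]
    results = []
    for alternative in grammar[symbol]:
        # Expand each token, then combine the expansions in order.
        parts = [expand(token, grammar) for token in alternative]
        for combo in product(*parts):
            results.append(tuple(word for part in combo for word in part))
    return results


def phrases(grammar, start="<command>"):
    """List the phrases covered by the rule named `start`."""
    return [" ".join(words) for words in expand(start, grammar)]


if __name__ == "__main__":
    for phrase in phrases(GRAMMAR):
        print(phrase)
```

In this sketch, the vocabulary is limited to six words, and the `<command>` rule covers exactly four phrases built from the two sub-grammars.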
Speech recognition grammars can predict the words which are to be spoken at any point in a spoken command phrase. Essentially, from a speech recognition grammar, a speech recognition system can identify the words which should appear next in a spoken phrase. The use of a speech recognition grammar can result in more accurate speech recognition, since the list of possible words which can be spoken at any point in a spoken phrase is limited based upon the previously spoken words.
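As a minimal sketch of this constraint, the following Python example computes the words permitted next, given the words already spoken. The phrase list `PHRASES` and the function `next_words` are hypothetical and stand in for a real speech recognition grammar; an actual recognizer would typically operate on a compiled grammar network rather than a flat list of phrases.

```python
# Hypothetical phrase list standing in for a speech recognition grammar.
PHRASES = [
    "call alice",
    "call bob",
    "play music",
    "play video",
]


def next_words(spoken_so_far, phrases=PHRASES):
    """Return the sorted list of words the grammar allows next.

    `spoken_so_far` is the portion of the phrase already recognized,
    as a space-separated string (empty at the start of an utterance).
    """
    prefix = spoken_so_far.split()
    allowed = set()
    for phrase in phrases:
        words = phrase.split()
        # Keep only phrases that begin with the words spoken so far
        # and still have at least one word remaining.
        if len(words) > len(prefix) and words[:len(prefix)] == prefix:
            allowed.add(words[len(prefix)])
    return sorted(allowed)


if __name__ == "__main__":
    print(next_words(""))      # candidates at the start of an utterance
    print(next_words("call"))  # candidates after "call"
```

After the word "call" has been recognized, only the two contact names remain as candidates, so the recognizer in this sketch chooses among two words rather than the full vocabulary.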
However, one of the most difficult and time-consuming tasks in developing Natural Language Dialog systems is creating new grammars or adapting pre-existing ones. This task requires a high degree of linguistic training and expertise. Typically, the most difficult part of this task is the beginning: creating an initial set of grammar rules. Normally, grammars are derived from a given training corpus. However, such training corpora are expensive and difficult to obtain, and new corpora must be obtained for each new application. If the target grammar is to contain semantic information, then the training corpus must be annotated for semantics, which requires additional time and expertise. Moreover, extracting an optimal grammar from a given corpus is an unsolved problem.
Existing grammar toolkits allow developers to generate a grammar using various approaches. In one approach, the developer must formulate sample utterances by anticipating the utterances a user may present to the dialog system. Essentially, the developer is presented with a blank slate and told to fill the slate with samples. If the developer fails to anticipate the full range of user utterances, the coverage of the resulting grammar is inadequate. In another approach, the developer can refine an existing grammar specific to a system component, though the resulting grammar may be tightly integrated with its result and cannot be used by other system components. In yet another approach, a grammar can be created from a domain model having access to a large corpus. An interactive tool for semi-automatic creation of a domain model has been described in U.S. Pat. No. 6,622,136, which is incorporated herein by reference. The domain model provides a useful, formalized representation of knowledge about the domain of an application that the system is addressing and reflects a particular domain expert's conceptualization of that knowledge.
In general, a user of a mobile device is the person most often using the speech capabilities of the mobile device. Due to constraints in processing power and memory, the mobile device may not be able to provide all the resources of a Natural Language Understanding system, including the speech recognition components of the system, such as the grammars, on the mobile device. The mobile device may only be capable of supporting a few default speech grammars, which may not provide adequate grammar coverage to the user. In addition, limitations of the device itself may preclude certain features that are available on other mobile devices. For example, certain speech processing aspects may only be available on a higher-tier product. The user, and the developer of the speech grammars, may not be aware of the capabilities available to the device.
In many interactive systems, the burden is on the developer to either formulate a grammar from scratch, or to come up with a labeled corpus of sample utterances, both of which require time and expertise. Presently, developers write software for Natural Language Dialog systems without general knowledge of what device will be running the software. Accordingly, developers may not be aware of the capabilities the device can support for a natural language interface. A need therefore exists for opening access to capabilities of a device.