The use of a speech recognition system (or a voice system) to translate a user's spoken command into a precise text command that the target system can input and process is well known. For example, in a conventional voice system based in a vehicle, a user (e.g., driver) interacts with the voice system by uttering very specific commands that must conform to a machine-based grammar that is understood by the target system.
By way of example, assume that the climate control system in the vehicle is the target system. In order to decrease the temperature in the vehicle, the user of a conventional voice system typically has to utter several predetermined machine-based grammar commands in sequence, such as the command “climate control” followed by the command “air conditioner” followed by the command “decrease temperature” followed by the command “five degrees.”
Unfortunately, people do not talk or think in terms of specific machine-based grammar, and may also forget the precise predetermined commands that must be uttered to effectuate their wishes.
One approach that attempts to overcome the machine-based grammar problem is to use a single-stage front end action classifier that detects a very general subject from the user's speech, which is then provided to a human operator for further intent determination. This is typically the approach used in General Motors' OnStar™ system. However, a major problem with this approach is that a human operator is required.
Another approach is to build a full-fledged statistical parser, which takes the transcribed input and builds a parse tree that is later mined to extract intent. One major difficulty with this second approach is that statistical parsers have very large storage requirements. Further, they require hand-tuning at every step. That is, every time data is added, the statistical parser requires a tremendous amount of hand-tuning and balancing of the new data with the old data.
Accordingly, improved techniques are needed that permit a user to employ more human-based grammar (i.e., free form or conversational input) while addressing a target system via a voice system.