To recognize spoken utterances and understand their intent (e.g., voice menu commands spoken by users of telephony applications), speech recognition systems typically use various spoken language comprehension techniques. Some techniques provide a fixed list of sentences or phrases that the telephony application listens for. When a caller speaks an utterance, a matching sentence or phrase is selected based on weighted parameters and/or the like. If no suitable matching sentence or phrase is found, the caller is asked to repeat the utterance.
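The fixed-list approach described above can be illustrated with a minimal sketch. The source does not specify how matches are scored or weighted, so this example assumes a character-level string similarity (Python's standard `difflib.SequenceMatcher`) scaled by a per-phrase weight, plus an acceptance threshold below which the caller is reprompted. The phrase list, weights, threshold, and the names `PHRASES`, `THRESHOLD`, `match_utterance`, and `handle` are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical phrase list the application listens for, each with an
# assumed weight (e.g., a prior on how likely callers are to say it).
PHRASES = {
    "check my balance": 1.0,
    "pay my bill": 1.0,
    "speak to an agent": 0.9,
}

# Assumed minimum weighted score required to accept a match.
THRESHOLD = 0.6


def match_utterance(recognized: str) -> Optional[str]:
    """Return the best-matching phrase, or None if the caller should be reprompted."""
    text = recognized.lower().strip()
    best_phrase, best_score = None, 0.0
    for phrase, weight in PHRASES.items():
        # Similarity ratio in [0, 1], scaled by the phrase's assumed weight.
        score = SequenceMatcher(None, text, phrase).ratio() * weight
        if score > best_score:
            best_phrase, best_score = phrase, score
    return best_phrase if best_score >= THRESHOLD else None


def handle(recognized: str) -> str:
    """Act on a match, or ask the caller to repeat when nothing matches."""
    phrase = match_utterance(recognized)
    if phrase is None:
        return "Sorry, I didn't catch that. Please repeat your request."
    return f"Matched: {phrase}"


if __name__ == "__main__":
    print(handle("pay my bill"))
    print(handle("zzz"))
```

A real system would more likely score recognizer hypotheses against a grammar using acoustic and language-model confidences rather than comparing raw strings; the string similarity here only stands in for that scoring step.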
Data-driven methods (e.g., utterance classification) are being used to continuously improve the performance of speech recognition systems over time. Spoken utterance classification techniques, for example, are applied to a variety of spoken language comprehension tasks, including call routing, GPS voice navigation, hands-free vehicle and device voice control, dialog systems, and command and control. More recently, spoken utterance classification has been used to classify natural spoken responses to open-ended prompts such as "How may I direct your call?" and "Where would you like to go?"
However, techniques that implement spoken utterance classification involve costly steps that require human supervision: manual transcription of the spoken utterances to produce speech data, and semantic labeling of the utterances to produce annotated training data. Because the volume of traffic handled by a speech recognition system is generally much larger than the number of utterances that can affordably be transcribed or labeled, a significant number of spoken utterances remain unlabeled. Moreover, it is impractical for humans to transcribe and/or semantically label a large amount of additional training data when doing so yields only a marginal improvement in semantic classification performance for the telephony application.