Many mobile telephones (here meant to encompass at least data processing and communication devices that carry out telephony or voice communication functions) are provided with voice-assisted interface features that enable a user to access a function by speaking an expression to invoke the function. A familiar example is voice dialing, whereby a user speaks a name or other pre-stored expression into the telephone and the telephone responds by dialing the number associated with that name.
To verify that the number to be dialed or the function to be invoked is indeed the one intended by the user, a mobile telephone can display a confirmation message to the user, allowing the user to proceed if it is correct or to abort the function if it is incorrect. Both audible and visual user interfaces exist for interacting with mobile telephone devices. Compared with visual confirmations and interfaces, audible confirmations and interfaces allow more nearly hands-free operation, which may be needed, for example, by a driver wishing to keep his or her eyes on the road instead of looking at a telephone device.
Speech recognition is employed in a mobile telephone to recognize a phrase, word, or sound (generally referred to herein as an utterance) spoken by the telephone's user. Speech recognition is therefore sometimes used in phonebook applications. In one example, a telephone responds to a recognized spoken name with an audible confirmation, rendered through the telephone's speaker output. On hearing the playback, the user accepts or rejects the telephone's recognition result.
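The recognize, confirm, and dial sequence described above can be sketched in a few lines of Python. This is an illustrative sketch only, not an implementation taken from any particular telephone: the names `recognize_name`, `voice_dial`, and `confirm` are hypothetical, and fuzzy text matching from the standard `difflib` module stands in for a real speech recognizer, which would operate on audio rather than text.

```python
import difflib


def recognize_name(heard_text, known_names):
    """Stand-in recognizer (assumption): fuzzy-match text against stored names.

    A real recognizer would score audio against acoustic models; here the
    closest stored name by string similarity is returned, or None if no
    stored name is similar enough.
    """
    by_lower = {name.lower(): name for name in known_names}
    matches = difflib.get_close_matches(
        heard_text.lower(), list(by_lower), n=1, cutoff=0.6
    )
    return by_lower[matches[0]] if matches else None


def voice_dial(heard_text, phonebook, confirm):
    """Return the number to dial, or None if nothing is recognized or the user rejects.

    `confirm` is a hypothetical callback that presents the prompt to the user
    (audibly or visually) and returns True to proceed or False to abort.
    """
    name = recognize_name(heard_text, phonebook)
    if name is None:
        return None
    if not confirm(f"Call {name}?"):
        return None
    return phonebook[name]
```

With a phonebook such as `{"Alice Smith": "555-0100"}`, a call like `voice_dial("alice smyth", phonebook, confirm)` yields the stored number only when the callback accepts the prompt; a rejection or an unrecognized name yields `None`, so nothing is dialed.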
In human speech, each utterance has certain quantifiable qualities, called prosodic parameters, which determine what the utterance sounds like. These are usually considered to be pitch (or tone), the timing of elements of the speech, and stress, which is usually represented as energy. Speech recognition systems use other features of speech, such as vocal tract shape, which are non-prosodic but help determine what was said. Human listeners are adept at discerning qualities of speech based in part on the prosodic parameters of the speech. Also, human speakers use prosody in speech to aid overall communication and to distinguish their speech from that of other speakers. Humans are thus naturally sensitive to prosody, and can easily tell the difference between “real” human speech and “synthesized” speech produced by a machine (a speech synthesizer). In fact, synthesized speech that uses poor prosodic rules can be unintelligible to the human ear.
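To make the three prosodic parameters concrete, the following minimal Python sketch quantifies them for a short block of audio samples. It rests on several assumptions not stated in the text above: energy is taken as mean squared amplitude, pitch is estimated from the peak of a plain autocorrelation, timing is reduced to the block's duration, and the function names and the 60 to 400 Hz search range are hypothetical choices made for illustration. Practical systems typically use more robust estimators.

```python
import math


def frame_energy(samples):
    """Mean squared amplitude of the block, used here as a proxy for stress."""
    return sum(s * s for s in samples) / len(samples)


def estimate_pitch_hz(samples, sample_rate, min_hz=60.0, max_hz=400.0):
    """Estimate the fundamental frequency from the autocorrelation peak.

    Only lags corresponding to pitches between min_hz and max_hz are
    searched (an assumed range for speech). Returns 0.0 if no lag has a
    positive correlation.
    """
    min_lag = int(sample_rate / max_hz)
    max_lag = min(int(sample_rate / min_hz), len(samples) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        score = sum(
            samples[i] * samples[i + lag] for i in range(len(samples) - lag)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0


def prosody_summary(samples, sample_rate):
    """Collect duration (timing), energy (stress), and pitch for one block."""
    return {
        "duration_s": len(samples) / sample_rate,
        "energy": frame_energy(samples),
        "pitch_hz": estimate_pitch_hz(samples, sample_rate),
    }


# Example input: 50 ms of a synthetic 200 Hz tone sampled at 8 kHz.
tone = [0.5 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(400)]
summary = prosody_summary(tone, 8000)
```

A pure tone is used as input only because its pitch and energy are known in advance; real speech varies in all three parameters over time, and it is that variation that listeners perceive as prosody.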