1. Field of the Invention
The disclosed embodiments relate to the field of voice recognition, and more particularly, to voice recognition in a wireless communication system.
2. Background
Voice recognition (VR) technology is generally known and has been used in many different devices. A VR system may operate in an interactive environment. In such a system, the user may respond with an audio response, such as a voice response, to an audio prompt, such as a voice prompt, from a device. Referring to FIG. 1, generally, the functionality of VR may be performed by two partitioned sections such as a front-end section 101 and a back-end section 102. An input 103 at front-end section 101 receives voice data. A microphone (not shown) may originally generate the voice data. The microphone, through its associated hardware and software, converts the audible voice input information into voice data. Front-end section 101 examines the short-term spectral properties of the input voice data, and extracts certain front-end voice features, or front-end features, that may be recognizable by back-end section 102.
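For illustration only, the short-term spectral examination performed by a front-end section may be sketched as follows. The sketch assumes a simple framing, Hamming-window, and log-power-spectrum scheme; the function names, frame length, and hop size are hypothetical choices made for this example and are not taken from the disclosed embodiments.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split the voice data into overlapping short-term frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hamming(n):
    """Symmetric Hamming window of length n (assumed window choice)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def log_power_spectrum(frame):
    """Log power of each non-negative frequency bin of one windowed frame."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        # Direct DFT of bin k; adequate for a short illustrative frame.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(windowed))
        spectrum.append(math.log(abs(acc) ** 2 + 1e-12))
    return spectrum


def extract_front_end_features(samples, frame_len=64, hop=32):
    """Return one feature vector (log power spectrum) per frame."""
    return [log_power_spectrum(f)
            for f in frame_signal(samples, frame_len, hop)]
```

In this sketch each frame yields one feature vector, and the sequence of vectors stands in for the front-end features passed to back-end section 102. A practical front end would typically use a fast Fourier transform and further processing steps that are not shown here.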
Back-end section 102 receives the extracted front-end features at an input 105, a set of grammar definitions at an input 104, and acoustic models at an input 106. Grammar input 104 provides information about a set of words and phrases in a format that may be used by back-end section 102 to create a set of hypotheses about recognition of one or more words. Acoustic models at input 106 provide information about certain acoustic characteristics of the person speaking into the microphone. A training process normally creates the acoustic models. The user may have to speak several words or phrases to create his or her acoustic models.
Generally, back-end section 102 compares the extracted front-end features with the information received at grammar input 104 to create a list of words with an associated probability. The associated probability indicates the probability that the input voice data contains a specific word. A controller (not shown), after receiving one or more hypotheses of words, selects one of the words, most likely the word with the highest associated probability, as the word contained in the input voice data. Back-end section 102 may reside in a microprocessor. The recognized word is processed as an input to the device, which performs or responds in a manner consistent with the recognized word.
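For illustration only, the selection performed by the controller may be sketched as follows. The sketch assumes that each hypothesis is represented as a (word, probability) pair; the function name select_word and this representation are hypothetical and are not taken from the disclosed embodiments.

```python
def select_word(hypotheses):
    """Pick the word with the highest associated probability.

    hypotheses: list of (word, probability) pairs, an assumed format
    for the word list produced by the back-end section.
    Returns None when no hypothesis is available.
    """
    if not hypotheses:
        return None
    best_word, _best_probability = max(hypotheses, key=lambda h: h[1])
    return best_word


# Hypothetical word list with associated probabilities.
recognized = select_word([("call", 0.2), ("dial", 0.7), ("stop", 0.1)])
```

Here the controller simply takes the most probable word; a controller could also apply other selection rules, which this sketch does not attempt to show.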
In the interactive VR environment, a user may provide a voice response to a voice prompt from a device. The voice prompt from the device may last for a period of time. While the voice prompt is being played by a speaker (not shown), the user may provide the voice response through a microphone (not shown). As a result, the input voice data 103, as picked up by the microphone, is a combination of the voice prompt and the user voice response. Therefore, the input voice data 103 may include a more complex set of voice features than the user voice input alone. When the user voice features are mixed with other voice features, the task of extracting the user voice features is more difficult. Therefore, it is desirable to have an improved interactive VR system.
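For illustration only, the combination of the voice prompt and the user voice response at the microphone may be sketched as follows. The sketch assumes a simple sample-by-sample additive mix with no delay, attenuation, or room effects; this model and the function name mix_at_microphone are hypothetical simplifications and are not taken from the disclosed embodiments.

```python
def mix_at_microphone(prompt, user_response):
    """Return the combined signal under an assumed additive model.

    prompt, user_response: lists of float samples. The shorter list is
    padded with silence (0.0) so both cover the same time span.
    """
    n = max(len(prompt), len(user_response))
    padded_prompt = prompt + [0.0] * (n - len(prompt))
    padded_user = user_response + [0.0] * (n - len(user_response))
    return [p + u for p, u in zip(padded_prompt, padded_user)]


# Hypothetical samples: the prompt is still playing while the user speaks.
mixed = mix_at_microphone([0.5, 0.5, 0.5], [0.0, 0.25])
```

Under this assumption every sample of the input voice data carries contributions from both sources, which is why features extracted from the mixed signal may differ from those of the user voice alone.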