Speech recognition technologies permit a user to interface with a computerized system using spoken language. Speech recognition technology receives spoken input from the user, interprets the input, and then translates the input into a form that the computer system understands. More particularly, spoken input in the form of an analog waveform voice signal is digitally sampled. The digital samples are then processed by the speech recognition system according to a speech recognition algorithm.
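By way of illustration only, the digital sampling step described above can be sketched as follows. The function name `sample_waveform`, the 8 kHz sampling rate, the 16-bit sample depth, and the 440 Hz test tone are assumptions chosen for the example and are not taken from any particular system.

```python
import math

def sample_waveform(signal, duration_s, sample_rate_hz, bits=16):
    """Sample a continuous-time signal and quantize each sample to a signed integer.

    `signal` is a function of time (seconds) returning an amplitude in [-1.0, 1.0].
    """
    max_level = 2 ** (bits - 1) - 1          # e.g. 32767 for 16-bit samples
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz               # time of the n-th sample
        amplitude = max(-1.0, min(1.0, signal(t)))   # clip to the valid range
        samples.append(round(amplitude * max_level))  # quantize
    return samples

def tone(t):
    """Hypothetical stand-in for an analog voice signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440.0 * t)

# 10 ms of "speech" sampled at 8 kHz yields 80 integer samples.
pcm = sample_waveform(tone, duration_s=0.01, sample_rate_hz=8000)
```

The resulting list of integers corresponds to the digital samples that a speech recognition algorithm would then process.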
Speech recognition systems typically recognize and identify words or utterances of the spoken input by comparison to previously obtained templates of words or utterances or by comparison to a previously obtained acoustic model of a person who is speaking. The templates and acoustic model are typically generated based upon samples of speech.
One known speech recognition technique is word-level template string matching. In word-level template string matching, the spoken input signal is compared to pre-stored template strings that represent various words and phrases. Generally, the template that most closely matches the spoken input is selected as the output.
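The selection of a closest-matching template can be sketched as follows. The text above does not name a particular distance measure; this example assumes dynamic time warping over one-dimensional feature sequences, and the names `dtw_distance`, `best_template`, and `TEMPLATES`, as well as the feature values, are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric feature sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[rows][cols]

def best_template(spoken, templates):
    """Return the word whose stored template is closest to the spoken input."""
    return min(templates, key=lambda word: dtw_distance(spoken, templates[word]))

# Hypothetical pre-stored templates, one feature sequence per word.
TEMPLATES = {
    "yes": [1.0, 3.0, 2.0],
    "no": [4.0, 4.0, 1.0, 0.0],
}

# A slower utterance of "yes": the same shape, stretched in time.
spoken = [1.0, 1.0, 3.0, 3.0, 2.0]
result = best_template(spoken, TEMPLATES)   # "yes"
```

Time warping is used here because the same word spoken at a different rate should still match its template; a simple element-by-element comparison would penalize the difference in length.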
Another example of a known speech recognition technique is acoustic-phonetic recognition. According to acoustic-phonetic recognition, the spoken input signal is segmented and identified according to basic units of speech sound known as phonemes. The results of segmentation and identification are then compared to a pre-stored vocabulary of words. The word or words which most closely match the spoken input are selected as the output.
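The comparison of identified phonemes against a pre-stored vocabulary can be sketched as follows. This example assumes that segmentation has already produced a phoneme sequence and uses edit distance as the matching criterion, which is one possible choice among many; the lexicon entries and the names `edit_distance` and `closest_word` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]

# Hypothetical pre-stored vocabulary: each word with its phoneme sequence.
LEXICON = {
    "cat": ["k", "ae", "t"],
    "cut": ["k", "ah", "t"],
    "bat": ["b", "ae", "t"],
}

def closest_word(phonemes, lexicon):
    """Return the vocabulary word whose pronunciation best matches the input."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))

# A recognized sequence with one spurious trailing phoneme.
recognized = ["k", "ae", "t", "s"]
word = closest_word(recognized, LEXICON)   # "cat"
```

In this sketch an imperfect segmentation result still maps to the nearest vocabulary entry, which reflects the "most closely match" selection described above.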
Yet another example of a known speech recognition technique is stochastic speech recognition. According to stochastic speech recognition, the spoken input is converted into a series of parameter values which are compared to pre-stored models. For example, the pre-stored models can be based on probabilities. In operation, samples of spoken words or sentences are received and then represented as parameter values which take into account statistical variation between different samples of the same phoneme. Probabilistic analysis is utilized to obtain a best match for the spoken input. Known algorithms for probabilistic analysis are the Baum-Welch maximum likelihood algorithm and the Viterbi algorithm.
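As an illustration of the probabilistic matching described above, the following is a minimal sketch of the Viterbi algorithm applied to a toy hidden Markov model. The two phoneme-like states, the two observation symbols, and all probability values are invented for the example; in practice such model parameters would be estimated from speech samples, for instance with the Baum-Welch algorithm.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence and its probability."""
    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Pick the predecessor state that maximizes the path score into s.
            prev = max(states, key=lambda p: best[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(best[-1][last])

# Hypothetical model: two phoneme states and two coarse acoustic symbols.
STATES = ["s", "iy"]
START_P = {"s": 0.8, "iy": 0.2}
TRANS_P = {"s": {"s": 0.6, "iy": 0.4}, "iy": {"s": 0.1, "iy": 0.9}}
EMIT_P = {"s": {"hiss": 0.9, "tone": 0.1}, "iy": {"hiss": 0.2, "tone": 0.8}}

path, prob = viterbi(["hiss", "hiss", "tone", "tone"],
                     STATES, START_P, TRANS_P, EMIT_P)
# path is ["s", "s", "iy", "iy"]
```

Log-probabilities are used so that long observation sequences do not underflow; all probabilities in the toy model are nonzero, so the logarithms are well defined.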
Major considerations for such speech recognition processes are processing speed and overall recognition accuracy. One common process associated with speech recognition is building a natural language (NL) grammar vocabulary that can ultimately be used to represent the user's speech input. Building an NL grammar vocabulary from tagged data can be burdensome: it typically takes a human several weeks to complete an entire language grammar vocabulary by hand. An NL grammar vocabulary engine that performs at real-time or near real-time speed, while maintaining a level of accuracy comparable to that of a human building an NL grammar vocabulary by hand, would increase the likelihood that users accept such voice recognition systems.