Speech recognition enables users to interact with devices using spoken words. Many technologies exist today that enable speech recognition, and several of them rely predominantly on analyzing speech spectrograms.
In one approach, a window (e.g., a Hamming window) of 20 to 50 milliseconds is applied to the captured waveform (cepstral extraction), and the spectrum of the windowed waveform is then measured and compared against the spectrum samples in a library of sounds. The comparison computes a distance for each feature in the set, and the feature with the minimum distance is selected.
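The window-and-compare approach described above can be illustrated with a minimal sketch. It is not taken from any particular recognizer: the function names, the 8 kHz sample rate, the 25 millisecond frame, the two-entry library of pure tones, and the use of Euclidean distance are all assumptions chosen for illustration, and the sketch compares raw magnitude spectra rather than cepstral coefficients.

```python
import cmath
import math


def hamming(n_samples):
    """Hamming window coefficients: 0.54 - 0.46*cos(2*pi*n/(N-1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def magnitude_spectrum(frame):
    """Magnitude of the DFT of a Hamming-windowed frame (bins 0..N/2)."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming(n))]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(windowed))
        spectrum.append(abs(acc))
    return spectrum


def nearest_template(frame, library):
    """Label of the library spectrum with minimum Euclidean distance."""
    spec = magnitude_spectrum(frame)

    def distance(template):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec, template)))

    return min(library, key=lambda label: distance(library[label]))


def tone(freq_hz, sample_rate=8000, duration_s=0.025):
    """Hypothetical test signal: a pure sine lasting one 25 ms frame."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


# Hypothetical "library of sounds": one stored spectrum per label.
LIBRARY = {
    "low": magnitude_spectrum(tone(400)),
    "high": magnitude_spectrum(tone(1600)),
}

if __name__ == "__main__":
    print(nearest_template(tone(440), LIBRARY))
    print(nearest_template(tone(1640), LIBRARY))
```

A practical system would typically use an FFT routine and many overlapping frames per utterance; the direct DFT here is only meant to keep the sketch free of external dependencies.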
Additionally, the currently known solutions require the speakers to train the tool in order to supplement the pre-training from the corpus. Several Hidden Markov Models (HMMs) are set up to help identify the words represented by the sounds. Sometimes statistical language models, semantic interpretation, and acoustic models, such as phoneme-based models, are also used to help identify the spoken word.
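One common way HMMs are used for word identification is to keep one model per word and select the word whose model assigns the highest likelihood to the observed sequence. The following is a minimal sketch of that idea using the forward algorithm on discrete observations. The two word models, their probabilities, and the three-symbol alphabet (standing in for quantized acoustic features) are hypothetical values invented for illustration, not parameters from any trained system.

```python
def forward_likelihood(observations, start_p, trans_p, emit_p):
    """P(observations | model) for a discrete HMM via the forward algorithm."""
    n_states = len(start_p)
    # Initialise with the first observation.
    alpha = [start_p[s] * emit_p[s][observations[0]] for s in range(n_states)]
    # Propagate through the remaining observations.
    for obs in observations[1:]:
        alpha = [
            sum(alpha[prev] * trans_p[prev][s] for prev in range(n_states))
            * emit_p[s][obs]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(observations, word_models):
    """Return the word whose HMM gives the observations the highest likelihood."""
    return max(
        word_models,
        key=lambda word: forward_likelihood(observations, *word_models[word]),
    )


# Hypothetical two-state, left-to-right models: (start, transition, emission).
# Emission rows give the probability of symbols 0, 1, 2 in each state.
WORD_MODELS = {
    "yes": (
        [1.0, 0.0],
        [[0.6, 0.4], [0.0, 1.0]],
        [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]],
    ),
    "no": (
        [1.0, 0.0],
        [[0.6, 0.4], [0.0, 1.0]],
        [[0.1, 0.1, 0.8], [0.1, 0.8, 0.1]],
    ),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], WORD_MODELS))
    print(recognize([2, 2, 1, 1], WORD_MODELS))
```

Real recognizers generally work with continuous feature vectors and log-probabilities to avoid numerical underflow on long sequences; plain probabilities are used here only to keep the example short.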