Voice or speech recognition is increasingly being used as part of the user interface in computing devices of many different types. Many cellular telephones allow users to push a button and speak into the microphone to perform queries and execute a variety of commands. Portable and desktop computers perform similar functions. These systems may also convert the speech to text and use that text as input for appointments, messages, or stored documents. Some of these systems process the user's speech locally on the device, but many send a recording of the speech to a remote server. Automobiles may also receive voice commands and queries to operate a navigation system or other functions in the vehicle, including making calls and sending messages. Voice or speech recognition is also used for identification, logon, and other purposes.
Users prefer a quick response to voice inputs; however, an accurate analysis of speech or any other audio may require significant processing resources. In addition, many speech analysis techniques are designed to receive a complete utterance and then analyze it as a whole. The system must therefore wait for the user to stop speaking before it can begin processing the entire utterance. This inherent latency may be annoying, especially when compared with the immediacy of keyboard and mouse inputs. Some systems may not even have enough memory to buffer a complete utterance, in which case this kind of speech analysis is not possible at all.
One technique used for speech recognition is to analyze the speech for Mel-Frequency Cepstral Coefficients (MFCCs). The MFCCs are compared to a reference for automatic speech recognition (ASR) and speaker recognition. To improve the accuracy and reliability of MFCC-based recognition, Cepstral Mean Subtraction (CMS) is used in combination with Cepstral Variance Normalization (CVN). CMS can be regarded as a form of noise normalization: subtracting the mean of each coefficient from the signal removes stationary noise. CVN then scales each coefficient to unit variance. These techniques allow the system to be used to good effect not only for user input but also for user authentication and login passwords.
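The following is a minimal sketch of combined CMS and CVN. It assumes the MFCCs have already been extracted into a list of frames, with one row per analysis frame and one column per cepstral coefficient. The function name `cmvn` and the `eps` guard are illustrative assumptions, not the method of any particular system. The per-coefficient statistics are computed over the whole utterance, which is the buffering requirement described earlier.

```python
import math
from typing import List, Sequence


def cmvn(frames: Sequence[Sequence[float]], eps: float = 1e-8) -> List[List[float]]:
    """Apply cepstral mean subtraction (CMS) and cepstral variance normalization (CVN).

    Hypothetical helper for illustration. `frames` holds one row per frame and
    one column per cepstral coefficient. All frames must be available before
    any frame can be normalized, because the statistics span the utterance.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])

    # CMS statistic: mean of each coefficient over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]

    # CVN statistic: population standard deviation of each coefficient.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]

    # Subtract the mean (CMS), then divide by the standard deviation (CVN).
    # eps (an assumed small constant) avoids division by zero when a
    # coefficient is constant across the utterance.
    return [
        [(f[d] - means[d]) / (stds[d] + eps) for d in range(dim)]
        for f in frames
    ]
```

For example, `cmvn([[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]])` maps the first coefficient to approximately `[-1.22, 0.0, 1.22]` and the constant second coefficient to zeros. A fixed channel or microphone response is approximately additive in the cepstral domain, so adding the same offset to every frame leaves the output unchanged. This is the stationary-noise removal attributed to CMS.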