The processing capability of mobile devices has grown rapidly in recent years. This growth has opened up new application areas for speech and natural language processing technologies. Voice search is one such application, where speech technology is making a significant impact by enabling people to access the Internet conveniently from mobile devices. Spoken queries are a natural medium for searching the Mobile Web, especially in the common case where typing on a device keyboard is impractical or inconvenient. Voice search is now recognized as a core feature of many mobile devices, and several related applications have been developed.
Automatic speech recognition allows individuals to use a voice command or voice query to search the Internet and/or electronic devices. A voice search is a search executed using a spoken query or spoken utterance. Such voice searching typically involves a device or processor converting a spoken utterance into text, such as by converting spoken words, numbers and characters into a text string or textual representation of the spoken utterance. Several automatic speech recognition techniques require the processing of numerous feature vectors of speech objects using Gaussian Mixture Model (GMM), hidden Markov model (HMM), and Feature-space Minimum Phone Error (fMPE) techniques.
Mobile platforms also have little RAM available for storing fMPE transformation matrices, so the floating-point numbers are converted to integers and highly compressed via quantization to 2 bits per coefficient. The fMPE techniques are used for training (e.g., hidden Markov model parameters) in speech recognition and other applications. The fMPE transforms are applied to the feature vector (fingerprint) of each incoming frame of audio in order to make the vector more useful for discriminating between similar phones. When running automatic speech recognition (ASR) on a mobile platform, floating-point operations for fMPE can take up to 10% of the central processing unit (CPU) time. To parallelize the matrix operations, one must also parallelize de-quantization; otherwise the de-quantization step dominates the computation time. This de-quantization requires a table lookup for each value. The fMPE values are quantized to 2 bits, and each 2-bit pattern must be de-quantized to an (arbitrary) 32-bit floating-point value.
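For illustration only, the per-value table lookup described above can be sketched in Python as follows. The codebook values, the packing order (four 2-bit codes per byte, lowest-order bits first), and the names `CODEBOOK`, `pack_2bit`, and `dequantize_2bit` are assumptions made for this sketch and are not taken from any particular fMPE implementation. The sketch is deliberately scalar, showing the one-lookup-per-coefficient cost that motivates parallelizing the step.

```python
from typing import List, Sequence

# Hypothetical 4-entry codebook: one arbitrary float per 2-bit pattern.
CODEBOOK = (-0.75, -0.25, 0.25, 0.75)


def pack_2bit(codes: Sequence[int]) -> bytes:
    """Pack 2-bit codes (0..3) four per byte, lowest-order bits first."""
    out = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        if not 0 <= code <= 3:
            raise ValueError("code must fit in 2 bits")
        out[i // 4] |= code << (2 * (i % 4))
    return bytes(out)


def dequantize_2bit(
    packed: bytes, count: int, codebook: Sequence[float] = CODEBOOK
) -> List[float]:
    """Recover `count` float values, one table lookup per 2-bit code."""
    values = []
    for i in range(count):
        # Shift the wanted code down to the low bits and mask off the rest.
        code = (packed[i // 4] >> (2 * (i % 4))) & 0b11
        values.append(codebook[code])
    return values
```

Under these assumptions, `pack_2bit([0, 1, 2, 3, 3, 0])` stores six coefficients in two bytes, and `dequantize_2bit` maps each code back to its codebook entry. Because every coefficient needs its own shift, mask, and lookup, a serial loop of this kind is the step that would dominate once the surrounding matrix arithmetic is parallelized.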