Speech recognition and voice processing systems are known for translation of dictated speech into text or computer instructions (such as menu operations, and the like). Conventional speech recognition systems use a number of different algorithms and technologies in a perennial effort to recognize a user's speech and do what the user desires based on that speech recognition. A common application of this technology is in the classic dictation sense where voice is converted into text in a word processing application. Another application is conversion of voice into common instructions for menu operations, such as open a file, close a file, save a file, copy, paste, etc.
In most systems, there is a computing device with memory, storage, and a processor that executes a software application enabling the speech recognition functionality. A user speaks into a microphone and the speech recognition software processes the user's voice into text or commands.
There are several performance factors that are considered when assessing these speech recognition applications. Among the factors are speed and accuracy. The users of such applications desire that the applications interpret the user's voice accurately, so that the speech is understood as the user intended. Likewise, the users of such applications also benefit from the applications providing feedback in real time, so that the user knows as quickly as possible what the application heard and what it is doing in response to the voice input, and so that commands are acted on quickly.
The process of speech recognition is computationally intensive and requires substantial processing resources. Known algorithms and methods make varying tradeoffs between the processing power required and the accuracy and speed of translation. Generally, the more accurate speech recognition applications, and those that provide the fastest results, use more powerful processing resources. Likewise, when more powerful processing resources are not available, the speech recognition applications must carry out fewer processes and therefore have reduced accuracy and/or reduced processing speed.
Example processing systems that are presently obligated to make some sacrifices in quality and/or speed (when compared with desktop or server computers) are systems that can be found in handheld devices, including personal digital assistants (PDAs), smart phones, palmtop devices, and other pocket-sized devices capable of executing software applications. As such, present speech recognition capabilities on such pocket devices are more limited than the capabilities presently found on desktop or server computers. There is simply insufficient processing power with present technology, and the algorithms for speech recognition are too complex, for handheld devices to perform at the same level as physically larger computers.