This invention relates generally to speech recognition and particularly to the control of computer software using spoken commands.
Currently available speech recognition software recognizes discrete spoken words or phonemes contained within words in order to identify spoken commands. The processing of the spoken commands is usually accomplished using what is known as a speech engine. Regardless of whether discrete words or phonemes are utilized, the speech engine must be called by the application program which needs the speech recognition service.
Operating systems may include Application Program Interface (API) software utilities which provide speech recognition. The application may incorporate a call to the speech API, or the speech recognition may be supplied externally by a second application that intercepts the speech and feeds the first application simulated keys or commands based on the speech input.
Having the application call the speech API requires the application to have intimate knowledge of the speech API but, more importantly, forces the application to handle inputs from multiple sources and to attempt to synchronize these inputs. At any given instant, the application may receive a spoken command, may receive a return from a speech API, and may also be processing tactile inputs such as associated key operations. This complexity makes the application prone to state errors. Providing a second application to intercept the spoken commands may not always be possible and requires an external server that has intimate knowledge of every application it must service.
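By way of illustration only, the external interception approach described above may be sketched as follows. The sketch is not taken from any particular speech API; the names KEY_MAPS, TargetApp, translate and intercept, and the phrases and key names in the tables, are hypothetical and chosen solely for this example. It assumes the speech engine has already reduced the utterance to a text phrase.

```python
from typing import Dict, List, Optional

# Hypothetical per-application tables: the intercepting program must carry
# one table for every application it services.
KEY_MAPS: Dict[str, Dict[str, str]] = {
    "editor": {"save file": "Ctrl+S", "undo": "Ctrl+Z"},
    "browser": {"go back": "Alt+Left", "reload": "F5"},
}


class TargetApp:
    """Stand-in for the first application; it only sees simulated keys."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.received: List[str] = []

    def send_key(self, key: str) -> None:
        self.received.append(key)


def translate(app_name: str, phrase: str) -> Optional[str]:
    """Map a recognized phrase to a simulated key for the named application."""
    table = KEY_MAPS.get(app_name)
    if table is None:
        return None  # the interceptor has no knowledge of this application
    return table.get(phrase.strip().lower())


def intercept(app: TargetApp, phrase: str) -> bool:
    """Feed the application a simulated key; report whether one was sent."""
    key = translate(app.name, phrase)
    if key is None:
        return False
    app.send_key(key)
    return True
```

In this sketch, a phrase that is valid for one application is silently dropped for another, and an application absent from the table cannot be serviced at all, which reflects the per-application knowledge that the interception approach demands.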
Thus, there is a continuing need for a speech recognition system which operates with a speech engine without synchronization problems. In addition, there is a need for such a system that can send application commands from either speech or tactile responses. It would also be desirable to provide a speech recognition system which has relatively high reliability in terms of the ability to consistently recognize basic commands.