With the proliferation of computers and computing devices throughout society, increasing attention is being paid to alternative methods of data entry that could replace traditional keyboards. Numerous computer programs are available that perform speech recognition. Most of these programs are "applications", that is, computer programs whose functionality and operation are tailored to a specific, dedicated purpose. For example, dictation applications are known which accept a user's voice as input and enter corresponding text into a document, in a manner similar to a word processor. Another example of a speech recognition application is a control program for an item of equipment, such as a program for dialing telephone numbers on a hands-free cellular radio telephone. In such an application, the user speaks the digits to be dialed and then speaks a command such as "send", causing the telephone to dial the spoken number. These are examples of dedicated speech recognition applications.
In the paper "Augmenting a Window System with Speech Input" by C. Schmandt, M. S. Ackerman, and D. Hindus (Computer, Vol. 23, No. 8, pages 50-60, August 1990), a voice recognition application for controlling window navigation tasks is described. The application, called "X Speak", is a speech interface to the X Window System in which words are associated with windows. Speaking a window's name moves that window to the front of the screen and moves the cursor into it. X Speak thus takes over some of the functions normally assigned to a mouse. Various commands are described, such as "create" for starting an application, "recall" for moving a window to the top of the window stack, and "hide" for moving a window to the bottom of the window stack. There are also commands for resizing and repositioning windows. The authors acknowledge that the ergonomic benefit of using speech for these window navigation tasks is limited or non-existent.
A disadvantage of existing speech recognition applications is their lack of flexibility. In a typical speech recognition application, a vocabulary of recognizable words is associated with the application, and the recognizer attempts to recognize words from within that vocabulary. Techniques can be provided for attempting to recognize words outside the vocabulary, and vocabularies can be expanded or replaced to tailor recognition performance to the user.
In the case of X Speak, which is a tool associated with an operating system, there is a dedicated set of commands that can be recognized, and this set is pre-programmed into the application. This approach lacks flexibility and is poorly suited to modern multi-application personal computers and similar equipment, in which new applications are loaded from time to time and many applications can be run consecutively.
There is a desire to have a more ubiquitous speech recognition interface, potentially capable of at least partially replacing both a keyboard for data and command entry and a mouse for screen navigation.
Greater flexibility for application developers who wish to speech-enable their applications is provided by the Speech Application Programming Interface (SAPI) from Microsoft Corporation, which permits a general-purpose speech search engine to recognize commands of different applications. However, SAPI makes no provision for directing speech to any application other than the application currently in focus, or for handling multiple speech-enabled applications. Nor does it make provision for recognizing commands for an application that has not yet been activated and run for the first time.
There is a desire for a speech interface that can direct speech to multiple applications, and to applications that have been newly installed but not yet operated.
A further problem is that speech may include operating system commands (e.g. "minimize window", "close window"), application-directed commands (e.g. "begin dictation"), and application-directed content (e.g. "Memo to Mr. Jones"). The most appropriate destination for the speech must therefore be determined. This cannot readily be done without first performing recognition, and that recognition should preferably be tailored to the task to which the speech may be directed. For this purpose, a vocabulary and language model (or equivalent) specific to the task is desirable.