Field
The technology of the present application relates generally to speech recognition systems, and more particularly, to apparatuses and methods to allow for managing resources for a system using voice recognition.
Background
Speech recognition and speech to text engines, such as those available from Microsoft, Inc., are becoming ubiquitous for the generation of text from user audio. The text may be used to generate word processing documents, such as, for example, this patent application, or to populate fields in a user interface, database, or the like. Conventionally, the speech recognition systems are machine specific. The machine includes the language model, speech recognition engine, and user profile for the user (or users) of the machine. These conventional speech recognition engines may be considered thick or fat clients, where the bulk of the processing is accomplished on the local machine.
More recently, companies such as nVoq located in Boulder, Colo. have developed technology to provide a distributed speech recognition system using the Cloud. In these cases, the audio file of the user is streamed or batched to a remote processor from a local device. The remote processor performs the conversion (speech to text or text to speech) and returns the converted file to the user. For example, a user at a desktop computer may produce an audio file that is sent to a speech to text device that returns a Word document to the desktop. In another example, a user on a mobile device may transmit a text message to a text to speech device that returns an audio file that is played through the speakers on the mobile device.
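The distributed round trip described above can be sketched, in simplified form, as a thin client that submits captured audio to a remote transcription processor and receives the converted result. The function names and the simulated lookup below are illustrative assumptions for exposition only, not any vendor's actual API.

```python
# Illustrative sketch of a cloud-based speech recognition round trip.
# All names here are hypothetical; real services expose their own APIs.

def remote_transcribe(audio_bytes: bytes) -> str:
    """Stand-in for the remote speech-to-text processor.

    A real implementation would run a speech recognition engine on the
    remote machine; here the conversion is simulated with a fixed lookup.
    """
    simulated_results = {b"audio-001": "patient presents with chest pain"}
    return simulated_results.get(audio_bytes, "")

def dictate(audio_bytes: bytes) -> str:
    # The local device acts as a thin client: it only captures audio,
    # streams or batches it to the remote processor, and receives text.
    return remote_transcribe(audio_bytes)

text = dictate(b"audio-001")
```

Because the recognition engine, language model, and user profile live on the remote processor rather than the local machine, the local device needs only enough capability to capture and transmit audio.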
While dictation to generate text for documents, a clipboard, or fields in a database is reasonably common, the use of audio to command a computer to take particular actions, such as, for example, invoking or launching an application, navigating between windows, hyperlinking or viewing URLs, and the like, is less common. Currently, Microsoft, Inc.'s Windows® operating system contemplates using voice commands to naturally control applications and complete tasks. Using voice, a user can speak commands and have the computer take actions to facilitate operation.
However, it has been found that many applications of speech recognition have a difficult time distinguishing commands from dictation. The inability of the machine to clearly delineate between dictation to transcribe and commands to act upon leads to frustration on the part of the user and decreased use of a powerful tool.
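One simple way to delineate commands from dictation is to reserve a spoken trigger word: utterances beginning with the trigger are routed to a command handler, and all other speech is transcribed. The trigger word, the command set, and the routing logic below are illustrative assumptions, not part of any existing product, and the approach is deliberately naive to show why delineation is hard.

```python
# Hypothetical trigger-word heuristic for separating commands from dictation.
TRIGGER = "computer"  # assumed reserved wake word for commands
COMMANDS = {"open browser", "next window", "close window"}

def route_utterance(utterance: str) -> tuple[str, str]:
    """Return ("command", action) or ("dictation", text).

    Utterances prefixed with the trigger word and matching a known
    command grammar are treated as commands; everything else is
    passed through as dictation to be transcribed.
    """
    words = utterance.lower().split()
    if words and words[0] == TRIGGER and " ".join(words[1:]) in COMMANDS:
        return ("command", " ".join(words[1:]))
    return ("dictation", utterance)
```

Note that under this heuristic a bare "open browser" is transcribed as text rather than executed, which is exactly the kind of ambiguity that frustrates users of conventional systems.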
Moreover, as speech recognition becomes more commonplace, clients will use speech recognition in multiple settings, such as, for example, job related tasks, personal tasks, or the like. As can be appreciated, the language models used for the various tasks may be different. Even in a job setting, the language model for various tasks may vary drastically. For example, a client may transcribe documents for medical specialties such as cardiovascular surgery and metabolic disorders. The language models, shortcuts, and user profiles for these vastly different, but related, transcription tasks require the client to maintain different language models to use speech recognition effectively. Conventionally, to have access to different language models, a client would need a completely separate account and identification for each. Moreover, commands to change language models are difficult to convey in conventional computing systems, as explained above.
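The single-account arrangement motivated above can be sketched as one user profile holding several named, task-specific language models, with a switching operation that a voice command (for example, "switch to metabolic") could invoke. The class, field, and model names below are illustrative assumptions, not any particular system's design.

```python
# Illustrative sketch: one account managing multiple language models,
# rather than one account and identification per model.

class UserProfile:
    """A single client account holding several task-specific language models."""

    def __init__(self, user_id: str, models: dict[str, str]):
        self.user_id = user_id
        self.models = models               # task name -> language model id
        self.active = next(iter(models))   # default to the first model

    def switch_model(self, name: str) -> str:
        # A recognized voice command such as "switch to metabolic"
        # would resolve to a call like this.
        if name not in self.models:
            raise KeyError(f"no language model named {name!r}")
        self.active = name
        return self.models[name]

profile = UserProfile("client-42", {
    "cardiology": "lm-cardiovascular-v2",
    "metabolic": "lm-metabolic-disorders-v1",
})
profile.switch_model("metabolic")
```

Under such an arrangement the client keeps one identification while the system selects the language model appropriate to the current task.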
Thus, against this background, it is desirable to develop improved apparatuses and methods for managing resources for a system using voice recognition.