1.1 Field of the Invention
The present invention relates to systems and methods for controlling computer applications and/or processes using voice input. More precisely, the present invention relates to integrating a plurality of applications and/or processes into a common user interface which is controlled primarily by voice-activated commands, allowing hands-free control of each process within a common environment.
1.2 Discussion of Prior Art
Speech input user interfaces are well known. This specification expressly incorporates by reference U.S. Pat. No. 6,606,599 and U.S. Pat. No. 6,208,972, which provide a method for integrating computing processes with an interface controlled by voice actuated grammars.
Typical speech driven software technology has traditionally been useful for little more than a dictation system which types what is spoken on a computer display, and has limited command and control capability. Although many applications have attempted to initiate command sequences by voice, doing so may require an extensive training session to teach the computer how to handle specific words. Since those words are not maintained in a context based model that simulates intelligence, it is easy to confuse such speech command systems and cause them to malfunction. In addition, the systems are limited in capability to the few applications that support the speech interface.
It is conventionally known that an application window can spawn another window when the application calls for specific user input. When that happens, we call the first window a “parent window”, and the spawned window a “child window”. This presents certain problems in that the child window generally overlaps its parent window.
Some child windows have to be satisfied or terminated before releasing control (active focus) and returning I/O access to the main application window. Examples of child windows are i) a document window in an application like Word, ii) another foreground, monopolizing (aka modal) window like File Open, and iii) another foreground, non-monopolizing (aka non-modal) window.
Every speech-initiated application maintains its own operating window as a “child window” of the system. The child/parent window scheme does not allow for complex command processing. A complex command may require more than one application to be invoked in a specific order based on a single spoken command phrase. For example, the spoken command phrase “add Bob to address book” is a multiple-step/multiple-application command. The commands required by the prior art are: “open address book”, “new entry” and “name Bob”. In the prior art, each operation must be completed one by one in sequential order. Although this methodology works at a minimally satisfactory level, it does not use natural language speech. The prior art is typically not capable of performing multiple-step operations with a single spoken command phrase. In addition, the prior art does not enable a single spoken phrase to process commands that require the application to perform multiple steps without first training the application on the sequence of steps that the command must invoke (much like programming a macro). For example, the spoken command phrase “Write a letter to Bob” requires multiple applications to be used sequentially, and if those applications are not running, they must be launched in order to execute the command. The prior art would typically have the user say “open address book”, “select Bob”, “copy address”, “open editor”, “new letter” and “paste address”, or would require the user to train the application to perform these steps every time it hears this command. The address book and text editor/word processor are generally different applications. Since these programs require the data to be organized in a specific order, the voice commands must be performed in a specific order to achieve the desired result. The prior art is not capable of performing operations across multiple applications entirely on its own with a single spoken command phrase.
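By way of illustration only, the sequential command pattern described above can be sketched in Python. The names PRIOR_ART_SEQUENCES, steps_for and applications_involved are hypothetical and are not drawn from any cited system. The sketch shows only that a single natural phrase corresponds to an ordered list of separate spoken commands, which the user must issue one at a time or train as a macro.

```python
# Hypothetical illustration of the prior-art behavior described above.
# Each natural-language phrase corresponds to an ordered list of separate
# spoken commands that the user must issue one at a time.
PRIOR_ART_SEQUENCES = {
    "add Bob to address book": [
        "open address book",
        "new entry",
        "name Bob",
    ],
    "Write a letter to Bob": [
        "open address book",
        "select Bob",
        "copy address",
        "open editor",
        "new letter",
        "paste address",
    ],
}


def steps_for(phrase):
    """Return the ordered single-step commands the user would have to speak."""
    return list(PRIOR_ART_SEQUENCES[phrase])


def applications_involved(phrase):
    """Return, in first-use order, the applications opened by the sequence."""
    apps = []
    for step in steps_for(phrase):
        if step.startswith("open "):
            app = step[len("open "):]
            if app not in apps:
                apps.append(app)
    return apps
```

In this sketch, the phrase “Write a letter to Bob” expands to six separate commands spanning two applications, and the order of the list matters because each step depends on the result of the one before it.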
In windowed operating systems it is common for an executing application window to “pop up” a new “child window” when a secondary type of interaction with the user is required. When an application is executing a request, focus (active attention within its window) is granted to it. Windowed operating systems running on personal computers generally grant active focus to only a single window at any given time.
Current computer technology allows application programs to execute their procedures within individual application oriented graphical user interfaces (i.e. “windows”). Each application window program is encapsulated in such a manner that most services available to the user are generally contained within the window. Thus each window is an entity unto itself.
When an application window requires I/O, such as a keyboard input, mouse input or the like, the operating system passes the input data to the application.
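The single-focus behavior described in the preceding paragraphs can be sketched as follows. This is a simplified model under stated assumptions, not the implementation of any particular operating system: the Window and Desktop classes and their methods are hypothetical, and focus is modeled as a simple stack in which the most recently opened window receives all input.

```python
class Window:
    """Hypothetical window that records the input events delivered to it."""

    def __init__(self, title, modal=False):
        self.title = title
        self.modal = modal
        self.received = []


class Desktop:
    """Hypothetical windowing system with exactly one focused window."""

    def __init__(self):
        self.focus_stack = []

    @property
    def focused(self):
        # The window on top of the stack holds the single active focus.
        return self.focus_stack[-1] if self.focus_stack else None

    def open(self, window):
        # A newly opened (child) window takes focus from its parent.
        self.focus_stack.append(window)

    def close(self, window):
        # Closing a window returns focus to the window beneath it.
        self.focus_stack.remove(window)

    def activate(self, window):
        # A modal window on top must be closed before focus can move.
        if window not in self.focus_stack:
            return False
        top = self.focused
        if top is not window and top.modal:
            return False
        self.focus_stack.remove(window)
        self.focus_stack.append(window)
        return True

    def dispatch(self, event):
        # Input is passed only to the window that currently holds focus.
        if self.focused is not None:
            self.focused.received.append(event)
```

In this model, while a modal child window such as File Open is on top, every input event goes to it and the parent cannot be activated. Input reaches the parent again only after the child is closed.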
Typical computer technologies are not well suited for use with a speech driven interface. The use of parent and child windows creates a multitude of problems, since natural language modeling is best suited for complex command processing, whereas child windows receive active focus one window at a time and are activated sequentially by the operating system (a single action at a time). As stated above, prior art speech command applications are therefore not suited for natural language processing of complex commands.