1. Technical Field
This invention relates to the field of computer speech recognition and more particularly to a method and system for accurately recognizing voice command structures to perform various events from an initial location to a new location in a finite grammar speech recognition system.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted into a set of words by a computer. These recognized words may then be used in a variety of computer software applications for purposes such as document preparation, data entry and command and control.
There are two broad categories of speech recognition systems for controlling system and application events: natural language understanding (NLU) systems and finite state grammar systems. NLU systems permit total linguistic flexibility in command expression, recognizing as commands spoken phrases in terms intuitive to the speaker. However, NLU systems are extremely complex, and at this point, operate only on very sophisticated computers.
Consequently, the vast majority of commercial speech recognition systems are finite grammar systems. Finite grammar systems perform events by recognizing spoken words corresponding to a set of prescribed voice commands and executing those commands. The simplest such command and control grammars correlate each function that the system can perform to one speech command. Thus, the speaker must learn and utter a prescribed command, such as “file open”, to perform a desired event such as opening a new document in a word processor. More advanced finite state grammar systems allow for increased linguistic flexibility. Such systems may include alternative commands for performing each function, so that a speaker can utter any one of a number of recognized expressions to perform the event. Typically, these systems convert spoken phrases into one of a finite set of functional expressions using translation rules or by parsing annotation in the grammar. These systems, despite having a finite grammar system, enable a speaker to use more natural language expressions to perform desired events.
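As an illustration of the translation-rule approach described above, the following minimal sketch maps several alternative spoken phrases onto one functional expression. The table contents, the (action, object) tuple representation and the function name translate are assumptions made for this example only; they are not taken from any particular speech recognition product.

```python
# Hypothetical translation table: several phrasings map to one
# functional expression, represented here as an (action, object) pair.
TRANSLATION_RULES = {
    "file open": ("open", "file"),
    "open file": ("open", "file"),
    "open a file": ("open", "file"),
    "file close": ("close", "file"),
    "close the file": ("close", "file"),
}


def translate(phrase):
    """Return the functional expression for a recognized phrase.

    Returns None when the phrase is not in the finite grammar,
    in which case no event should be performed.
    """
    # Normalize case and collapse repeated whitespace before lookup.
    normalized = " ".join(phrase.lower().split())
    return TRANSLATION_RULES.get(normalized)
```

Because the table is finite, any phrase outside it yields no functional expression, which is the limitation discussed in the following paragraph.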
One problem with advanced natural-language-like finite grammar systems is that the user may believe the system is more intelligent than it really is. For instance, when the system recognizes a term in one context, the user may think the term can be used with the same meaning in other situations. Since these systems have a finite grammar, if a term (or its functional equivalent) is not prescribed to execute an intended event, an unexpected event may be performed, or none at all. The likelihood of error increases as the commands get more complex and natural-sounding.
One such natural language expression is a voice command structure for performing various events from one point to another. Examples of such commands include “move text from here to here” or “make text blue from here to there”. Such commands instruct the computer, in a natural way, to perform an action upon an object from one point to another, with the first point being the current position of the object and the second point being a new position indicated by the user. Thus, these commands require the speech recognition system to recognize the desired action to take on an object, but wait until it receives intermediate user input regarding the new position before executing the action.
Current belief is that only sophisticated NLU systems can recognize these complex, multi-part “from here . . . to here” voice command structures. However, the cost and complexity of NLU systems make these systems impractical for use by personal consumers or businesses. Moreover, even NLU systems have difficulty processing these commands when the new position is fixed by voice commands. This is because NLU systems are particularly well-suited for identifying one desired command from a string of spoken words. For example, typical NLU systems would recognize successive voice commands such as “down, down, down” merely as “down”, instead of as an instruction to move down three times. Thus, even a costly NLU system may not permit a user to perform an event from one point to another solely by issuing voice commands.
Accordingly, there is a need to provide a finite grammar speech recognition system able to recognize and execute voice command structures to perform events from one point to another.
The inventors of the present invention have determined that, in a finite grammar speech recognition system, coordinating the grammar with application scripting permits flexible contextual interpretation of a command structure in the grammar. Accordingly, the present invention provides a method and system to recognize and execute finite grammar voice command structures for performing various events from one point to another.
Specifically, the present invention operates on a computer system that is adapted for speech recognition using a finite state grammar, to execute voice commands for performing an event from an initial location to a new location. The method begins by recognizing an enabling voice command which specifies the event to be performed from the initial location. A functional expression, defined by at least an action and an object, is determined corresponding to the enabling voice command, and the action and object are then stored in a suitable memory location. An input specifying the new location is received, and an activating voice command for performing the event up to this new location is recognized. The action and object are then retrieved from the memory location, and the event is performed accordingly from the initial location to the new location. In a preferred embodiment of the invention, the enabling voice command includes the terms “from here” and the activating voice command is “to here”.
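The sequence just described (enabling command, stored action and object, intermediate location input, activating command) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the grammar table ENABLING_COMMANDS, the class CommandSession, its method names and the use of an integer cursor position are all hypothetical.

```python
# Hypothetical finite grammar: each enabling phrase maps to an
# (action, object) functional expression. The activating phrase is fixed.
ENABLING_COMMANDS = {
    "select text from here": ("select", "text"),
    "make text blue from here": ("make blue", "text"),
}
ACTIVATING_COMMAND = "to here"


class CommandSession:
    """Tracks one pending 'from here ... to here' command structure."""

    def __init__(self, cursor=0):
        self.cursor = cursor   # current location (assumed to be an integer offset)
        self.stored = None     # memory location for (action, object, initial location)
        self.events = []       # events performed so far

    def set_cursor(self, position):
        """Intermediate input specifying the new location (pointer or voice)."""
        self.cursor = position

    def hear(self, phrase):
        """Process one recognized phrase; return the performed event, if any."""
        phrase = " ".join(phrase.lower().split())
        if phrase in ENABLING_COMMANDS:
            action, obj = ENABLING_COMMANDS[phrase]
            # Store the action and object with the initial location.
            self.stored = (action, obj, self.cursor)
            return None
        if phrase == ACTIVATING_COMMAND and self.stored is not None:
            # Retrieve the stored content and perform the event
            # from the initial location to the new location.
            action, obj, start = self.stored
            self.stored = None
            event = (action, obj, start, self.cursor)
            self.events.append(event)
            return event
        return None
```

For example, a session whose cursor starts at offset 5 stores the pair for “make text blue from here”, accepts a cursor move to offset 20, and on “to here” yields an event covering offsets 5 through 20. An activating command with nothing stored performs no event.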
The present invention thus provides the object and advantage of using a finite grammar speech recognition system to execute “from here . . . to here” voice command structures to perform an event from one point to another. This increases the linguistic flexibility of finite grammar systems. In particular, the present invention permits a user to issue commands using very natural and intuitive terms such as “from here . . . to here” coupled with terms indicating what action to take and the object subject to the action. Additionally, this invention provides the further object and advantage of a finite grammar system capable of executing voice commands in conjunction with intermediate user inputs.
Particularly, the user can specify the new location using a suitable pointing device or by one or more voice commands. The present invention provides the additional object and advantage of performing the desired event from one point to the other solely by voice. In other words, the present invention affords hands-free cursor positioning between the enabling and activating commands.
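To illustrate positioning by voice between the enabling and activating commands, the sketch below applies each recognized navigation word in turn, so that a repeated word moves the cursor repeatedly. The command vocabulary, the (row, column) position representation and the function name apply_navigation are assumptions chosen for this example.

```python
# Hypothetical navigation vocabulary: each word shifts the cursor
# by one row or one column.
MOVES = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def apply_navigation(position, commands):
    """Apply each recognized navigation command in order.

    Every command is executed individually, so a repeated word
    moves the cursor once per utterance. Words outside the
    vocabulary are ignored.
    """
    row, col = position
    for word in commands:
        d_row, d_col = MOVES.get(word.lower(), (0, 0))
        row, col = row + d_row, col + d_col
    return (row, col)
```

In this sketch, three successive “down” commands starting from row 0 leave the cursor on row 3 rather than row 1.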
The present invention may also count recognized voice commands issued subsequent to the enabling voice command. If the counted voice commands exceed a prescribed limit, the memory location containing the action and object content is cleared. Also, the action and object content can be cleared from memory after the corresponding event has been performed. This provides the additional object and advantage of reducing the likelihood that unmatched enabling and activating voice commands are inadvertently executed.
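The counting and clearing behavior described above might be sketched as follows. The class name PendingStore, the default limit of five commands and the choice to let the caller report each intervening command through note_command are assumptions for illustration; the source does not specify a limit value or which commands are counted.

```python
class PendingStore:
    """Memory location for the action and object of an enabling command."""

    def __init__(self, limit=5):
        self.limit = limit    # prescribed limit (assumed value)
        self.content = None   # stored (action, object), or None when empty
        self.count = 0        # commands recognized since the enabling command

    def enable(self, action, obj):
        """Store the action and object and restart the count."""
        self.content = (action, obj)
        self.count = 0

    def note_command(self):
        """Count one recognized command issued after the enabling command."""
        if self.content is None:
            return
        self.count += 1
        if self.count > self.limit:
            # Too many intervening commands: discard the stored content.
            self.clear()

    def activate(self):
        """Return the stored content, then clear it once it has been used."""
        content = self.content
        self.clear()
        return content

    def clear(self):
        self.content = None
        self.count = 0
```

With a limit of two, a third intervening command empties the store, so a later activating command finds nothing to execute. Clearing inside activate likewise prevents a second “to here” from repeating the event.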
These and other objects, advantages and aspects of the invention will become apparent from the following description. In the description, reference is made to the accompanying drawings which form a part hereof, and in which there is shown a preferred embodiment of the invention. Such embodiment does not necessarily represent the full scope of the invention, and reference is therefore made to the claims herein for interpreting the scope of the invention.