1. Technical Field
This invention relates to the field of computer speech recognition and more particularly to a method and system for executing voice commands having ordinary dictation as a parameter.
2. Description of the Related Art
Speech recognition is the process by which an acoustic signal received by a microphone is converted into a set of words by a computer. These recognized words may then be used in a variety of computer software applications. For example, speech recognition may be used to input data, prepare documents, and control the operation of system and application software.
Speech recognition systems can recognize and insert dictated text in a variety of software applications. For example, one can use a speech system to dictate a letter into a word processing document. Simply stated, a speech recognition engine receives the user's dictated words in the form of speech signals, which it processes using known algorithms. The processed signals are then “recognized” by identifying a corresponding text phrase in a vocabulary database. The text is then conveyed to an active software application, where it is displayed. This type of spoken utterance is considered to be ordinary dictation because it is merely transcribed and does not execute a control command.
As mentioned, the speech recognition system may also be used to control the operation of voice-enabled system and application software. Typically, the software is controlled by a user issuing voice commands for performing system or application events. There are two broad categories of speech recognition systems for executing voice commands: natural language understanding (NLU) systems and finite state grammar systems. NLU systems permit total linguistic flexibility in command expression by recognizing, as commands, spoken phrases expressed in terms naturally intuitive to the speaker. For example, an NLU system is likely to recognize the spoken utterance, “would you be a dear and open the Pensky file for me?”, as instructing the system to execute a “file open” command for the file named “Pensky”. However, NLU systems are extremely complex and, at this point, operate only on very sophisticated computers.
Consequently, the vast majority of commercial speech recognition systems are finite state grammar systems. In a simple finite state grammar system, the user in the above example would utter a much more structured phrase, such as, “open file Pensky”. Upon receiving the speech signals corresponding to the spoken phrase, the speech recognition engine processes the signals to determine whether they correspond to a command coded within one or more command sets or grammars. If so, the command is processed and executed by the software so as to perform the corresponding event, in this case, opening the “Pensky” file.
The simplest command grammars correlate each command or function that the system can perform to one speech command. More advanced finite state grammar systems allow for increased linguistic flexibility by including alternative commands for performing each function, so that a speaker can utter any one of a number of expressions to perform the event. Typically, these systems convert spoken phrases into one of a finite set of functional expressions using translation rules or by parsing annotation in the grammar. Despite relying on a finite grammar, these systems enable a user to speak more naturally when issuing voice commands.
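The mapping from alternative spoken phrases to a finite set of functional expressions can be illustrated with a minimal sketch. It is not taken from any particular speech system: the grammar contents, the list of known file names, and the names COMMAND_GRAMMAR, KNOWN_FILES, and match_command are hypothetical, and the recognized text is assumed to arrive as a plain string.

```python
# Hypothetical command grammar: each functional expression lists the
# alternative spoken phrasings that map to it. Names are illustrative only.
COMMAND_GRAMMAR = {
    "file_open": ["open file", "open the file", "load file"],
    "file_close": ["close file", "close the file"],
}

# Hypothetical fixed list of file names coded into the grammar.
KNOWN_FILES = ["pensky", "budget"]


def match_command(utterance):
    """Return (function, file_name) if the utterance is in the grammar, else None."""
    # Normalize case and whitespace before comparing against the grammar.
    text = " ".join(utterance.lower().split())
    for function, phrasings in COMMAND_GRAMMAR.items():
        for phrase in phrasings:
            for name in KNOWN_FILES:
                if text == f"{phrase} {name}":
                    return (function, name)
    return None
```

In this sketch, “open file Pensky” and “load file Pensky” both resolve to the same functional expression, while the free-form NLU example from above matches nothing, because only the enumerated phrasings are accepted.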
As stated, existing speech recognition systems are capable of receiving speech signals from a user and recognizing the signals either as ordinary dictation or as a voice command for performing an event. However, typical speech systems are unable to recognize voice commands that include ordinary dictation so as to execute a command having dictation as a parameter.
One example of such a voice command is, “send a note to Bill regarding today's meeting”, which is intended to call up an E-mail application that will send a message to a colleague named “Bill” with “today's meeting” displayed in the message subject text field. Typical speech systems are likely to interpret this statement as ordinary dictation, transcribing the entire spoken phrase as text in a document, despite the fact that it includes elements of both a command and ordinary dictation. Alternatively, the statement may be recognized only as a command to execute the E-mail application, without inserting the dictation “today's meeting” in the subject line.
A basic reason existing speech systems have difficulty with these types of mixed voice commands is that the command grammars contain only a finite number of command patterns. It is impractical, if not impossible, to code into a command grammar the tens of thousands of words or word combinations in a given language. Thus, typical systems limit the grammar sets to contain phrases indicating functions relevant to performing computer software events. These functional phrases comprise a much smaller subset of an entire language, yet are extremely useful in carrying out software application events. Because the vast majority of phrases used in ordinary dictation are left out of the command grammars, typical finite state grammar systems are unable to incorporate the dictation portion in commands.
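The limitation can be made concrete with a minimal sketch, assuming a recognizer that accepts only exact phrases enumerated in its command grammar and treats everything else as dictation. The names COMMANDS and recognize, and the action label, are hypothetical and do not describe any existing product.

```python
# Hypothetical finite command grammar: only fully enumerated phrases are commands.
COMMANDS = {
    "send a note to bill": "compose_email_to_bill",
}


def recognize(utterance):
    """Classify an utterance as a command or as ordinary dictation."""
    text = " ".join(utterance.lower().split())
    if text in COMMANDS:
        return ("command", COMMANDS[text])
    # Anything outside the enumerated grammar is simply transcribed.
    return ("dictation", utterance)
```

Under these assumptions, “send a note to Bill” is executed as a command, but “send a note to Bill regarding today's meeting” falls through to dictation, since the open-ended subject text cannot be enumerated in the grammar ahead of time.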
Accordingly, there is a need to provide a finite grammar speech recognition system able to execute voice commands having ordinary dictation as a parameter.