Speech recognition systems are becoming more prevalent, owing to improved techniques combined with a great need for such systems. Speech recognition systems (SRSs) and speech recognition applications (SRAs) are used for a wide range of tasks, including free speech entry (continuous speech recognition) into word processing systems, spoken selection of items in limited-choice entry categories such as form completion, and verbal commands for controlling systems.
In the area of verbal commands for controlling systems, one goal is to allow a computer system to comprehend near-normal human speech. This field is referred to as Natural Language Processing (NLP). Humans excel at this task, but it is extremely difficult to define in the precise mathematical terms needed for computational processing.
In free speech entry systems, such as word entry into a word processing program, a person speaks and the SRS inserts the words into the word processing program. The person watches the words appear on a visual display, such as a computer monitor. This gives the user direct feedback: she can see her thoughts recorded and make corrections should the SRS misunderstand a word or phrase. Compared with dictating into a tape recorder for later transcription by a stenographer, this arrangement has many advantages.
This direct feedback loop is even more advantageous because the person can also edit the text entered into the word processor. Writing is an iterative process, often requiring changes and restructuring; editing and redrafting are integral parts of writing. If a person is entering text into a word processor using an SRS, it is a natural extension for the person to edit and modify the text using voice commands, instead of having to resort to a keyboard or pointing device. Therefore, an SRS for text entry preferably has at least two different modes: free speech entry and user command interpretation. These modes are very different processes, but their combination has great utility.
There is a great variety of word processing programs available that run on general-purpose computers such as personal computers (PCs). There are also several SRAs available, some of which allow a user to enter text into a word processing application. The word processing application is separate from the SRA. It normally accepts text from a keyboard, though text entry can take other forms. The SRA acts as a "front end" to the user's word processing application.
As previously described, adding text into a word processing application and allowing speech command editing are two different concepts. To allow editing, an SRA must be able to interpret user commands and instruct the separate word processing application to perform those commands. Interpreting user commands raises a wide range of problems, from the difficult task of NLP to the problem of properly instructing a variety of different user applications.
An NLP system for controlling a word processing application will usually recognize a limited vocabulary, determined by the commands available for editing and formatting text. The NLP system must be able to interpret this variety of commands and instruct the word processing application to perform accordingly. The set of possible commands can be very large. For example, commands limited to VERB-NOUN pairs (in an action-object paradigm) include "delete character", "delete word", "delete line", "delete sentence", and so on. With a large number of possible noun objects, a mapping of every verb action (for example "delete", "italicize", "underline", "bold", "move") onto every object is enormous. Further, each new command word creates many new VERB-NOUN pairs, since a new verb must be paired with every existing noun and a new noun with every existing verb.
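To make this growth concrete, the following Python sketch enumerates an explicit VERB-NOUN command table from the example verbs and nouns above. The word lists and the function name `command_table` are hypothetical and chosen only for illustration. The table size is the product of the two vocabulary sizes, so each new noun adds one entry per existing verb.

```python
from itertools import product

# Illustrative verbs and nouns taken from the examples in the text; not an exhaustive vocabulary.
VERBS = ["delete", "italicize", "underline", "bold", "move"]
NOUNS = ["character", "word", "line", "sentence"]


def command_table(verbs, nouns):
    """Enumerate every VERB-NOUN command phrase as an explicit table entry (hypothetical helper)."""
    return [f"{verb} {noun}" for verb, noun in product(verbs, nouns)]


table = command_table(VERBS, NOUNS)
print(len(table))  # 5 verbs x 4 nouns = 20 entries

# Adding a single new noun adds one entry for every existing verb.
bigger = command_table(VERBS, NOUNS + ["paragraph"])
print(len(bigger) - len(table))  # 5 new entries
```

With realistic vocabularies of dozens of verbs and objects, the same multiplication quickly yields thousands of entries.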
Another problem is that NLP is often error prone. Many SRAs rely on educated guesses as to the individual words the user said. A typical SRA has no thematic or semantic sense of natural language; it only attempts to identify words based on analysis of the sampled input sound. This leads to several possible interpretations of what the user requested, and the NLP application has the daunting task of evaluating these candidate commands and selecting the correct one. Processing time is wasted on incorrect interpretations, slowing the application overall. Further, an NLP application often cannot even detect that an interpretation is incorrect.
Some systems allowing user commands attempt to avoid these problems by using "fill in the blank" templates. The user is prompted with a template to complete, by first stating a verb, and then stating an object. The choice of possible entries into each slot of the template is severely limited. The user can only enter a limited selection of verbs or nouns.
This approach severely limits the power of an NLP system. The template approach is also slow and demands considerable effort from the user. Moreover, modifiers are not allowed, so a user cannot say "delete two words"; instead, the user must issue two "delete word" commands. The goal of making application command interpretation easy and intuitive is lost.
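The difference between the template approach and a command form that allows a count modifier can be illustrated with a small Python sketch. The function names, the number-word table, and the naive plural handling are all hypothetical. Note that `parse_with_modifier` raises an error for phrases it does not recognize rather than guessing at an interpretation.

```python
# Hypothetical number words; a real system would need a much larger vocabulary.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3}


def template_commands(verb, noun, times):
    """Template style: there is no modifier slot, so VERB NOUN is repeated once per unit."""
    return [(verb, noun)] * times


def parse_with_modifier(phrase):
    """Accept 'VERB [COUNT] NOUN' and return a single (verb, noun, count) command."""
    words = phrase.lower().split()
    if len(words) == 2:
        verb, noun = words
        count = 1
    elif len(words) == 3 and words[1] in NUMBER_WORDS:
        verb, count_word, noun = words
        count = NUMBER_WORDS[count_word]
    else:
        raise ValueError(f"unrecognized command phrase: {phrase!r}")
    # Naively normalize a plural object ("words") to its singular form ("word").
    if count > 1 and noun.endswith("s"):
        noun = noun[:-1]
    return (verb, noun, count)


print(template_commands("delete", "word", 2))   # two separate "delete word" utterances
print(parse_with_modifier("delete two words"))  # one utterance with a count modifier
```

In the template style the user must speak the command twice; with the modifier, one utterance yields a single command carrying a count of 2.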
Accordingly, what is needed is an NLP system that can accurately interpret a wide range of user commands and is easily extensible. The vocabulary and command forms must be easy to extend without affecting the existing vocabulary. Further, improper command phrases should be detected as early as possible to avoid spending computer time processing them. The system should also give users informative error messages when command phrases are improper. Finally, the NLP application must not be susceptible to infinite loops while processing commands.
The NLP command interpreting application must be modular enough that adapting it to control different applications is simple. For example, the NLP application should require minimal changes to command different word processing applications, each with a completely different programming or macro language. Adapting the NLP application to other application domains, including mail systems, spreadsheet programs, database systems, games, and communication systems, should also be simple.
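One way to picture this modularity is to keep the interpreter core independent of any target application and isolate the application-specific output in interchangeable back-end classes. The Python sketch below is a minimal illustration under that assumption. The class names and both macro syntaxes are invented for the example and do not correspond to any real word processor's macro language.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Translates an interpreted command into one target application's macro text."""

    @abstractmethod
    def render(self, verb: str, noun: str, count: int) -> str:
        ...


class CallStyleBackend(Backend):
    # Hypothetical macro syntax of the form VerbNoun(count); not any real product's language.
    def render(self, verb, noun, count):
        return f"{verb.capitalize()}{noun.capitalize()}({count})"


class KeywordStyleBackend(Backend):
    # A second hypothetical syntax, showing that only the back-end class changes per target.
    def render(self, verb, noun, count):
        return f"{verb}-{noun} {count}"


def dispatch(command, backend: Backend) -> str:
    """The interpreter core is unchanged; only the supplied back end varies."""
    verb, noun, count = command
    return backend.render(verb, noun, count)


cmd = ("delete", "word", 2)
print(dispatch(cmd, CallStyleBackend()))     # DeleteWord(2)
print(dispatch(cmd, KeywordStyleBackend()))  # delete-word 2
```

Supporting an additional application then means writing one more `Backend` subclass rather than modifying the interpreter.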
In addition to being adaptable among different applications at the back end, the NLP command interpreter should be adaptable at the front end to different languages, such as English or French, and to other variations in speech or dialect.
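Front-end adaptation can be sketched in the same spirit: a per-language lexicon maps spoken words onto language-neutral tokens, so the rest of the interpreter does not change. The Python sketch below is a minimal illustration. The lexicon contents and the `normalize` function are hypothetical. It also assumes both languages share a VERB COUNT NOUN word order; a language with a different word order would need more than a lexicon swap.

```python
# Hypothetical per-language lexicons mapping spoken words to canonical, language-neutral tokens.
LEXICONS = {
    "en": {"delete": "delete", "word": "word", "words": "word", "two": "2"},
    "fr": {"supprimer": "delete", "mot": "word", "mots": "word", "deux": "2"},
}


def normalize(phrase, language):
    """Map each spoken word to its canonical token, rejecting unknown words early."""
    lexicon = LEXICONS[language]
    tokens = []
    for word in phrase.lower().split():
        if word not in lexicon:
            # Fail fast with an informative message instead of guessing.
            raise ValueError(f"{word!r} is not in the {language} vocabulary")
        tokens.append(lexicon[word])
    return tokens


print(normalize("delete two words", "en"))     # ['delete', '2', 'word']
print(normalize("supprimer deux mots", "fr"))  # ['delete', '2', 'word']
```

Both phrases reduce to the same token sequence, so the command interpretation that follows needs no language-specific logic.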