Speech recognition systems are becoming more prevalent, due to improved techniques combined with a great need for such systems. Speech recognition systems (SRSs) and speech recognition applications (SRAs) are used in a wide range of applications, including free speech entry (continuous speech recognition) into word processing systems, speech-selected items for limited-choice entry categories such as form completion, and verbal commands for controlling systems.
In the area of verbal commands for controlling systems, a goal is to allow near-normal human speech to be comprehensible to a computer system. This field is referred to as Natural Language Processing (NLP), an area where humans excel but which is incredibly difficult to define in the precise mathematical terms needed for computational processing.
In free speech entry systems, such as word entry into a word processing program, a person speaks, and the SRS inserts the words into the word processing program. It is not necessary for the SRS to understand the meaning of the words, only to recognize them. For verbal commands, however, the system must understand natural language well enough to properly interpret the commands.
One problem with natural language is the use of pronouns such as "this", "that" and "it". In normal English (and most other languages), pronouns serve several purposes. One purpose is as a determiner, as in the sentence "Hand me that piano." Here the pronoun "that" defines which piano to hand over. Another purpose is to refer to a topic discussed earlier. For example, in "My dog brought me a dead squirrel. I buried it in my neighbor's yard.", it is understood that the pronoun "it" refers to the squirrel, not the dog.
However, there is often ambiguity which is not easily resolved. For example, in "My neighbor saw my dog digging up the dead squirrel. He was very excited.", it is unclear whether "he" refers to the neighbor or to the dog. If the listener has more information, such as real-world knowledge or context information, the listener may be able to determine that the dog was excited, not the neighbor.
In computer applications which allow users to manipulate data and objects through voice or typed natural language commands, the use of pronouns becomes very problematic. Determining what object or data the pronoun "it", "that" or "this" refers to is exceptionally difficult. The computer application usually has no real-world knowledge of the domain for objects and data. Context information might be helpful; however, if the context the application relies on is counter-intuitive to what a user expects, then the computer application will not function in the intuitive way the user is expecting.
Context information is helpful to bridge the gap between a low-level computer application and the context in which the user views information. For example, in a word processing operation, the word processor displays information on a computer screen in terms of binary coded symbols with different font display characteristics and formatting information, while the user is thinking in terms of words, sentences and paragraphs.
For example, if a user is manipulating text on a computer display device (such as a computer screen), the user may wish to cause a word to be underlined. The user can issue the command "Underline this word." If the computer application has an understanding of what words are, and of which word the user means (such as the word at the current insertion point), the computer application can proceed to select the word, and then underline it.
However, if the user then states "Move up one line.", causing the current insertion point to be in or proximate to another word, and then states "bold that", the computer application has no way of knowing whether the user means the previously selected and underlined word, or the new word at the current insertion point.
The problem is even more confusing in that there may be several objects or data which could equally be applicable to the command the user is requesting. For example, if the user states "bold that", the command is equally applicable to an entire document of text, or a paragraph or sentence as well as a single word. If the display is of a set of graphical objects, such as a CAD drawing, the display objects can be one object or the entire set of displayed objects.
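The ambiguity described above can be made concrete with a minimal Python sketch. It is only an illustration under stated assumptions: the function name `candidate_referents`, the helper `_unit_at`, and the simple regular-expression definitions of "word", "sentence" and "paragraph" are hypothetical and are not taken from any particular speech recognition system. The sketch lists every text unit around the insertion point to which a command such as "bold that" could equally apply.

```python
import re


def _unit_at(text: str, cursor: int, pattern: str) -> str:
    """Return the first match of `pattern` that contains the cursor, or ''."""
    for match in re.finditer(pattern, text):
        if match.start() <= cursor < match.end():
            return match.group().strip()
    return ""


def candidate_referents(text: str, cursor: int) -> dict[str, str]:
    """List the text units that "that" could refer to at the insertion point.

    The unit definitions are deliberately naive (illustrative assumptions):
    a word is a run of word characters, a sentence ends at '.', '!' or '?',
    and a paragraph is a single line of text.
    """
    return {
        "word": _unit_at(text, cursor, r"\w+"),
        "sentence": _unit_at(text, cursor, r"[^.!?\n]+[.!?]?"),
        "paragraph": _unit_at(text, cursor, r"[^\n]+"),
        "document": text,
    }


document = "The cat sat. The dog ran.\nA second paragraph."
insertion_point = document.index("dog")

# Every entry printed here is a legitimate target for "bold that".
for unit, referent in candidate_referents(document, insertion_point).items():
    print(f"{unit}: {referent!r}")
```

With the insertion point on "dog", the sketch returns four distinct candidates (the word, its sentence, its paragraph, and the whole document), and nothing in the command "bold that" by itself selects among them.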
If the computer application attempts to comprehend pronoun usage, any ambiguity would either result in misunderstanding (and improperly executed commands), or require requesting clarification from the user (asking the user "Do you mean bold the word, the sentence or the paragraph?"). Requesting clarification would result in slow progress, because each subsequent pronoun usage is different, and the user would need to be queried for each usage. The user would soon give up on trying to use pronouns to manipulate objects. Such a result defeats the whole purpose of trying to make speech recognition and natural language processing an easy system for people to use.