I. Field of the Invention
The aspects of the present invention relate to a speech recognition network computer user interface. More specifically, the embodiments of the present invention relate to a novel method and system for user interaction with a computer using speech recognition and natural language processing.
II. Description of the Related Art
As computers have become more prevalent, it has become clear that many people have great difficulty understanding and communicating with computers. A user must often learn arcane commands and non-intuitive procedures in order to operate the computer. For example, most personal computers use windows-based operating systems which are largely menu-driven. This requires that the user learn which menu commands or sequences of commands produce the desired results.
Furthermore, traditional interaction with a computer is often slowed by manual input devices such as keyboards or mice. Many computer users are not fast typists. As a result, much time is spent communicating commands and words to the computer through these manual input devices. It is becoming clear that an easier, faster and more intuitive method of communicating with computers and networked objects, such as web-sites, is needed.
One proposed method of computer interaction is speech recognition. Speech recognition involves software and hardware that act together to audibly detect human speech and translate the detected speech into a string of words. As is known in the art, speech recognition works by breaking down sounds the hardware detects into smaller non-divisible sounds called phonemes. Phonemes are distinct units of sound. For example, the word “those” is made up of three phonemes; the first is the “th” sound, the second is the “o” sound, and the third is the “s” sound. The speech recognition software attempts to match the detected phonemes with known words from a stored dictionary. An example of a speech recognition system is given in U.S. Pat. No. 4,783,803, entitled “SPEECH RECOGNITION APPARATUS AND METHOD”, issued Nov. 8, 1988, assigned to Dragon Systems, Inc. Presently, many speech recognition software packages are commercially available from such companies as Dragon Systems, Inc. and International Business Machines Corporation.
One limitation of these speech recognition software packages or systems is that they typically only perform command and control or dictation functions. Thus, the user is still required to learn a vocabulary of commands in order to operate the computer.
A proposed enhancement to these speech recognition systems is to process the detected words using a natural language processing system. Natural language processing generally involves determining a conceptual “meaning” (e.g., what meaning the speaker intended to convey) of the detected words by analyzing their grammatical relationship and relative context. For example, U.S. Pat. No. 4,887,212, entitled “PARSER FOR NATURAL LANGUAGE TEXT”, issued Dec. 12, 1989, assigned to International Business Machines Corporation, teaches a method of parsing an input stream of words by using word isolation, morphological analysis, dictionary look-up and grammar analysis.
Natural language processing used in concert with speech recognition provides a powerful tool for operating a computer using spoken words rather than manual input such as a keyboard or mouse. However, one drawback of a conventional natural language processing system is that it may fail to determine the correct “meaning” of the words detected by the speech recognition system. In such a case, the user is typically required to recompose or restate the phrase, with the hope that the natural language processing system will determine the correct “meaning” on subsequent attempts. Clearly, this may lead to substantial delays as the user is required to restate the entire sentence or command. Another drawback of conventional systems is that the processing time required for the speech recognition can be prohibitively long. This is primarily due to the finite speed of the processing resources as compared with the large amount of information to be processed. For example, in many conventional speech recognition programs, the time required to recognize the utterance is long due to the size of the dictionary file being searched.
An additional drawback of conventional speech recognition and natural language processing systems is that they are not interactive, and thus are unable to cope with new situations. When a computer system encounters unknown or new networked objects, new relationships between the computer and the objects are formed. Conventional speech recognition and natural language processing systems are unable to cope with the situations that result from the new relationships posed by previously unknown networked objects. As a result, a conversational-style interaction with the computer is not possible. The user is required to communicate complete concepts to the computer. The user is not able to speak in sentence fragments because the meaning of these sentence fragments (which is dependent on the meaning of previous utterances) is lost.
What is needed is an interactive user interface for a computer which utilizes speech recognition and natural language processing which avoids the drawbacks mentioned above.
III. Summary of the Invention
The embodiments of the present invention include a novel and improved system and method for interacting with a computer using utterances, speech processing and natural language processing. Generically, the system comprises a speech processor for searching a first grammar file for a matching phrase for the utterance, and for searching a second grammar file for the matching phrase if the matching phrase is not found in the first grammar file. The system also includes a natural language processor for searching a database for a matching entry for the matching phrase; and an application interface for performing an action associated with the matching entry if the matching entry is found in the database.
In one embodiment, the natural language processor updates at least one of the database, the first grammar file and the second grammar file with the matching phrase if the matching entry is not found in the database.
The first grammar file is a context-specific grammar file. A context-specific grammar file is one which contains words and phrases that are highly relevant to a specific subject. The second grammar file is a general grammar file. A general grammar file is one which contains words and phrases which do not need to be interpreted in light of a context. That is to say, the words and phrases in the general grammar file do not belong to any parent context. By searching the context-specific grammar file before searching the general grammar file, the present invention allows the user to communicate with the computer using a more conversational style, wherein the words spoken, if found in the context-specific grammar file, are interpreted in light of the subject matter most recently discussed.
In a further aspect, the speech processor searches a dictation grammar for the matching phrase if the matching phrase is not found in the general grammar file. The dictation grammar is a large vocabulary of general words and phrases. By searching the context-specific and general grammars first, it is expected that the speech recognition time will be greatly reduced due to the context-specific and general grammars being physically smaller files than the dictation grammar.
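By way of illustration only, the search order described above (context-specific grammar, then general grammar, then dictation grammar) might be sketched as follows. The grammar contents, the function name `find_matching_phrase`, and the rule that a dictation match requires every word to be in the vocabulary are assumptions made for this sketch and are not taken from the specification.

```python
from typing import Optional, Tuple

# Hypothetical in-memory grammars; real grammar files would be loaded from storage.
CONTEXT_GRAMMAR = {"what is playing", "buy tickets"}   # e.g. a "movies" context
GENERAL_GRAMMAR = {"what time is it", "open my mail"}
DICTATION_GRAMMAR = {"what", "time", "is", "it", "playing",
                     "buy", "tickets", "hello", "world"}


def find_matching_phrase(utterance: str) -> Tuple[Optional[str], str]:
    """Search the smallest, most relevant grammar first, then fall back in order."""
    phrase = " ".join(utterance.lower().split())
    if phrase in CONTEXT_GRAMMAR:
        return phrase, "context-specific"
    if phrase in GENERAL_GRAMMAR:
        return phrase, "general"
    # Assumed dictation rule: accept only if every word is in the large vocabulary.
    words = phrase.split()
    if words and all(word in DICTATION_GRAMMAR for word in words):
        return phrase, "dictation"
    return None, "none"
```

In this sketch the large dictation vocabulary is consulted only when the two smaller grammars fail, which mirrors the expected reduction in recognition time.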
In another aspect, the natural language processor replaces at least one word in the matching phrase prior to searching the database. This may be accomplished by a variable replacer in the natural language processor for substituting a wildcard for the at least one word in the matching phrase. By substituting wildcards for certain words (called “word-variables”) in the phrase, the number of entries in the database can be significantly reduced. Additionally, a pronoun substituter in the natural language processor may substitute a proper name for pronouns in the matching phrase, allowing user-specific facts to be stored in the database.
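As a hedged illustration of variable replacement and pronoun substitution, the following sketch uses a hypothetical table of word-variables, a hypothetical `$stock` wildcard, and a hypothetical user name. None of these names appear in the specification.

```python
from typing import Dict, Tuple

# Hypothetical word-variables: several concrete words collapse to one wildcard,
# so the database needs one entry instead of one entry per company.
WORD_VARIABLES = {"ibm": "$stock", "intel": "$stock", "microsoft": "$stock"}


def replace_variables(phrase: str) -> Tuple[str, Dict[str, str]]:
    """Swap word-variables for wildcards, remembering the original words."""
    bindings: Dict[str, str] = {}
    out = []
    for word in phrase.lower().split():
        wildcard = WORD_VARIABLES.get(word)
        if wildcard is not None:
            bindings[wildcard] = word
            out.append(wildcard)
        else:
            out.append(word)
    return " ".join(out), bindings


def substitute_pronouns(phrase: str, user_name: str) -> str:
    """Replace first-person pronouns with the (assumed) current user's name."""
    mapping = {"i": user_name, "me": user_name, "my": user_name + "'s"}
    return " ".join(mapping.get(word, word) for word in phrase.lower().split())
```

The bindings returned by `replace_variables` keep the original word available, so that an action associated with the wildcard entry can still be applied to the specific word that was spoken.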
In another aspect, a string formatter text-formats the matching phrase prior to searching the database. Also, a word weighter weights individual words in the matching phrase according to a relative significance of the individual words prior to searching the database. These acts allow for faster, more accurate searching of the database.
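One possible reading of text formatting and word weighting is sketched below. The specific formatting steps (lower-casing, stripping punctuation, collapsing whitespace), the list of low-significance words, and the weights 0.1 and 1.0 are all assumptions chosen for illustration.

```python
import string
from typing import Dict

# Assumed list of low-significance words; the specification does not enumerate one.
LOW_VALUE_WORDS = {"a", "an", "the", "is", "of", "to", "what"}


def format_phrase(phrase: str) -> str:
    """Lower-case the phrase, strip punctuation, and collapse whitespace."""
    no_punctuation = phrase.lower().translate(
        str.maketrans("", "", string.punctuation))
    return " ".join(no_punctuation.split())


def weight_words(phrase: str) -> Dict[str, float]:
    """Give common function words a low weight and all other words full weight."""
    return {word: (0.1 if word in LOW_VALUE_WORDS else 1.0)
            for word in phrase.split()}
```

Normalizing the text first means the database search compares like with like, and the weights let a search favor entries that share the more significant words.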
A search engine in the natural language processor generates a confidence value for the matching entry. The natural language processor compares the confidence value with a threshold value. A boolean tester determines whether a required number of words from the matching phrase are present in the matching entry. This boolean testing serves as a verification of the results returned by the search engine.
In order to clear up ambiguities, the natural language processor prompts the user whether the matching entry is a correct interpretation of the utterance if the required number of words from the matching phrase are not present in the matching entry. The natural language processor also prompts the user for additional information if the matching entry is not a correct interpretation of the utterance. At least one of the database, the first grammar file and the second grammar file is updated with the additional information. In this way, the present invention adaptively “learns” the meaning of additional utterances, thereby enhancing the efficiency of the user interface.
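The interplay of the confidence value, the threshold, and the boolean test might be sketched as follows. The word-overlap scoring function, the threshold of 0.5, the requirement of two shared words, and the handling of a below-threshold result as "no-match" are assumptions made for this sketch; the specification does not fix any of them here.

```python
from typing import List, Optional, Tuple

THRESHOLD = 0.5      # assumed confidence threshold
REQUIRED_WORDS = 2   # assumed number of phrase words that must appear in the entry


def score(phrase_words: List[str], entry: str) -> float:
    """Assumed confidence: fraction of phrase words found in the entry."""
    entry_words = set(entry.split())
    hits = sum(1 for word in phrase_words if word in entry_words)
    return hits / len(phrase_words) if phrase_words else 0.0


def best_entry(phrase: str, database: List[str]) -> Tuple[Optional[str], float]:
    """Return the highest-scoring database entry and its confidence value."""
    words = phrase.split()
    best, best_conf = None, 0.0
    for entry in database:
        conf = score(words, entry)
        if conf > best_conf:
            best, best_conf = entry, conf
    return best, best_conf


def boolean_test(phrase: str, entry: str, required: int = REQUIRED_WORDS) -> bool:
    """Verify that enough distinct phrase words are present in the entry."""
    entry_words = set(entry.split())
    return sum(1 for word in set(phrase.split()) if word in entry_words) >= required


def interpret(phrase: str, database: List[str]) -> str:
    """Decide whether to act, ask the user to confirm, or report no match."""
    entry, conf = best_entry(phrase, database)
    if entry is None or conf < THRESHOLD:
        return "no-match"          # assumed handling of a low-confidence result
    if boolean_test(phrase, entry):
        return "act:" + entry      # verified: perform the associated action
    return "confirm:" + entry      # not verified: prompt the user to confirm
```

In this sketch, a "confirm" result corresponds to prompting the user, and a negative answer would be followed by a request for additional information that is then stored for later utterances.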
The speech processor will enable and search a context-specific grammar associated with the matching entry for a subsequent matching phrase for a subsequent utterance. This ensures that the most relevant words and phrases will be searched first, thereby decreasing speech recognition times.
Generically, the embodiments of the invention include a method to update a computer for voice interaction with a network object, such as a web page. Initially, a network object table, which associates the network object with the voice interaction system, is transferred to the computer over a network. The network object table can be embedded within the network object, or located at a specific internet web-site or at a consolidated location that stores network object tables for multiple network objects. The network object table is searched for an entry matching the network object. The entry matching the network object may result in an action being performed, such as text being voiced as speech through a speaker, a context-specific grammar file being used, or a natural language processor database being used. The network object table may be part of a dialog definition file. Dialog definition files may also include a context-specific grammar, entries for a natural language processor database, a context-specific dictation model, or any combination of these.
In another aspect of the present invention, a network interface transfers a dialog definition file over the network. The dialog definition file contains a network object table. A data processor searches the network object table for a table entry that matches the network object. Once this matching table entry is found, an application interface performs an action specified by the matching entry.
In another aspect of the present invention, the dialog definition file associated with a network object is located, and then read. The dialog definition file could be read from a variety of locations, such as a web-site, storage media, or a location that stores dialog definition files for multiple network objects. A network object table, contained within the dialog definition file, is searched to find a table entry matching the network object. The matching entry defines an action associated with the network object, and the action is then performed by the system. In addition to a network object table, the dialog definition file may contain a context-specific grammar, entries for a natural language processor database, or both.
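For illustration, a dialog definition file and its network object table might be modeled as below. The class names, field names, action names ("speak", "use_grammar"), and the example URLs are hypothetical; the specification does not prescribe a file layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableEntry:
    network_object: str   # identifier of the network object, e.g. a web page URL
    action: str           # hypothetical action name
    argument: str         # text to voice, or name of a grammar to enable


@dataclass
class DialogDefinitionFile:
    network_object_table: List[TableEntry]
    context_grammar: List[str] = field(default_factory=list)
    nlp_entries: List[str] = field(default_factory=list)


def find_entry(ddf: DialogDefinitionFile, network_object: str) -> Optional[TableEntry]:
    """Search the network object table for an entry matching the network object."""
    for entry in ddf.network_object_table:
        if entry.network_object == network_object:
            return entry
    return None


def perform(entry: TableEntry) -> str:
    """Stand-in for the application interface: describe the action to perform."""
    if entry.action == "speak":
        return "SPEAK: " + entry.argument
    if entry.action == "use_grammar":
        return "ENABLE GRAMMAR: " + entry.argument
    return "UNKNOWN ACTION"


# Example with made-up network objects.
ddf = DialogDefinitionFile(
    network_object_table=[
        TableEntry("http://example.com/movies", "speak", "Welcome to the movie page"),
        TableEntry("http://example.com/stocks", "use_grammar", "stocks"),
    ],
    context_grammar=["what is playing", "buy tickets"],
)
```

A caller would locate and read the file for the current network object, call `find_entry`, and pass any matching entry to `perform`; when no entry matches, no action is taken.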