All computer speech processing systems have to establish a match between the sound of an utterance (or a portion thereof) and an entry in the system's dictionary. A dictionary entry may be a sound or a phoneme (e.g., “v”), a syllable (e.g., “-ver-”), a word (e.g., “version”), or a phrase (“create a version”).
Computer speech processing systems generally fall into two categories: dictation systems and command systems. Dictation systems (e.g., IBM ViaVoice and Dragon Systems NaturallySpeaking) usually work in conjunction with a word processing program to allow a user to dictate text into an electronic document. Command systems (e.g., Apple Speech Recognition under Mac OS) map speech to computer commands.
Computer dictation systems are designed to break an utterance into a sequence of entries in a dictionary. Such systems identify known phrases and words in the speech and try to handle unfamiliar words by guessing their spelling or asking the user for additional input. If a pronounced word is not in the dictionary, there is no guarantee that the dictation system will spell it correctly (unless the user spells it explicitly, which largely defeats the purpose of using a dictation system). For this reason, dictation systems benefit from, and are optimized for, very large dictionaries.
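The segmentation idea described above can be illustrated with a minimal sketch. It assumes the utterance has already been reduced to a list of phoneme symbols and that dictionary entries match exactly; real dictation systems instead score acoustic evidence probabilistically. The phoneme symbols, the toy `DICTIONARY`, and the function name `segment` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical toy dictionary mapping phoneme sequences to spellings.
# Entries may be single sounds, words, or whole phrases.
DICTIONARY: Dict[Tuple[str, ...], str] = {
    ("k", "r", "iy", "ey", "t"): "create",
    ("ah",): "a",
    ("v", "er", "zh", "ah", "n"): "version",
    ("k", "r", "iy", "ey", "t", "ah", "v", "er", "zh", "ah", "n"): "create a version",
}


def segment(phonemes: List[str]) -> List[str]:
    """Greedily split a phoneme list into the longest matching dictionary entries.

    A phoneme that starts no known entry is emitted as an "<unknown:...>" marker,
    standing in for the point where a dictation system would have to guess a
    spelling or ask the user.
    """
    max_len = max(len(key) for key in DICTIONARY)
    result: List[str] = []
    i = 0
    while i < len(phonemes):
        # Try the longest candidate first, then progressively shorter ones.
        for n in range(min(max_len, len(phonemes) - i), 0, -1):
            entry = DICTIONARY.get(tuple(phonemes[i:i + n]))
            if entry is not None:
                result.append(entry)
                i += n
                break
        else:
            # No entry starts here: flag the phoneme as unfamiliar and move on.
            result.append(f"<unknown:{phonemes[i]}>")
            i += 1
    return result
```

In this sketch, a larger dictionary leaves fewer phonemes flagged as unknown, which mirrors why dictation systems favor very large dictionaries.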
Computer command systems are designed to recognize phrases representing the commands the computer can perform. A computer command system would match the sound of a user saying “Save as Vitaliy's application in directory Fain documents” with the word processor's “Save As” command, which requires certain parameters, then make its best attempt at spelling “Vitaliy's application,” and finally match the sound of “Fain documents” with a name in a list of directories available to the system.
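The three steps in this example (recognize the command, spell the free-form parameter, match the directory against a known list) might be sketched as follows. The sketch operates on an already transcribed string rather than on sound, and the names `parse_save_as` and `DIRECTORIES`, the directory list itself, and the "in directory" marker are assumptions made for illustration, not part of any actual command system.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical list of directories available to the system.
DIRECTORIES = ["Fain Documents", "Projects", "Music"]


def parse_save_as(transcript: str, directories: List[str]) -> Optional[Dict[str, Optional[str]]]:
    """Map a transcribed utterance onto a "Save As" command and its parameters.

    Returns None when the utterance does not begin with the command phrase.
    """
    prefix = "save as "
    marker = " in directory "
    if not transcript.lower().startswith(prefix):
        return None  # Not a "Save As" command.

    rest = transcript[len(prefix):]
    pos = rest.lower().find(marker)
    if pos < 0:
        # No directory was spoken; everything after the command is the file name.
        return {"command": "Save As", "filename": rest.strip(), "directory": None}

    # The file name is free-form text, so it is kept as transcribed.
    filename = rest[:pos].strip()
    spoken_dir = rest[pos + len(marker):].strip()

    # The directory must come from a closed list, so pick the closest known name.
    by_lower = {d.lower(): d for d in directories}
    matches = difflib.get_close_matches(spoken_dir.lower(), list(by_lower), n=1, cutoff=0.6)
    directory = by_lower[matches[0]] if matches else None
    return {"command": "Save As", "filename": filename, "directory": directory}
```

Calling `parse_save_as("Save as Vitaliy's application in directory Fain documents", DIRECTORIES)` would yield the "Save As" command with the file name kept as spoken and the directory resolved to the listed name "Fain Documents". The contrast between the open-ended file name and the closed directory list is the point of the sketch.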
Current computer speech processing systems with large active dictionaries are not designed or optimized for the task of efficiently determining whether and where a human voice utterance contains a given word or phrase. Even when they can perform this task, they perform it inefficiently. This task, however, is important in a variety of contexts, for example, in an efficient implementation of a natural language understanding system as described in co-pending U.S. patent application Ser. No. 10/043,998 titled “Method and Apparatus Providing Computer Understanding and Instructions from Natural Language” filed on Jan. 11, 2002, the entire teachings of which are incorporated herein by reference.