1. Field of the Invention
The present invention relates generally to pattern recognition. More particularly, this invention relates to speech recognition systems that recognize commands using semantic inference and word agglomeration.
2. Copyright Notice/Permission
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 2000, Apple Computer, Inc., All Rights Reserved.
3. Background
As computer systems have evolved, the desire to use such systems for pattern recognition has grown. Typically, the goal of pattern recognition systems is to quickly provide accurate recognition of input patterns. One type of pattern recognition system is a voice recognition system, which attempts to accurately identify a user's speech. Another type of pattern recognition system is a handwriting recognition system. A speech recognizer discriminates among acoustically similar segments of speech to recognize words, while a handwriting recognizer discriminates among strokes of a pen to recognize words.
An application of speech recognition is voice command and control (VCC), which enables a computer user to control a computer by voice rather than by using traditional user interfaces such as a keyboard and a mouse. Advances in speech recognition technology have enhanced the performance of VCC so that a computer can accurately perform a task by recognizing a command spoken within a restricted domain of vocabulary. However, existing VCC technology has limitations that diminish the usefulness of the technology to an average computer user.
Typical VCC applications employ a context-free grammar, such as a finite state grammar (a particular implementation of a context-free grammar), which is a compact way of representing an exhaustive list of every command that the application can recognize. These applications compare the spoken command to the list of commands underlying the context-free grammar. As a result, existing VCC applications that use a context-free grammar either reject or incorrectly recognize any utterance that is semantically accurate but syntactically out-of-grammar. This rigid framework requires the computer user to learn and memorize the specific commands that are compiled within the context-free grammar.
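By way of illustration only, the rigidity of grammar-based matching can be sketched as follows. The sketch assumes a toy grammar of two slots with two alternatives each; the names GRAMMAR, expand, and recognize are hypothetical and do not describe any particular VCC product.

```python
from itertools import product

# Hypothetical toy grammar: each tuple lists the alternatives for one slot.
GRAMMAR = [
    ("open", "launch"),
    ("microsoft excel", "microsoft word"),
]

def expand(grammar):
    """Enumerate every command that the compact grammar stands for."""
    return {" ".join(choice) for choice in product(*grammar)}

COMMANDS = expand(GRAMMAR)

def recognize(utterance):
    """Return the matched command, or None if the utterance is out-of-grammar."""
    text = utterance.lower().strip()
    return text if text in COMMANDS else None

in_grammar = recognize("Open Microsoft Excel")
out_of_grammar = recognize("make a new spreadsheet")
```

In this sketch the grammar compactly encodes four commands, and a request worded any other way is rejected even when its meaning is clear.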
Semantic inference alleviates the problems associated with VCC applications that use a context-free grammar. Semantic inference is a more tolerant approach to language modeling that enables a computer to recognize commands that are out-of-grammar but semantically accurate, thereby allowing computer users to say what they mean rather than requiring them to speak from a pre-defined list of commands. For example, semantic inference will enable a user to say “make a new spreadsheet” when the pre-defined wording of the command is “Open Microsoft Excel.”
VCC applications that use semantic inference typically employ a speech recognition unit to provide a transcription of the user's spoken command. A semantic classification engine applies semantic inference to the transcription to determine the desired action. Some VCC applications using semantic inference replace the context-free grammar in a speech recognition unit with a statistical language model such as an n-gram. A statistical language model makes it possible for the speech recognition unit to transcribe, with a reasonably low error rate, whatever formulation the computer user chooses for expressing a command. This substitution prevents the speech recognition unit from rejecting out-of-grammar voice inputs before the semantic classification engine has the opportunity to evaluate the voice input for semantic similarity.
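By way of illustration only, a statistical language model of the n-gram type can be sketched as a bigram model. The sketch assumes a three-sentence training set and add-one smoothing, both chosen for brevity; the names TRAINING, train_bigram, and probability are hypothetical, and a practical recognizer would be trained on far more data with a more refined smoothing scheme.

```python
from collections import Counter

# Hypothetical training sentences, assumed for this sketch only.
TRAINING = [
    "open microsoft excel",
    "open a new spreadsheet",
    "make a new folder",
]

def train_bigram(sentences):
    """Count word pairs and their left-hand contexts, with sentence markers."""
    contexts, bigrams = Counter(), Counter()
    vocabulary = {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocabulary.update(words[1:])
        contexts.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return contexts, bigrams, len(vocabulary)

def probability(sentence, model):
    """Add-one smoothed bigram probability of a whole sentence."""
    contexts, bigrams, vocab_size = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return p

model = train_bigram(TRAINING)
unseen = probability("make a new spreadsheet", model)
scrambled = probability("spreadsheet new a make", model)
```

Unlike a grammar, this sketch assigns a nonzero probability to a formulation that never appeared in the training set, and it scores a plausible word order higher than a scrambled one.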
However, regardless of whether the VCC application uses a context-free grammar or a statistical language model, current implementations of semantic inference operate only at the word level. This is because the latent semantic analysis that underlies the process of semantic inference is an instance of the so-called “bag-of-words” model, which pays no attention to the order of words in the command. As a result, commands containing the same words in a different order are erroneously mapped to the same representation. For example, the commands “Change icons to list” and “Change list to icons” are indistinguishable, even though the underlying commands are very different. Thus, while latent semantic analysis is well-suited to capture large-span (i.e., semantic) relationships between words, it is inherently unable to capitalize on the local (i.e., syntactic or pragmatic) constraints present in the language. Avoiding the erroneous mapping of these types of commands to the same representation requires additional processing, such as performing back-off for sense disambiguation. The additional processing is undesirable since it consumes additional central processing unit (CPU) cycles and degrades performance. What is needed, therefore, is an improved method and apparatus for using semantic inference in a speech recognition system to more accurately recognize a command.
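By way of illustration only, the order-blindness of the “bag-of-words” model can be sketched as follows. The sketch assumes simple whitespace tokenization; the names bag_of_words and word_pairs are hypothetical, and the adjacent-word-pair comparison is included only to show that the order information exists in the input, not as a description of any particular solution.

```python
from collections import Counter

def bag_of_words(command):
    """Word counts only; the order of the words is discarded."""
    return Counter(command.lower().split())

def word_pairs(command):
    """Adjacent word pairs, which retain local word order."""
    words = command.lower().split()
    return list(zip(words, words[1:]))

first = "Change icons to list"
second = "Change list to icons"

same_bag = bag_of_words(first) == bag_of_words(second)
same_pairs = word_pairs(first) == word_pairs(second)
```

Here same_bag is true, because both commands contain exactly the same four words, while same_pairs is false, because the two commands differ in which words are adjacent.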