Field of the Invention
This invention relates, in one embodiment, generally to speech recognition systems, and more particularly to using semantic inference with speech recognition systems.
A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to the software and data as described below and in the drawings hereto: Copyright © 2000, Apple Computer, Inc., All Rights Reserved.
Speech recognition enables a computer system to receive voice inputs and convert them into text. The computer system receives an audio input, transforms the audio input into digital data, compares the digital data with a list of digitized waveforms corresponding to text, and converts the digital data into the text corresponding to the most closely matched digitized waveform. One application of speech recognition is voice command and control (VCC), which enables a computer user to control a computer by voice rather than by using traditional user interfaces such as a keyboard and a mouse. Advances in speech recognition technology have enhanced the performance of VCC so that a computer can accurately perform a task by recognizing a command spoken within a restricted domain of vocabulary. However, existing VCC technology has limitations that diminish the usefulness of the technology to an average computer user.
Typical VCC applications employ a context-free grammar, such as a finite state grammar (a particular implementation of a context-free grammar), which compactly represents an exhaustive list of every command that the application can recognize. These applications compare the spoken command to the list of commands underlying the context-free grammar. Previously existing VCC applications that use a context-free grammar either reject or incorrectly recognize any utterance that is semantically accurate but syntactically out-of-grammar. This rigid framework requires the computer user to learn and memorize the specific commands that are compiled within the context-free grammar.
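The rigidity described above can be illustrated with a minimal sketch. The toy grammar, the command words, and the function names below are hypothetical and are not taken from any particular VCC application; the sketch assumes a finite grammar whose slots are simple lists of alternatives.

```python
from itertools import product

# Hypothetical toy grammar: each slot lists its alternatives.
# An empty string marks an optional word.
GRAMMAR = [
    ["open", "close"],
    ["the", ""],
    ["window", "file"],
]


def expand(grammar):
    """Enumerate every command that the grammar compactly represents."""
    commands = set()
    for choice in product(*grammar):
        commands.add(" ".join(word for word in choice if word))
    return commands


COMMANDS = expand(GRAMMAR)


def recognize(utterance):
    """Return the normalized utterance if it is in-grammar, else None (rejected)."""
    normalized = " ".join(utterance.lower().split())
    return normalized if normalized in COMMANDS else None
```

In this sketch, three short slots stand for eight distinct commands, yet an utterance such as "please open the window" is rejected even though its meaning is clear, because it is not among the enumerated word sequences.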
Semantic inference alleviates the problems associated with VCC applications that use a context-free grammar. Semantic inference is a more tolerant approach to language modeling that enables a computer to recognize commands that are out-of-grammar but semantically accurate, thereby allowing computer users to say what they mean rather than requiring them to speak from an established list of commands. Existing semantic inference systems replace a context-free grammar in a speech recognition unit with a statistical language model such as an n-gram. This substitution prevents the speech recognition unit from rejecting out-of-grammar voice inputs before the semantic classification engine has the opportunity to evaluate the voice input for semantic similarity. A statistical language model makes it possible for the speech recognition unit to transcribe, with a reasonably low error rate, whatever formulation the computer user chooses for expressing a command. A semantic classification engine then operates on the transcription to determine the desired action.
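As a rough illustration of why an n-gram model does not reject out-of-grammar inputs, the following sketch estimates a bigram model with add-one smoothing. The toy corpus, the smoothing choice, and the function names are assumptions made for illustration only; they do not describe any existing semantic inference system.

```python
from collections import Counter

# Hypothetical toy training corpus.
CORPUS = [
    "open the window",
    "close the window",
    "open the file",
]


def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    contexts = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, vocab


def sentence_probability(sentence, contexts, bigrams, vocab):
    """Add-one smoothed bigram probability of a sentence over a fixed vocabulary."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    probability = 1.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        probability *= (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))
    return probability


MODEL = train_bigram(CORPUS)
```

Under these assumptions, a formulation that never appeared in the corpus, such as "close the file", still receives a nonzero probability, so it can be transcribed and passed on for semantic classification instead of being rejected outright.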
Using a statistical language model with the speech recognition unit enables the voice command and control system to accurately identify the correct command. However, there are problems associated with semantic inference systems that employ a statistical language model. Substituting a statistical language model for a context-free grammar in the speech recognition unit requires a significant change in the overall architecture of the speech recognition unit, specifically in the structure of the search module. Also, estimating the parameters of the statistical language model typically requires multiple iterations over a large training corpus of relevant text data, which may involve a large number of central processor unit (CPU) cycles. Additionally, developing and maintaining such a large corpus of text data is time-consuming and expensive. Furthermore, a speech recognition unit using a statistical language model typically requires the computer user to wear a head-mounted noise-canceling microphone and to train the system to his or her voice. Finally, n-gram statistical language models have significantly larger storage requirements than context-free grammars and lead to greater recognition runtimes.
Therefore, a method and apparatus to use semantic inference with a speech recognition system using a context-free grammar are required.
A method and apparatus to use semantic inference with speech recognition systems using a context-free grammar are described herein. According to one aspect of the invention, a method for speech recognition comprises recognizing at least one spoken word, processing the spoken word using a context-free grammar, deriving an output from the context-free grammar, and translating the output into a predetermined command.
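The four steps of the method can be sketched as a small pipeline. This passage does not specify how the grammar processing or the translation step is implemented, so the vocabulary, the command names, and the keyword-overlap scoring below are all hypothetical stand-ins chosen only to make the flow concrete.

```python
# Hypothetical vocabulary covered by the grammar.
VOCABULARY = {"what", "is", "the", "it", "time", "clock", "date", "day", "today"}

# Hypothetical predetermined commands, each with illustrative keywords.
COMMAND_KEYWORDS = {
    "SHOW_TIME": {"time", "clock"},
    "SHOW_DATE": {"date", "day", "today"},
}


def process_with_grammar(spoken_words):
    """Stand-in for grammar processing: keep only the words the grammar covers."""
    return [word for word in spoken_words if word in VOCABULARY]


def translate(output_words):
    """Stand-in for translation: pick the command with the largest keyword overlap."""
    best_command, best_score = None, 0
    for command, keywords in COMMAND_KEYWORDS.items():
        score = len(keywords & set(output_words))
        if score > best_score:
            best_command, best_score = command, score
    return best_command  # None when no command keyword was found


def handle(utterance):
    """Recognize words, process them with the grammar, and translate the output."""
    spoken_words = utterance.lower().split()  # stands in for the recognizer
    return translate(process_with_grammar(spoken_words))
```

In this sketch, handle("what is the time") and handle("what time is it") resolve to the same hypothetical command, which is the kind of tolerance to rephrasing that the method aims for.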
According to another aspect of the present invention, a machine-readable medium has stored thereon a plurality of instructions that, when executed by a processor, cause the processor to recognize at least one spoken word, process the spoken word using a context-free grammar, derive an output from the context-free grammar, and translate the output into a predetermined command.
According to a further aspect of the present invention, an apparatus for speech recognition includes a processing unit, a memory unit, a system bus, and at least one machine-readable medium. A speech recognition unit, a context-free grammar, and a semantic inference engine are stored in the machine-readable medium.