The present invention relates generally to electronic devices with speech recognition technology. More particularly, the present invention relates to portable communication devices having voice input and control capabilities.
As the demand for smaller, more portable electronic devices grows, consumers want additional features that enhance and expand the use of portable electronic devices. These electronic devices include compact disc players, two-way radios, cellular telephones, computers, personal organizers, and similar devices. In particular, consumers want to input information and control the electronic device using voice communication alone. It is understood that voice communication includes speech, acoustic, and other non-contact communication. With voice input and control, a user may operate the electronic device without touching the device and may input information and control commands faster than with a keypad. Moreover, voice-input-and-control devices eliminate the need for a keypad and other direct-contact input, thus permitting even smaller electronic devices.
Voice-input-and-control devices require proper operation of the underlying speech recognition technology. If the limitations of speech recognition technology are not observed, then the electronic device will not perform satisfactorily. In essence, speech recognition technology analyzes a speech waveform within a speech data acquisition window and attempts to match the waveform to a particular word or command. If a match is found, then the speech recognition technology provides a signal to the electronic device identifying the particular word or command.
For speech recognition technology to provide suitable results, a user must speak at a reasonable volume within the data acquisition window. Although the speech recognition technology may operate correctly, the results from its use are dependent upon the actual speech waveform acquired in the speech data acquisition window. Consequently, speech recognition technology does not work well or at all when: (1) the user speaks over the start of the speech acquisition window; (2) the user speaks over the end of the speech acquisition window; (3) the user speaks too loudly; (4) the user speaks too softly; (5) the user does not say anything; (6) additional noise is present including impulsive, tonal, or wind noise; and (7) similar situations where the acquired speech waveform is not the complete waveform spoken by the user. Moreover, speech recognition technology may recognize an "incomplete" waveform as another word. In this situation, the speech recognition technology would signal the wrong word or command to the electronic device.
The prior art does not thoroughly screen the acquired speech input for proper speech signal format prior to processing by the speech recognition technology. Some references describe using a meter or light to indicate acquired signal amplitude levels. However, these amplitude levels cover only the "loudness" of the acquired speech waveform. Moreover, this type of "loudness" indication includes both the user's speech and noise. When the noise is louder than the user's speech, these indicators would show erroneously that the user is speaking at a proper volume. Furthermore, the prior art does not test the signal to determine whether the user spoke too soon, too late, or too quietly. The impact of signal truncation or inadequate signal-to-noise ratio is not considered. As a result, the prior art uses acquired speech "as is" with little or no feedback to the user regarding how to improve the speech input format.
Accordingly, there is a need to thoroughly screen the speech input into a voice-input-and-control device for proper speech format prior to processing in the speech recognition technology. There also is a need to provide feedback instructing the user how to improve the speech input for optimizing the speech recognition of the electronic device.
The primary object of the present invention is to provide a communication device and method for screening speech signals for proper formatting prior to speech recognition processing. Another object of the present invention is to inform the user of errors associated with the speech signal format. Another object of the present invention is to provide the user with instructions for correcting errors associated with the speech signal format. This corrective feedback helps the user minimize future unsuitable speech input and improves the overall recognition accuracy and user satisfaction. As discussed in greater detail below, the present invention overcomes the limitations of the existing art to achieve these objects and other benefits.
The present invention provides a communication device capable of screening speech signals prior to speech recognition processing. The communication device includes a microprocessor connected to communication interface circuitry, audio circuitry, memory, an optional keypad, a display, and a vibrator/buzzer. The audio circuitry is connected to a microphone and a speaker. The audio circuitry includes filtering and amplifying circuitry and an analog-to-digital converter. The microprocessor includes a speech/noise classifier and speech recognition technology.
The microprocessor analyzes a speech signal to determine speech waveform parameters within a speech acquisition window. The speech waveform parameters include speech energy, noise energy, start energy, end energy, the percentage of clipped speech samples, and other speech or signal related parameters within the speech acquisition window.
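The parameters listed above can be illustrated with a minimal Python sketch. The text does not specify how each parameter is calculated, so everything here is an assumption made for illustration: energy is taken as the mean of squared samples, start and end energy are measured over the first and last `edge_len` samples of the window, a sample counts as clipped when its magnitude reaches `clip_level`, and the per-sample `is_speech` flags stand in for the output of the speech/noise classifier. The names `WaveformParameters`, `mean_square`, and `measure_window` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class WaveformParameters:
    speech_energy: float    # mean-square energy of samples flagged as speech
    noise_energy: float     # mean-square energy of samples flagged as noise
    start_energy: float     # energy at the start of the acquisition window
    end_energy: float       # energy at the end of the acquisition window
    clipped_percent: float  # percentage of speech samples at full scale


def mean_square(samples: Sequence[float]) -> float:
    """Average energy per sample; 0.0 for an empty sequence."""
    return sum(s * s for s in samples) / len(samples) if samples else 0.0


def measure_window(
    samples: Sequence[float],
    is_speech: Sequence[bool],
    edge_len: int = 4,
    clip_level: float = 1.0,
) -> WaveformParameters:
    """Compute the waveform parameters for one acquisition window.

    `is_speech` is an assumed stand-in for the classifier output,
    one flag per sample. `edge_len` must be at least 1.
    """
    if edge_len < 1:
        raise ValueError("edge_len must be at least 1")
    speech = [s for s, flag in zip(samples, is_speech) if flag]
    noise = [s for s, flag in zip(samples, is_speech) if not flag]
    clipped = sum(1 for s in speech if abs(s) >= clip_level)
    return WaveformParameters(
        speech_energy=mean_square(speech),
        noise_energy=mean_square(noise),
        start_energy=mean_square(samples[:edge_len]),
        end_energy=mean_square(samples[-edge_len:]),
        clipped_percent=100.0 * clipped / len(speech) if speech else 0.0,
    )
```

For example, `measure_window([0.0, 0.0, 0.5, 1.0, -1.0, 0.5, 0.0, 0.0], [False, False, True, True, True, True, False, False], edge_len=2)` describes an utterance that sits fully inside the window (zero start and end energy) but has half of its speech samples at full scale.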
By comparing speech waveform parameters with threshold values, the microprocessor determines whether an error exists in the signal format of the speech signal. The microprocessor provides error information to the user when an error exists in the signal format. The microprocessor may deactivate or halt the speech recognition processing so the user may correct the error in the speech signal format. Alternatively, the microprocessor may permit the speech recognition processing to continue with a warning that the speech recognition output may be incorrect due to the error in the speech signal format.
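The threshold comparison described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the claimed implementation: the threshold values, the dictionary keys, the feedback wording, and the function names `screen_speech_format` and `decide` are all hypothetical, and the signal-to-noise test is assumed to be a simple ratio of speech energy to noise energy.

```python
from typing import Dict, List, Tuple

# Hypothetical threshold values; the text does not give concrete numbers.
DEFAULT_THRESHOLDS: Dict[str, float] = {
    "max_edge_energy": 0.05,     # start/end energy above this suggests truncation
    "max_clipped_percent": 5.0,  # more clipped samples than this suggests "too loud"
    "min_speech_energy": 0.01,   # speech energy below this suggests "too quiet"
    "min_snr": 4.0,              # assumed minimum speech-to-noise energy ratio
}


def screen_speech_format(
    params: Dict[str, float],
    thresholds: Dict[str, float] = DEFAULT_THRESHOLDS,
) -> List[str]:
    """Return corrective messages for each format error found (empty if none)."""
    errors: List[str] = []
    if params["start_energy"] > thresholds["max_edge_energy"]:
        errors.append("Spoke too soon: wait for the prompt before speaking.")
    if params["end_energy"] > thresholds["max_edge_energy"]:
        errors.append("Spoke too late: finish before the window closes.")
    if params["clipped_percent"] > thresholds["max_clipped_percent"]:
        errors.append("Too loud: speak more softly.")
    if params["speech_energy"] < thresholds["min_speech_energy"]:
        errors.append("Too quiet: speak louder.")
    elif params["speech_energy"] < thresholds["min_snr"] * params["noise_energy"]:
        errors.append("Too noisy: move to a quieter place.")
    return errors


def decide(
    params: Dict[str, float],
    halt_on_error: bool = True,
    thresholds: Dict[str, float] = DEFAULT_THRESHOLDS,
) -> Tuple[bool, List[str]]:
    """Return (run_recognition, messages).

    With halt_on_error=True, recognition is skipped when any error exists.
    With halt_on_error=False, recognition proceeds and the messages act as
    a warning that the output may be incorrect.
    """
    errors = screen_speech_format(params, thresholds)
    run_recognition = not errors or not halt_on_error
    return run_recognition, errors
```

The `halt_on_error` flag mirrors the two alternatives in the text: halting recognition so the user can correct the input, or continuing with a warning.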