This invention relates to a method and apparatus for voice telecommunication, and more particularly to a method and apparatus in which incoming voice signals output by a speaker may be canceled from an outgoing voice signal to be used for speech recognition.
In a conventional communication system such as a land-based telephone system, speech spoken into a remote telephone is picked up by a microphone in the telephone and converted into an incoming audio analog signal (relative to the receiving telephone). The incoming audio signal is sent down an incoming line and eventually to an amplifier connected to a speaker in the receiving telephone. The amplifier amplifies the signal and the speaker converts the amplified signal into sound waves that are heard by a person at the receiving telephone. The person can respond by speaking into a microphone in the receiving telephone. The microphone is operably connected to an outgoing line and converts the words of the telephone user into an outgoing audio signal sent down an outgoing line and ultimately onward, generally to a speaker in the remote telephone.
A land-based communication system with speech recognition typically has a far end and a near end, with a remote microphone/speaker unit located at the far end and a local microphone/speaker unit located at the near end. A landline connects the remote and local microphone/speaker units. The landline has an incoming line (relative to the local microphone/speaker unit) that connects the remote microphone with the local speaker, and an outgoing line (relative to the local microphone/speaker unit) that connects the local microphone with the remote speaker. A speech recognition unit is usually operably attached to the outgoing line carrying the outgoing audio signal from the local microphone at the near end to the remote speaker at the far end. Words spoken by a person at the near end, in response to the output from the local speaker, are received by the local microphone and converted into an outgoing audio analog signal that travels along the outgoing line from the local microphone to the remote speaker at the far end. The speech recognition unit converts the outgoing audio analog signal into words. Problems can arise when the local microphone picks up words other than words spoken by the near end person. For example, speech from the local speaker might be picked up by the local microphone along with speech from the near end person, and produce a mixed outgoing audio analog signal containing speech from both the near end speaker and the near end person. A speech recognition unit “listening” to the outgoing microphone signal may not differentiate between the two.
For example, where a remote system generates an audio command such as “Please type seven to delete message” and that command is output on the near end speaker, the words “seven” and “delete” may well be picked up by the microphone and carried in the outgoing signal which, when received and processed by the speech recognition unit, could cause a message to be deleted even though the near end user did nothing.
An echo canceller has been operably attached to the incoming and outgoing lines of the communication system to improve the operation of the speech recognition unit. The echo canceller is used to suppress words picked up by the microphone from the loudspeaker. Voice recognition units should only receive words spoken by the near end user and picked up by the microphone, but suppression of the loudspeaker words by means of the echo canceller can leave a residual echo in the outgoing line along with the genuine outgoing signal (i.e., words spoken by the user) and result in a mixed outgoing signal. The speech recognition unit might fail to differentiate between the genuine words spoken by the user and unwanted output from the speaker. In this type of scenario, the speech recognition unit would incorrectly attribute words from the speaker to the near end user.
Alternatively, communication systems have been configured to disable the microphone when the loudspeaker is producing output. However, this solution does not allow for a user interrupting or “cutting through” a voice prompt output from the speaker. For example, the microphone would not clearly pick up a user's response when the user interrupts a voice prompt such as “Speak your login ID.” The user would always have to remember to wait for each verbal prompt to complete before responding.
In one aspect of the present invention, a voice recognition system is provided for use with a communication system having an incoming line and an outgoing line, the incoming line carrying an incoming signal from a first end to a second end operably attached to an audio output responsive to the incoming signal and the outgoing line carrying an outgoing signal from a second end to a first end, the outgoing line second end being attached to a microphone near the audio output. The voice recognition system includes a first speech recognition unit for detecting an incoming word in the incoming signal, a second speech recognition unit for detecting an outgoing word in the outgoing signal, and a comparator/signal generator operably connected to the first and the second speech recognition units. The comparator/signal generator compares the outgoing word with the incoming word and outputs the outgoing word when the outgoing word does not match the incoming word.
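The comparison described in this aspect can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each speech recognition unit has already reduced its signal to a list of recognized words; the function name `filter_outgoing` and the case-insensitive matching are hypothetical choices, not details taken from this description.

```python
from typing import Iterable, List


def filter_outgoing(outgoing_words: Iterable[str],
                    incoming_words: Iterable[str]) -> List[str]:
    """Hypothetical comparator/signal generator.

    Outputs each outgoing word only when it does not match a word
    detected in the incoming signal. Matching is case-insensitive,
    which is an assumption of this sketch.
    """
    incoming = {word.lower() for word in incoming_words}
    return [word for word in outgoing_words if word.lower() not in incoming]


# Words the first unit detected in the incoming (prompt) signal.
prompt = ["please", "type", "seven", "to", "delete", "message"]
# Words the second unit detected in the outgoing (microphone) signal:
# "seven" is an echo of the prompt, "hello" is spoken by the user.
microphone = ["seven", "hello"]

passed = filter_outgoing(microphone, prompt)
```

In this sketch the echoed word is withheld and only the word that has no counterpart in the incoming signal is output. A practical system would also need some notion of timing so that a word spoken by the user long after the prompt is not suppressed.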
In other aspects of the invention, the first speech recognition unit may be delayed relative to the second speech recognition unit so as to search for a word in the incoming signal corresponding to the outgoing word detected by the second speech recognition unit during the delay. Further, the speech recognition units may search only for selected words, or may ignore words which are first detected by the other speech recognition unit. The speech recognition units may use templates to search only for selected words, and those templates may be trained by the voice prompt system and/or by the user, either as speaker independent or speaker dependent.
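One way to picture the delayed search and the template restriction is the following Python sketch. It is an illustration under stated assumptions rather than the described implementation: detections are modeled as timestamped words, the delay is modeled as a look-back window on the incoming signal, and the names `Detection`, `accept_outgoing`, and `delay_s` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Detection:
    word: str      # recognized word
    time_s: float  # time of the word in its signal, in seconds


def accept_outgoing(outgoing: List[Detection],
                    incoming: List[Detection],
                    delay_s: float,
                    template: Set[str]) -> List[Detection]:
    """Keep outgoing template words that have no incoming counterpart.

    For each outgoing word in the template, the incoming detections
    are searched for the same word occurring at most delay_s seconds
    earlier. Words outside the template are never searched for.
    """
    accepted = []
    for out in outgoing:
        if out.word not in template:
            continue  # only selected words are considered
        echoed = any(
            inc.word == out.word
            and 0.0 <= out.time_s - inc.time_s <= delay_s
            for inc in incoming
        )
        if not echoed:
            accepted.append(out)
    return accepted


template = {"seven", "delete"}
incoming = [Detection("seven", 1.0)]
outgoing = [
    Detection("seven", 1.2),   # echo of the prompt, inside the window
    Detection("delete", 5.0),  # spoken by the user, no incoming match
    Detection("hello", 6.0),   # not a selected word
]

result = accept_outgoing(outgoing, incoming, delay_s=0.5, template=template)
```

Here only the user's "delete" survives: the outgoing "seven" is matched to the incoming "seven" within the window, and "hello" is ignored because it is not in the template.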
In still another aspect of the invention, a signaler may provide a signal indicating inclusion of one of the command words in the known incoming signal, with a speech recognition unit responsive to that signal so as to ignore the included command word in the template for a selected period of time. A signal generator operably connected to the speech recognition unit generates commands responsive to detection of one of the selected command words by the speech recognition unit.
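The signaler arrangement can be sketched in Python as follows. This is a hypothetical illustration: the class name `CommandRecognizer`, its methods, the explicit timestamps passed by the caller, and the `COMMAND:` string format are assumptions made so that the example is self-contained and deterministic.

```python
from typing import Dict, Iterable, Optional


class CommandRecognizer:
    """Hypothetical recognizer that ignores signaled command words."""

    def __init__(self, template: Iterable[str], ignore_period_s: float):
        self.template = set(template)           # selected command words
        self.ignore_period_s = ignore_period_s  # selected ignore period
        self._ignore_until: Dict[str, float] = {}

    def signal_prompt_word(self, word: str, now_s: float) -> None:
        """Signaler input: the known incoming signal contains `word`."""
        if word in self.template:
            self._ignore_until[word] = now_s + self.ignore_period_s

    def detect(self, word: str, now_s: float) -> Optional[str]:
        """Return a command for a detected word, or None if ignored."""
        if word not in self.template:
            return None
        if now_s < self._ignore_until.get(word, float("-inf")):
            return None  # word is still inside its ignore period
        return f"COMMAND:{word}"


recognizer = CommandRecognizer({"seven", "delete"}, ignore_period_s=2.0)
# The prompt is known to contain "seven" at t = 10.0 s.
recognizer.signal_prompt_word("seven", now_s=10.0)

during = recognizer.detect("seven", now_s=10.5)   # inside the ignore period
other = recognizer.detect("delete", now_s=10.5)   # not signaled
after = recognizer.detect("seven", now_s=12.5)    # ignore period elapsed
```

Only the signaled word is masked, and only for the selected period, so in this sketch the user can still interrupt the prompt with any other command word.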