A full-duplex transceiver used for phone calls, such as a telephone, includes a microphone and a loudspeaker. In such a device, an echo may be generated when the microphone picks up sound output from the loudspeaker and the picked-up sound is transmitted via a network and output from a loudspeaker again. This kind of echo is called “sound echo”. A process to avoid sound echo is called “sound echo canceling”, and a processing section that manages the process of sound echo canceling is called a “sound echo canceler”.
When a hands-free function is used with an ordinary fixed-line phone or a cellular phone, acoustic feedback from the loudspeaker to the microphone is greater. Sound echo canceling is therefore very important for realizing a clear phone call. The same applies to a telephone conference system and the like. Acoustic feedback from the loudspeaker to the microphone also occurs during ordinary use of a fixed-line phone or a cellular phone, so sound echo canceling is important in that case as well.
Methods of sound echo canceling include a method that processes a voice signal in time domain, a method that processes a voice signal by transforming the voice signal into a signal in frequency domain, and the like. It is common with telephones nowadays that a voice signal detected by a microphone is transformed into a digital signal by an AD converter to have digital signal processing applied. In this case, a sound echo canceler generally uses a signal obtained by transforming a digital signal in time domain into frequency domain.
Some telephones also have a rate-of-speech change function, which slows down or speeds up the reproduction of the voice of a phone-call partner while keeping the pitch of the voice. During a phone call, the rate-of-speech change function is mainly used to slow down the voice of the phone-call partner so that it is easier to hear.
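As an illustration only, one simple way to change the duration of a signal without resampling it is plain overlap-add: windowed frames are read from the input at one hop size and written to the output at another. The following is a minimal sketch under that assumption; the function name `change_speech_rate`, the frame length, and the Hann window are hypothetical choices and are not taken from any telephone described here. Plain overlap-add keeps the waveform inside each frame unchanged but can produce audible artifacts at frame boundaries, which practical methods reduce by aligning frames on waveform similarity.

```python
import numpy as np

def change_speech_rate(x, stretch, frame_len=512):
    """Naive overlap-add time stretch (illustrative sketch).

    stretch > 1 makes the output longer (slower speech),
    stretch < 1 makes it shorter (faster speech).
    """
    if len(x) < frame_len:
        raise ValueError("input must contain at least one full frame")
    synth_hop = frame_len // 2        # hop between frames in the output
    ana_hop = synth_hop / stretch     # hop between frames in the input
    window = np.hanning(frame_len)

    n_frames = int((len(x) - frame_len) / ana_hop) + 1
    out = np.zeros((n_frames - 1) * synth_hop + frame_len)
    norm = np.zeros_like(out)         # running sum of window weights

    for m in range(n_frames):
        start = int(round(m * ana_hop))
        pos = m * synth_hop
        out[pos:pos + frame_len] += x[start:start + frame_len] * window
        norm[pos:pos + frame_len] += window

    norm[norm < 1e-8] = 1.0           # avoid dividing by zero at the edges
    return out / norm

# Stand-in input: one second of a 440 Hz tone at an assumed 8 kHz rate.
SAMPLE_RATE = 8000
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
voice = np.sin(2 * np.pi * 440 * t)

slower = change_speech_rate(voice, stretch=1.5)
faster = change_speech_rate(voice, stretch=0.75)
```

The output length is approximately the input length multiplied by `stretch`; the match is not exact because only whole frames are used.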
As seen from the above, a telephone requires multiple processes in time domain as well as in frequency domain. In many cases, digital signal processing is applied by units of frames where a frame includes multiple sampling values of a digital signal. Widely used frame-based digital signal processing includes time-frequency transform where a frame of a signal in time domain is transformed into a frame of the signal in frequency domain, and frequency-time transform where a frame of a signal in frequency domain is transformed into a frame of the signal in time domain.
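The frame-based transforms described above can be sketched as follows. This is a minimal example that assumes a real-valued FFT is used as the time-frequency transform and that frames do not overlap; the frame length, the sampling rate, and the function names `time_to_freq` and `freq_to_time` are illustrative assumptions rather than details of any particular telephone.

```python
import numpy as np

FRAME_LEN = 256      # samples per frame (assumed)
SAMPLE_RATE = 8000   # Hz (assumed)

def time_to_freq(frame):
    """Time-frequency transform: one time-domain frame to one spectrum."""
    return np.fft.rfft(frame)

def freq_to_time(spectrum, frame_len=FRAME_LEN):
    """Frequency-time transform: one spectrum back to a time-domain frame."""
    return np.fft.irfft(spectrum, n=frame_len)

# Stand-in digital signal: one second of a 440 Hz tone.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
signal = np.sin(2 * np.pi * 440 * t)

# Split the signal into whole frames, dropping the incomplete tail.
n_frames = len(signal) // FRAME_LEN
frames = signal[: n_frames * FRAME_LEN].reshape(n_frames, FRAME_LEN)

# Transform every frame to the frequency domain and back again.
spectra = np.array([time_to_freq(f) for f in frames])
restored = np.array([freq_to_time(s) for s in spectra])
```

Each frame of 256 real samples yields 129 complex frequency bins, and applying the two transforms in sequence recovers the original frame up to floating-point rounding.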
FIG. 1 illustrates an example of a functional block diagram of a telephone. A reception voice signal 100 is a frequency-domain representation of a received voice signal, to which processes such as an AGC process, a noise reduction process, and a voice emphasis process (not illustrated) may have been applied.
A frequency-time transform section 110 transforms the reception voice signal in frequency domain 100 into a signal in time domain 111, and feeds it into a rate-of-speech change section 112. The rate-of-speech change section 112 applies a rate-of-speech change process to the signal in time domain 111, and outputs a signal after rate-of-speech change in time domain 113 to a loudspeaker 114 and to a time-frequency transform section 108.
The time-frequency transform section 108 transforms the signal after rate-of-speech change in time domain 113 into a signal in frequency domain 109A, and feeds it into a sound echo canceler 106.
An analog sound signal 120 output from the loudspeaker 114 reaches a microphone 102 through the air and the housing of the telephone. The microphone 102 transforms a part of the analog sound signal 120 from the loudspeaker 114 into a digital transmission voice signal in time domain 103. Here, AD converters, DA converters, amplifiers, and the like are not illustrated for the sake of simplicity.
A time-frequency transform section 104 transforms the transmission voice signal in time domain 103 into a transmission voice signal in frequency domain 105, and feeds it into the sound echo canceler 106.
Here, the signal transfer path that starts from the rate-of-speech change section 112 and reaches the sound echo canceler 106 via the loudspeaker 114, the microphone 102, and the time-frequency transform section 104 has its own transfer characteristic. Consequently, the transmission voice signal in frequency domain 105 has mixed into it a signal that originates from the signal after rate-of-speech change in time domain 113 and is affected by the transfer characteristic. This mixed signal is a cause of sound echo.
The sound echo canceler 106 processes the transmission voice signal in frequency domain 105, for example, to cancel the mixed signal by using the signal in frequency domain 109A and an adaptive filter (not illustrated) in frequency domain based on the transfer characteristic. This process suppresses generation of sound echo. The sound echo canceler 106 outputs a transmission voice signal in frequency domain 130 having sound echo suppressed.
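A frequency-domain adaptive filter of this kind can be sketched as below. The example assumes one complex coefficient per frequency bin, updated with a normalized least-mean-squares (NLMS) rule, and assumes that the transfer characteristic can be approximated by a per-bin multiplication; the class name, step size, and test signals are hypothetical and are not taken from the sound echo canceler 106 itself. Here `far_end` plays the role of the signal in frequency domain 109A and `mic` the role of the transmission voice signal in frequency domain 105.

```python
import numpy as np

class FrequencyDomainEchoCanceler:
    """Per-bin NLMS echo canceler (illustrative sketch)."""

    def __init__(self, n_bins, step=0.5, eps=1e-8):
        self.weights = np.zeros(n_bins, dtype=complex)  # estimated transfer characteristic
        self.step = step
        self.eps = eps

    def process(self, far_end, mic):
        """Return the mic spectrum with the estimated echo subtracted."""
        echo_estimate = self.weights * far_end
        error = mic - echo_estimate
        # NLMS update, normalized by the far-end power in each bin.
        power = np.abs(far_end) ** 2 + self.eps
        self.weights += self.step * np.conj(far_end) * error / power
        return error

# Simulated check: a fixed but unknown echo path and no near-end speech.
rng = np.random.default_rng(0)
N_BINS = 129
true_path = rng.normal(size=N_BINS) + 1j * rng.normal(size=N_BINS)

canceler = FrequencyDomainEchoCanceler(N_BINS)
residual_power = []
for _ in range(50):
    far_end = rng.normal(size=N_BINS) + 1j * rng.normal(size=N_BINS)
    mic = true_path * far_end                 # echo only
    out = canceler.process(far_end, mic)
    residual_power.append(np.mean(np.abs(out) ** 2))
```

In this simulation the residual echo power shrinks from frame to frame as the coefficients approach the simulated path. A practical canceler would additionally need to handle near-end speech (double talk) and echo paths longer than one frame, which this sketch ignores.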
Here, there exists a technology that provides a rate-of-speech change section for changing the time axis of a voice signal of a phone call partner transmitted via a telephone communication channel, and an echo canceler section for removing a sidetone signal (echo). In this technology, the echo canceler section is provided at the stage preceding the rate-of-speech change section and removes the sidetone so that the sidetone does not reach the rate-of-speech change section. This prevents a sidetone to which rate-of-speech change has been applied from hindering the phone call partner's talk (see, for example, Patent Document 1).
Also, there exists a technology in which a rate-of-speech change device for applying an adaptive rate-of-speech change to an input signal includes a physical index calculation section and a rate-of-speech change factor determination section. The physical index calculation section calculates a physical index for each segment of the input signal, the segments being obtained by dividing the input signal into unit times. The rate-of-speech change factor determination section determines a magnification factor of rate-of-speech change for each of the segments, depending on the physical index calculated by the physical index calculation section. With this technology, rate-of-speech change can be stably applied to an input signal in which background sound and voice are mixed (see, for example, Patent Document 2).
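The general idea of computing an index per segment and deriving a per-segment factor from it can be sketched as follows. The source does not state which physical index or which mapping is used, so everything specific here is an assumption made for illustration: the index is taken to be the mean power of the segment in decibels, and the factor is chosen by a single fixed threshold. The function names, threshold, and factor values are hypothetical.

```python
import numpy as np

def segment_power_db(x, segment_len):
    """Assumed physical index: mean power of each unit-time segment, in dB."""
    n_seg = len(x) // segment_len
    segments = x[: n_seg * segment_len].reshape(n_seg, segment_len)
    power = np.mean(segments ** 2, axis=1)
    return 10.0 * np.log10(power + 1e-12)

def stretch_factors(index_db, threshold_db=-40.0,
                    loud_factor=1.3, quiet_factor=1.0):
    """Assumed mapping: stretch louder segments, leave quieter ones as-is."""
    return np.where(index_db > threshold_db, loud_factor, quiet_factor)

# Stand-in input: 800 samples of faint noise followed by 800 samples of tone.
rng = np.random.default_rng(1)
SAMPLE_RATE = 8000
quiet = 0.001 * rng.normal(size=800)
loud = 0.5 * np.sin(2 * np.pi * 440 * np.arange(800) / SAMPLE_RATE)
signal = np.concatenate([quiet, loud])

index_db = segment_power_db(signal, segment_len=160)   # 20 ms segments
factors = stretch_factors(index_db)
```

With these stand-in values, the five faint segments fall below the threshold and keep a factor of 1.0, while the five tone segments exceed it and receive a factor of 1.3.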