1. Field of the Invention
The present invention relates to a hands-free telephone which has an echo canceller and an echo suppresser for canceling or suppressing echo signals in telephones with loudspeakers, and particularly to voice detection and voice transmission control.
2. Description of the Related Art
FIG. 14 and FIG. 15 illustrate the overall configuration of a known hands-free telephone device in which a loudspeaker, a microphone for hands-free use, and an echo processing unit are connected to a cellular telephone to be used as a loudspeaker telephone. FIG. 14 illustrates an arrangement wherein the cellular telephone is an analog telephone, and FIG. 15, a digital telephone.
In the following description, "echo processing unit" indicates an echo canceller or echo suppresser. "Near-end caller" indicates the user of the loudspeaker telephone, and "far-end caller" indicates the party other than the near-end caller.
First, description will be made regarding the reception system. With reference to the analog cellular telephone illustrated in FIG. 14, signals received by an antenna 29 are demodulated by a wireless unit 28 and become analog voice signals, and further are A/D converted into far-end caller digital signals by an A/D converter 27, thus becoming reception signals R(k). Here, k denotes the time for R(k), S(k), and S2(k), which are digital signals.
Also, with reference to the digital cellular telephone illustrated in FIG. 15, signals received by the antenna 29 are demodulated by the wireless unit 28 and become voice encoded data, and further are decoded by a voice signal decoding unit 32, thus becoming digital reception signals R(k) which are voice signals. The reception signals R(k) carry the voice of the far-end caller.
In either case, i.e., for both the analog cellular telephone illustrated in FIG. 14 and the digital cellular telephone illustrated in FIG. 15, the voice signals pass through the echo processing unit 25, and the analog voice signals which have been D/A converted by a D/A converter 24 are emitted into the air by means of a loudspeaker 22, thus reaching the ear of the near-end caller.
Next, description will be made regarding the transmitting system. In either case, i.e., for both the analog cellular telephone illustrated in FIG. 14 and the digital cellular telephone illustrated in FIG. 15, the signals input from the hands-free microphone 21 are A/D converted into digital signals by an A/D converter 23, thus becoming input transmission signals S(k).
In the event that only the near-end caller is speaking, the input transmission signal S(k) is the voice of the near-end caller, and in the event that only the far-end caller is speaking, the input transmission signal S(k) is the reception signal R(k) which has been D/A converted, output from the loudspeaker 22, and passed around into the hands-free microphone 21 as an echo signal. Also, in the event that both callers are speaking at the same time, the input transmission signal S(k) is a superimposed signal of the voice of the near-end caller and the echo signal.
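The three cases above can be illustrated with a minimal sketch. The function name `microphone_signal`, the single-path echo model, and the parameters `echo_gain` and `echo_delay` are assumptions introduced here for illustration only; an actual echo path between the loudspeaker 22 and the hands-free microphone 21 would generally involve many reflections.

```python
def microphone_signal(near_voice, reception, echo_gain=0.5, echo_delay=2):
    """Return S(k) as near-end voice plus a delayed, scaled copy of R(k).

    Assumption: the echo path is modeled as one delay and one gain.
    """
    s = []
    for k in range(len(reception)):
        # Echo is the reception signal as it arrives back at the microphone.
        echo = echo_gain * reception[k - echo_delay] if k >= echo_delay else 0.0
        s.append(near_voice[k] + echo)
    return s


# Only the far-end caller speaks: S(k) consists of echo alone.
far_only = microphone_signal([0.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])

# Only the near-end caller speaks: S(k) is the near-end voice.
near_only = microphone_signal([1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0])

# Both speak: S(k) is the superposition of voice and echo.
both = microphone_signal([1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0])
```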
Regarding loudspeaker telephones wherein the user does not use a handset but rather uses a loudspeaker 22 and a hands-free microphone 21, the echo processing unit 25 functions to prevent echo, which consists of the reception signal R(k) being output from the loudspeaker 22 and passing around into the hands-free microphone 21. That is, the echo processing unit 25 acts to remove the echo component within the input transmission signal S(k) so as to output only the voice of the near-end caller as the output transmission signal S2(k).
The cellular phone proper 30 is provided with voice detection means 33 for inhibiting transmission of airwaves during periods in which the near-end caller is not speaking, as means for saving battery power, and in the case of the analog cellular telephone illustrated in FIG. 14, the voice detection means 33 output a voice detection flag FLG to the wireless unit 28. Accordingly, the wireless unit 28 performs voice transmission control wherein airwaves are transmitted only at times that the voice detection flag FLG indicates presence of sound.
Also, in the case of the digital cellular telephone illustrated in FIG. 15, the voice detection flag FLG is also output to the voice encoding unit 31 and the noise canceller 34 located between the wireless unit 28 and the echo processing unit 25, and noise cancellation processing and voice encoding processing are performed only at times that the voice detection flag FLG indicates presence of sound, so that electric power consumption is saved not only at the wireless unit 28 but also at the voice encoding unit 31 and the noise canceller 34.
Further, an example of a known echo canceller is described in "Hands-free conversation with echo canceller" (Oki Denki Kenkyu Kaihatsu, January 1989, #141, Vol. 56, No. 1, pp 34-40).
The problems facing echo cancellers will now be described. The known echo processing unit 25 possesses functions for eliminating or inhibiting echo components, but does not have a function for outputting the voice detection flag FLG for voice transmission control by the wireless unit 28. Accordingly, in order to perform voice transmission control, it becomes necessary for the voice detection means 33 of the cellular telephone proper 30 to perform voice detection of the near-end caller from the output transmission signal output from the echo processing unit 25. However, in the event that the echo processing unit 25 is an echo canceller using an adaptive filter, the adaptation of the filter is insufficient in cases such as those described below.
Such situations would be cases where conversation using the loudspeaker telephone has just been initiated and the adaptive filter has not yet sufficiently adapted, or where there is motion such as the near-end caller moving his/her body, causing change in the reflection state of the sound waves traveling from the loudspeaker to the hands-free microphone, and consequently causing change in the echo path to which the adaptive filter is adapting.
In such cases, an echo component remains in the output transmission signal, so in the event that voice detection of the near-end caller is performed based on the output transmission signal, the residual echo within the output transmission signal is mis-identified as the voice of the near-end caller.
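The effect of insufficient adaptation can be illustrated with a minimal sketch. The known device is described only as using an adaptive filter; the normalized LMS (NLMS) update used below, the function name `nlms_echo_canceller`, the tap count, the step size `mu`, and the synthetic echo path are all assumptions chosen for illustration.

```python
import random


def nlms_echo_canceller(reception, mic, taps=4, mu=0.5, eps=1e-6):
    """Return the residual S2(k) = S(k) minus the estimated echo.

    Assumption: NLMS adaptation, starting from all-zero coefficients.
    """
    w = [0.0] * taps          # adaptive filter coefficients
    history = [0.0] * taps    # most recent R(k) samples, newest first
    residual = []
    for r, s in zip(reception, mic):
        history = [r] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = s - echo_estimate
        norm = sum(x * x for x in history) + eps
        # Move the coefficients toward the echo path, scaled by input power.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
        residual.append(e)
    return residual


random.seed(0)
reception = [random.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.0, 0.6, 0.3, -0.2]  # hypothetical loudspeaker-to-microphone path
# Only the far-end caller is speaking, so the microphone picks up echo alone.
mic = [
    sum(h * reception[k - i] for i, h in enumerate(echo_path) if k - i >= 0)
    for k in range(len(reception))
]

residual = nlms_echo_canceller(reception, mic)
early_level = sum(abs(e) for e in residual[:20]) / 20
late_level = sum(abs(e) for e in residual[-200:]) / 200
```

In this sketch the residual level just after the start of conversation (`early_level`) is far larger than the level after the filter has adapted (`late_level`), which is the period during which residual echo is liable to be mistaken for near-end voice. A change in `echo_path` partway through the signal would be expected to produce a similar temporary rise.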
This problem is described with reference to FIG. 16A and FIG. 16B. These Figures illustrate the state of the parameters within the voice detection means 33 in a case wherein the near-end caller is not speaking and only the far-end caller is speaking. Here, the method of detection of the near-end caller voice in the voice detection means 33 is as described below.
The output transmission signal level is averaged over a certain past section in which the measured level is low and it can therefore be inferred that the near-end caller is not speaking, and this average value is used as the near-end background noise level. Adding a margin to this background noise level yields the voice detection threshold value TH(k) for the near-end caller voice.
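The detection method described above can be sketched as follows. This is a minimal illustration under stated assumptions: levels are taken to be in dB so that the margin is additive, a frame is treated as a section in which the near-end caller is not speaking whenever its level does not exceed the current threshold, and the names `detect_voice`, `margin_db`, `window`, and `initial_noise_db` are hypothetical.

```python
from collections import deque


def detect_voice(levels_db, margin_db=6.0, window=4, initial_noise_db=30.0):
    """Return (thresholds, flags) for a sequence of signal levels in dB.

    Assumption: frames at or below the threshold are treated as noise only.
    """
    quiet = deque([initial_noise_db] * window, maxlen=window)
    thresholds, flags = [], []
    for level in levels_db:
        noise_level = sum(quiet) / len(quiet)   # averaged background noise
        th = noise_level + margin_db            # TH(k) = noise level + margin
        flg = 1 if level > th else 0
        if flg == 0:
            quiet.append(level)                 # update only in quiet sections
        thresholds.append(th)
        flags.append(flg)
    return thresholds, flags


thresholds, flags = detect_voice([30.0, 30.0, 30.0, 50.0, 50.0, 30.0])
```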
In this case, the near-end caller is not speaking, so the greater part of the input transmission signal S(k) is echo component. In FIG. 16A, the echo component of the input transmission signal S(k) is canceled to a certain degree by the echo canceller, and the level Ls2(k) of the output transmission signal S2(k), which is the residual signal following echo cancellation, is smaller than the input transmission signal level Ls(k) over all sections within FIG. 16A.
Immediately after the far-end caller begins speaking, the output transmission signal level Ls2(k) remains smaller than the voice detection threshold value TH(k) until the residual echo within the output transmission signal S2(k) increases to a certain extent. However, when the residual echo within the output transmission signal increases to the extent that the output transmission signal level Ls2(k) exceeds the voice detection threshold value TH(k), the voice detection means 33 mis-identifies the residual echo as the voice of the near-end caller, and changes the voice detection flag FLG from 0 to 1 as shown in FIG. 16B. Consequently, not only does the transmission of sound by the wireless unit 28 consume unnecessary electric power, but the far-end caller hears an echo of his/her own voice.
Next, description will be made regarding the problems facing the echo suppresser. This description will be made in regard to the problems which occur in the case that the echo processing unit 25 is an echo suppresser which uses an attenuator.
This echo suppresser is arranged such that in the event that it is indicated by the reception signal level that the far-end caller is speaking, a signal wherein the input transmission signal of the near-end caller is suppressed is output as an output transmission signal, and on the other hand, in the event that it is indicated by the reception signal level that the far-end caller is no longer speaking, a signal wherein the input transmission signal of the near-end caller is not suppressed is output as the output transmission signal.
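The switching behavior described above can be sketched as follows. This is an illustrative sketch only: the function name `echo_suppresser`, the smoothed absolute value used as the reception signal level, and the parameters `rx_threshold`, `attenuation`, and `smoothing` are assumptions and are not taken from the known device.

```python
def echo_suppresser(reception, mic, rx_threshold=0.1, attenuation=0.1,
                    smoothing=0.5):
    """Return S2(k): S(k) attenuated while the far-end caller is speaking.

    Assumption: far-end activity is judged from a smoothed level of R(k).
    """
    rx_level = 0.0
    out = []
    for r, s in zip(reception, mic):
        # Smoothed reception signal level.
        rx_level = smoothing * rx_level + (1.0 - smoothing) * abs(r)
        # Suppress the transmission signal only while the far end is active.
        gain = attenuation if rx_level > rx_threshold else 1.0
        out.append(gain * s)
    return out


reception = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]
mic = [1.0] * 8
s2 = echo_suppresser(reception, mic)
```

Because the level estimate is smoothed, suppression in this sketch continues for a short time after the reception signal stops and is then released.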
The problem regarding the echo suppresser is described with reference to FIG. 17A and FIG. 17B. These Figures illustrate the state of the parameters within the voice detection means 33 in a case wherein the near-end caller is not speaking and only the far-end caller is speaking, and also wherein the background noise is relatively great.
First, the far-end caller initiates speaking, and when the input transmission signal level Ls(k) exceeds the voice detection threshold TH(k) as shown in FIG. 17A, a signal wherein the input transmission signal S(k) has been suppressed is output from the echo suppresser as the output transmission signal S2(k), and the output transmission signal level Ls2(k) rapidly decreases to a level lower than the near-end background noise level.
Also, the voice detection threshold TH(k) follows the output transmission signal level Ls2(k) in a somewhat delayed manner, and is reduced in accordance with the reduction in the output transmission signal level Ls2(k), which the voice detection means 33 treats as a reduction in the level of the background noise. When the far-end caller finishes speaking, the echo suppresser disengages the suppression on the output transmission signal S2(k), so that the output transmission signal level Ls2(k) suddenly increases and exceeds the voice detection threshold TH(k). The voice detection means 33 mis-identifies this as the voice of the near-end caller, and changes the voice detection flag FLG from 0 to 1, as shown in FIG. 17B. Consequently, not only does the transmission of sound by the wireless unit 28 consume unnecessary electric power, but the far-end caller hears unnatural background noise in response to ending of his/her own speaking.
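The sequence of events described above can be reproduced with a small simulation. All numerical values (a background noise level of 40 dB, an echo level of 60 dB, an attenuation of 30 dB, a margin of 6 dB, and a four-frame averaging window), as well as the name `simulate`, are hypothetical values chosen only so that the effect is visible; the suppression is assumed to follow the far-end talk period exactly.

```python
def simulate(noise_db=40.0, echo_db=60.0, attenuation_db=30.0,
             margin_db=6.0, window=4, talk=range(5, 15), frames=20):
    """Return rows of (k, Ls, Ls2, TH, FLG) with no near-end speech at all."""
    quiet = [noise_db] * window       # recent levels judged to be noise only
    rows = []
    for k in range(frames):
        far_end_active = k in talk
        ls = echo_db if far_end_active else noise_db
        # The suppresser attenuates S(k) only while the far end is speaking.
        ls2 = ls - attenuation_db if far_end_active else ls
        th = sum(quiet) / window + margin_db
        flg = 1 if ls2 > th else 0
        if flg == 0:
            quiet = quiet[1:] + [ls2]  # threshold follows Ls2 with a delay
        rows.append((k, ls, ls2, th, flg))
    return rows


rows = simulate()
flags = [row[4] for row in rows]
```

Under these assumptions, the suppressed level of 30 dB pulls the threshold down from 46 dB to 36 dB during the far-end talk period, so that the unsuppressed background noise of 40 dB exceeds the threshold at frame 15 and the flag is raised although the near-end caller has not spoken.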
Depending on the configuration of the echo canceller, there are arrangements wherein an attenuator similar to that of the echo suppresser is provided along with the adaptive filter, and in such arrangements the same problems as with the echo suppresser described above occur with the echo canceller as well.
Also, in cases where the echo processing unit 25 is an echo suppresser using an attenuator, and in cases where an echo canceller is provided with an attenuator, the following problems occur in the noise canceller.
Generally, noise cancellers perform estimation of background noise properties at times when the input signal to the noise canceller is comprised solely of background noise, and control the amount of cancellation of noise based on the earlier-estimated background noise properties, according to whether the input signals are voice signals or not. What is important here is that in the case that background noise properties estimation is performed in a voice section, the noise cancellation results in deterioration of voice quality.
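The dependence of the noise estimate on the voice detection result can be sketched as follows. This is a simplified illustration, not the noise canceller of the known device: the background noise properties are reduced to a single level, and the name `update_noise_estimate` and the smoothing factor `alpha` are assumptions.

```python
def update_noise_estimate(estimate, frame_level, voice_flag, alpha=0.9):
    """Return the new background noise level estimate.

    Assumption: the estimate is a single exponentially smoothed level that
    is frozen whenever the voice detection flag indicates presence of sound.
    """
    if voice_flag:
        return estimate               # voice section: keep earlier estimate
    return alpha * estimate + (1.0 - alpha) * frame_level


frozen = update_noise_estimate(40.0, 60.0, voice_flag=1)
updated = update_noise_estimate(40.0, 50.0, voice_flag=0)
```

In this sketch the correctness of the estimate depends entirely on the flag: a flag that is wrongly 0 during speech lets voice leak into the noise estimate, and a flag that is wrongly 1 during noise prevents the estimate from being updated.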
However, in cases where the echo processing unit 25 is an echo suppresser using an attenuator, and in cases where an echo canceller is provided with an attenuator, the voice detection means 33 erroneously raises a voice detection flag FLG, so that the noise canceller 34 is not able to perform appropriate noise cancellation processing but rather deteriorates the voice quality, giving the far-end caller an unnatural impression.