1. Field of the Invention
The present invention relates to a speakerphone that estimates the echo canceling performance of an echo canceler through simple processing and thereby improves the transmit/receive switching performance of a voice switch, in a full-duplex communication system using a speaker and a microphone.
2. Description of the Related Art
The speakerphone, used for a telephone conversation through a speaker and a microphone without a handset, has been applied widely to teleconference systems that connect plural locations, and to automobile telephone systems in which the driver cannot take his hands off the steering wheel for obvious safety reasons.
However, this speakerphone involves troublesome phenomena, such as an acoustic echo generated when sounds emitted from the speaker are reflected and return to the microphone, and a line echo generated when a talker's uttered voice is reflected at connections on the communication line due to impedance mismatching there. FIG. 6 is a chart for simply explaining the acoustic echo and the line echo.
What makes the problem acute is that the acoustic echo path and the line echo path together make up a closed loop (formed of a microphone 61, communication line SP, and speaker 62), as shown in FIG. 6. If the gain of the foregoing closed loop exceeds 1, it will generate an oscillation (howling) inside the closed loop, which will in the worst case make conversation impossible. Even if the howling does not occur, if there is a line echo, the talker's uttered voice will be emitted from the speaker 62 with a delay, and hence the talker will have difficulty speaking.
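The howling condition described above can be illustrated with a small sketch. It assumes, purely for illustration, that the closed loop is modeled as a chain of stages whose gains are expressed in decibels, so that a loop gain exceeding 1 corresponds to a total above 0 dB. The stage names and values are hypothetical and do not come from FIG. 6.

```python
def loop_gain_db(stage_gains_db):
    """Total gain around the closed loop in dB (sum of the per-stage gains)."""
    return sum(stage_gains_db)


def will_howl(stage_gains_db):
    """A loop gain exceeding 1 (that is, above 0 dB) allows oscillation."""
    return loop_gain_db(stage_gains_db) > 0.0


# Hypothetical stages: speaker amplifier, acoustic coupling,
# microphone amplifier, line return.
open_loop = [20.0, -15.0, 10.0, -10.0]

print(loop_gain_db(open_loop))          # total loop gain in dB
print(will_howl(open_loop))             # loop as measured
print(will_howl(open_loop + [-40.0]))   # same loop with 40 dB of added attenuation
```

In this model any measure that lowers one stage gain, whether an inserted loss or a canceled echo, reduces the total and moves the loop away from oscillation.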
Devices have been provided in order to avoid the influence of these echoes, and they can be classified roughly into two types. One of them is a half duplex voice switching system, wherein, when a near-end talker is speaking, an electric loss is inserted on the receive path of the talker (transmit state), and when the talker is listening, an electric loss is inserted on the transmit path of the talker (receive state). In this system, the switching between the transmit state and the receive state is carried out on the basis of the voices uttered by the near-end talker or the far-end talker.
The other one is an echo canceling system, wherein an adaptive filter that estimates the characteristic of the foregoing echo is employed to produce a signal similar to the echo, and the signal is subtracted from the transmit and the receive paths to thereby remove the echo signal from the closed loop. In the echo canceling system, echoes are removed in real time and neither the transmit path nor the receive path is closed, and hence full duplex communication is possible.
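The loss insertion of the half duplex voice switching system can be sketched as follows. This is a minimal illustration only: the function name, the state labels, and the 40 dB default loss are assumptions and are not taken from any cited document.

```python
def voice_switch_gains(state, loss_db=40.0):
    """Return (transmit_gain, receive_gain) as linear factors.

    state: "transmit" when the near-end talker is speaking,
           "receive" when the near-end talker is listening.
    loss_db: hypothetical electric loss inserted on the inactive path.
    """
    loss = 10.0 ** (-loss_db / 20.0)
    if state == "transmit":
        return 1.0, loss      # loss inserted on the receive path
    if state == "receive":
        return loss, 1.0      # loss inserted on the transmit path
    raise ValueError("state must be 'transmit' or 'receive'")


tx_gain, rx_gain = voice_switch_gains("transmit")
print(tx_gain, rx_gain)
```

Because one path is always attenuated, the loop gain stays low, at the cost of only one direction being audible at a time.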
The technique relating to the speakerphone using the foregoing voice switch is disclosed, for example, in the Japanese Patent Application Laid-open No. 5-44221. FIG. 7 is a functional block representation of the speakerphone disclosed in the foregoing document.
As shown in FIG. 7, a speakerphone 100 using the voice switch comprises a transmit section 200, a receive section 300, and a computer 110. The transmit section 200 includes a multiplexer 210 for temporarily storing a plurality of input signals such as speech signals inputted from a microphone 111, a mute control 211 to close the transmit path in accordance with a control signal from the computer 110 described later, a high pass filter 212 for removing background noises contained in the foregoing speech signals, a programmable attenuator 213 (equivalent to receive state setting means) for giving attenuation to the foregoing speech signals passed through the high pass filter 212 in accordance with a control signal from the computer 110, an envelope detector 214 for detecting an envelope of a speech signal outputted from the high pass filter 212, a low pass filter 215 for reducing switching noises generated by the programmable attenuator 213 and shaping output waveforms to a communication line 101, and a logarithmic amplifier 216 for logarithmically amplifying an output from the envelope detector 214.
The receive section 300 contains functionally the same circuits as the transmit section 200: a multiplexer 310 for temporarily storing a plurality of input signals such as speech signals received through a communication line 102, a mute control 311 to close the receive path in accordance with a control signal from the computer 110, a high pass filter 312 for removing background noises contained in the foregoing speech signals, a programmable attenuator 313 (equivalent to transmit state setting means) for giving attenuation to the foregoing speech signals passed through the high pass filter 312 in accordance with a control signal from the computer 110, an envelope detector 314 for detecting an envelope of a speech signal outputted from the high pass filter 312, a low pass filter 315 for reducing switching noises generated by the programmable attenuator 313 and shaping output waveforms to a speaker 112, and a logarithmic amplifier 316 for logarithmically amplifying an output from the envelope detector 314.
The foregoing computer 110 (equivalent to state switching means) receives signals from the logarithmic amplifiers 216, 316 through a multiplexer 117 and an A/D converter 115, and controls the mute controls 211, 311 and the programmable attenuators 213, 313. Further, the computer 110 is connected to a calibration circuit 113 as well. The calibration circuit 113 feeds a specific calibration tone to the multiplexers 210 and 310 to assist the estimation of system characteristics.
The operation of the foregoing speakerphone, specifically a transmit break-in operation that switches from the receive state to the transmit state, will be described hereunder. FIG. 8 is a flow chart for explaining the transmit break-in operation.
As shown in FIG. 8, when the process comes into step 1001, the speakerphone enters the receive state. Then, the process advances to step 1002 where a determination is made as to whether a transmit signal TX-S inputted from the microphone 111 exceeds an expected transmit signal TX-E by a specific threshold Th. Here, the expected transmit signal TX-E is a transmit signal expected to be generated by the coupling of the receive signal RX-S from the speaker 112 to the microphone 111. Step 1002 is provided to prevent the device from self-switching due to the receive signal RX-S emitted from the speaker 112 and the influence of an acoustic echo while the near-end talker is not speaking.
At step 1002, if the transmit signal TX-S exceeds the expected transmit signal TX-E, the process advances to step 1003 where a determination is made as to whether the transmit signal TX-S exceeds a transmit noise TX-N by a specific threshold Th. The decision at this step is provided to determine whether the transmit signal TX-S is a voice signal or a noise signal.
At step 1003, after the transmit signal TX-S is confirmed as a voice signal, the process advances to step 1004 where a comparison is made whether the transmit signal TX-S exceeds the receive signal RX-S by a specific threshold Th. And, if the transmit signal TX-S is greater than the receive signal RX-S at step 1004, the process moves to step 1005 where the holdover timer is initialized, and then the process moves to step 1006 where it brings the device into the transmit state.
Thus, the foregoing speakerphone prevents error switching due to the acoustic echo by comparing the transmit signal TX-S with the expected transmit signal TX-E. Error switching due to the line echo is prevented in substantially the same manner as for the acoustic echo, and its description will be omitted.
Incidentally, the expected transmit signal TX-E and the threshold used in the decision at step 1004 are determined by using a calibration tone actually outputted from the calibration circuit 113. More concretely, the calibration circuit 113 generates an audio frequency signal covering from 300 Hz to 3.4 kHz, and the speaker emits the audible sounds into the environment in a regular manner. On the basis of the acoustic response characteristics then measured, the maximum amplitude of the acoustic echo, the duration of reverberation, etc., are obtained. Thereby, the foregoing expected transmit signal TX-E and the threshold are determined. The calibration tone is transmitted while no speech signal is detected on the transmit path and the receive path, so as to vary the expected transmit signal TX-E and/or the threshold in correspondence with changes in the environment.
Accordingly, when the environment produces less reverberation and the acoustic condition is good, or when the line condition is good, it is possible to perform a communication that approaches the full duplex system by lowering the break-in threshold determined in accordance with the acoustic echo or the line echo.
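The transmit break-in decision of steps 1002 to 1006 can be summarized in a short sketch. It assumes that all levels are logarithmic (dB-like) values, as would come from the logarithmic amplifiers, and that each comparison may use its own threshold. The function name and the three separate threshold parameters are hypothetical.

```python
def transmit_break_in(tx_s, tx_e, tx_n, rx_s, th_echo, th_noise, th_rx):
    """Return True when the device should leave the receive state.

    tx_s: transmit signal level TX-S
    tx_e: expected transmit signal level TX-E (speaker-to-microphone coupling)
    tx_n: transmit noise level TX-N
    rx_s: receive signal level RX-S
    th_*: hypothetical per-step thresholds, in the same log units
    """
    if tx_s <= tx_e + th_echo:    # step 1002: not above the expected echo
        return False
    if tx_s <= tx_n + th_noise:   # step 1003: not distinguishable from noise
        return False
    if tx_s <= rx_s + th_rx:      # step 1004: far-end signal still dominates
        return False
    # Steps 1005 and 1006: the holdover timer would be initialized here
    # and the device brought into the transmit state.
    return True


print(transmit_break_in(-20.0, -40.0, -60.0, -30.0, 6.0, 6.0, 3.0))
```

The order of the tests mirrors the flow chart: the echo check comes first so that sound from the speaker alone never triggers a switch.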
On the other hand, the technique relating to the echo canceler is disclosed, for example, in the Japanese Patent Application Laid-open No. 61-258554. FIG. 9 illustrates a block diagram of the echo canceler disclosed in the foregoing document.
As shown in FIG. 9, the echo canceler includes: an XR memory 906 for storing in time series a receive signal XR received from the communication line, an A memory 907 for storing an estimated value A of the acoustic echo returning to the microphone 901 from a speaker 902 while reflecting, an arithmetic circuit 908 for operating the convolution of the receive signal XR and the estimated value A, a subtracter 909 for subtracting the output of the arithmetic circuit 908 from the acoustic echo signal to thereby suppress the acoustic echo signal, an XT memory 910 for storing in time series a transmit signal XT, an H memory 911 for storing an estimated value H of the line echo, an arithmetic circuit 912 for operating the convolution of the transmit signal XT and the estimated value H, a subtracter 913 for subtracting the output of the arithmetic circuit 912 from the line echo signal to thereby suppress the line echo signal, an adaptive control circuit 914 for acquiring an adjusting coefficient that sequentially adjusts the estimated value A stored in the A memory 907 on the basis of the receive signal XR stored in the XR memory 906 and the output of the subtracter 909, and the same for acquiring an adjusting coefficient that sequentially adjusts the estimated value H stored in the H memory 911 on the basis of the transmit signal XT stored in the XT memory 910 and the output of the subtracter 913, an adder 915 for sequentially adjusting the estimated value A by adding the adjusting coefficient acquired by the adaptive control circuit 914, an adder 916 for sequentially adjusting the estimated value H by adding the adjusting coefficient acquired by the adaptive control circuit 914, switches 917, 918, 919 for selecting the input/output signals of the adaptive control circuit 914, and a signal detector 920 (equivalent to speech signal detection means) for detecting the speech signal of the transmit signal and the receive signal and controlling the switches 917, 918, 919.
Although an echo canceler is usually provided with separate adaptive control circuits for removing the acoustic echo and for removing the line echo, the foregoing echo canceler, having a single adaptive control circuit 914, performs the processing usually done by the two adaptive control circuits with the assistance of the signal detector 920 and the switches 917, 918, 919, which control the input/output, and thereby simplifies the hardware construction. Here, the process of removing the acoustic echo is basically the same as that of removing the line echo, and hence the removal of the acoustic echo will mainly be referred to hereunder, and the removal of the line echo will be omitted unless needed.
In the foregoing echo canceler, when the signal detector 920 detects a speech signal only in the receive signal XR, the echo canceler starts the adaptive learning. In other words, the adjusting coefficient Δa_n acquired by the adaptive control circuit 914 sequentially modifies the estimated value sequence a_n of the impulse response, stored in the A memory 907. This adjustment employs, for example, the method of identification by learning. The following equation (1) expresses concretely the adjustment by the method of identification by learning.

$$
\begin{aligned}
a_n &= a_{n-1} + \Delta a_{n-1} \\
    &= a_{n-1} + \frac{\alpha \left( YR_{n-1} - a_{n-1} \cdot XR'_{n-1} \right) XR_{n-1}}{XR_{n-1} \cdot XR'_{n-1}} \\
    &= a_{n-1} + \frac{\alpha \left( YR_{n-1} - \sum_{k=0}^{N-1} a_{n-1,k} \cdot XR_{n-k-1} \right) XR_{n-1}}{\sum_{j=1}^{N} XR_{n-j}^{2}}
\end{aligned}
\qquad \text{[equation 1]}
$$

Here, α: loop gain, N: degree of the adaptive filter, YR_{n-1}: acoustic echo signal at time n-1.
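The coefficient update of equation (1) can be sketched in code as below. This is an illustrative sketch, not the circuit of FIG. 9: the function names, the zero-power guard, and the test signal (a uniformly distributed pseudo-random receive signal driving a hypothetical 3-tap echo path) are assumptions. The list x holds the most recent N receive samples, newest first, and its sum of squares plays the role of the denominator in equation (1).

```python
import random


def nlms_update(a, x, y, alpha=0.5):
    """One update of the estimated impulse response.

    a: current coefficients a_{n-1,k}, k = 0..N-1
    x: receive samples [XR_{n-1}, ..., XR_{n-N}], newest first
    y: echo sample YR_{n-1} picked up by the microphone
    alpha: loop gain
    Returns (new_coefficients, residual_error).
    """
    power = sum(xk * xk for xk in x)
    err = y - sum(ak * xk for ak, xk in zip(a, x))
    if power == 0.0:              # assumed guard: nothing to learn from silence
        return list(a), err
    scale = alpha * err / power
    return [ak + scale * xk for ak, xk in zip(a, x)], err


def identify(path, n_samples=2000, alpha=0.5, seed=0):
    """Learn a hypothetical echo path from a pseudo-random receive signal."""
    rng = random.Random(seed)
    a = [0.0] * len(path)
    x = [0.0] * len(path)
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]
        y = sum(h * xk for h, xk in zip(path, x))   # echo only, no near-end speech
        a, _ = nlms_update(a, x, y, alpha)
    return a


print(identify([0.5, -0.3, 0.1]))
```

Dividing by the receive power makes the step size independent of the signal level, which is why the same loop gain works for loud and quiet far-end speech.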
Further, the foregoing adaptive learning is performed only when the speech signal is detected in the receive signal alone, for the following reason. Since the speech signal uttered by the near-end talker is inherently independent of the acoustic echo characteristics, if that speech signal is inputted to the echo canceler together with the acoustic echo signal, it acts as a disturbance that obstructs the learning of the echo canceler.
Thus, the speakerphone containing the foregoing echo canceler is able to remove the echo with increasing accuracy as time passes, by sequentially adjusting the impulse response of the adaptive filter using the method of identification by learning.
Incidentally, the combination of the foregoing voice switching system and the echo canceling system will attenuate the echo by the attenuator of the voice switch as well as remove the echo to some extent by the adaptive filter of the echo canceler. Therefore, the combination has the possibility of providing a system that approaches the full duplex system and is less affected by the echoes.
The problem here lies in setting the threshold for switching between the transmit state and the receive state in the voice switching system. The voice switching system is able to directly measure the characteristics of the system by the calibration tone generated from the calibration circuit 113 and to adjust the threshold to the echo. However, when it is combined with the echo canceling system, the amount of echo remaining in the system varies from moment to moment in correspondence with the degree of learning by the adaptive filter. Therefore, a threshold calibrated only within a specific period soon becomes out of date, so that most calibrations are wasted and error switching occurs as well.
The present invention intends to solve the problems in these conventional techniques and to improve the speakerphone, and it is therefore an object of the invention to provide a speakerphone combining the voice switching system and the echo canceling system, which stably performs a communication that approaches full duplex while estimating a performance variation of the adaptive filter on the basis of the past signal referred to when the adaptive filter performs the learning.
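The disturbing effect of near-end speech on the learning can be shown with a small numerical experiment. It is a sketch under stated assumptions: the echo path is a hypothetical 3-tap response, both talkers are modeled as uniformly distributed pseudo-random signals, and the normalized update from the method of identification by learning is written inline so that the block is self-contained.

```python
import random


def misalignment_after_learning(path, near_end_level, n_samples=2000,
                                alpha=0.5, seed=0):
    """Run the adaptive learning and return the squared coefficient error.

    near_end_level: amplitude of the simulated near-end speech that is
    added to the echo at the microphone (0.0 means receive-only speech).
    """
    rng = random.Random(seed)
    n = len(path)
    a = [0.0] * n
    x = [0.0] * n
    for _ in range(n_samples):
        x = [rng.uniform(-1.0, 1.0)] + x[:-1]           # far-end (receive) signal
        near_end = near_end_level * rng.uniform(-1.0, 1.0)
        y = sum(h * xk for h, xk in zip(path, x)) + near_end
        power = sum(xk * xk for xk in x)
        if power == 0.0:
            continue
        err = y - sum(ak * xk for ak, xk in zip(a, x))
        a = [ak + alpha * err * xk / power for ak, xk in zip(a, x)]
    return sum((ak - h) ** 2 for ak, h in zip(a, path))


path = [0.5, -0.3, 0.1]
print(misalignment_after_learning(path, near_end_level=0.0))  # receive-only learning
print(misalignment_after_learning(path, near_end_level=0.5))  # learning during double talk
```

With receive-only speech the coefficient error shrinks toward zero, whereas with simultaneous near-end speech it settles at a noticeably larger value. This is the motivation for learning only while speech is detected in the receive signal alone.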
In order to achieve the foregoing object, the speakerphone according to the first invention comprises: receive state setting means for setting a receive state to attenuate a transmit signal inputted from a microphone before transmitting the transmit signal into a communication line; transmit state setting means for setting a transmit state to attenuate a receive signal received from the communication line before outputting the receive signal from a speaker; state switching means for determining a state and setting the determined state, said state switching means comprising means for comparing the difference between the transmit signal and the receive signal with an acoustic echo threshold set to an acoustic echo generated by the receive signal returning to the microphone from the speaker; speech signal detection means for detecting a speech signal from the transmit signal and the receive signal; acoustic echo canceling means, including an adaptive filter for sequentially estimating the characteristics of the acoustic echo by varying the response on the basis of the acoustic echo when the speech signal detection means detects the speech signal only in the receive signal, for subtracting a quasi-acoustic echo signal obtained by inputting the receive signal to the adaptive filter from the transmit signal; residual acoustic echo estimation means for estimating a residual acoustic echo signal remaining without being removed by the acoustic echo canceling means on the basis of the history of the receive signal outputted in the past from the speaker; and acoustic echo threshold variation means for varying the acoustic echo threshold of the state switching means in accordance with the residual acoustic echo signal estimated by the residual acoustic echo estimation means.
The speakerphone relating to the foregoing first invention is able to estimate the residual acoustic echo signal on the basis of the receive signal emitted in the past from the speaker, and to vary the acoustic echo threshold for switching between the transmit state and the receive state in correspondence with the estimated residual acoustic echo signal. Therefore, by making the acoustic echo canceling means and the transmit/receive state setting means cooperate, the speakerphone is able to achieve a full-duplex communication system which has improved transmit/receive switching performance and is more comfortable to use than the conventional speakerphone.
In the foregoing speakerphone, the residual acoustic echo signal can be estimated, for example, on the basis of an integrated power value obtained by integrating the power of the receive signal while the foregoing speech signal detection means detects the speech signal only in the receive signal. Alternatively, the residual acoustic echo signal can be estimated on the basis of an integrated detection time obtained by integrating the time during which the foregoing speech signal detection means detects the speech signal only in the receive signal. A simple process such as integrating the power of the receive signal or integrating its detection time reduces the quantity of arithmetic operations needed to vary the acoustic echo threshold, which provides a speakerphone system that is inexpensive and consumes less power than the conventional one.
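The power integration described above might be organized as in the following sketch. The mapping from integrated power to threshold is not specified in this section, so the linear interpolation between an initial threshold and a floor, together with all names and numeric defaults, is a hypothetical choice made only to show the bookkeeping.

```python
class AcousticEchoThreshold:
    """Track learning progress by integrating receive-signal power.

    Assumption: the more receive-only speech the adaptive filter has seen,
    the smaller the residual acoustic echo, so the threshold is lowered
    linearly from th_initial_db to th_floor_db.
    """

    def __init__(self, th_initial_db=30.0, th_floor_db=6.0,
                 full_learning_power=1000.0):
        self.th_initial_db = th_initial_db
        self.th_floor_db = th_floor_db
        self.full_learning_power = full_learning_power
        self.integrated_power = 0.0

    def update(self, rx_frame, rx_speech, tx_speech):
        """Integrate power only while speech is detected in the receive signal alone."""
        if rx_speech and not tx_speech:
            self.integrated_power += sum(s * s for s in rx_frame)
        return self.threshold_db()

    def threshold_db(self):
        progress = min(self.integrated_power / self.full_learning_power, 1.0)
        return self.th_initial_db - progress * (self.th_initial_db - self.th_floor_db)


est = AcousticEchoThreshold(30.0, 6.0, 100.0)
print(est.update([1.0] * 50, rx_speech=True, tx_speech=False))
```

Each frame costs one sum of squares and one comparison, which reflects the low arithmetic load that the text attributes to this approach.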
Further, in the foregoing speakerphone, the state switching means maintains the receive state or the transmit state when the speech signal detection means detects the speech signal from the receive signal or the transmit signal, and shifts the attenuation in the receive state setting means and the transmit state setting means into an intermediate attenuation when the speech signal detection means does not detect a speech signal.
Accordingly, in a circumstance where there is no great level difference between the receive signal and the transmit signal, for example, when the far-end talker lapses into silence for a while during reception, the speakerphone with this arrangement is able to return to the receive state immediately when the far-end talker resumes speaking, and to avoid cutting off the initial sound as happens with the conventional speakerphone. This is because the arrangement maintains the receive state without changing the processing procedure and only shifts the attenuation in the receive state and transmit state setting means to an intermediate attenuation.
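The intermediate attenuation can be sketched as follows. The sketch assumes that the intermediate value is half of the full loss on each path and that attenuations are expressed in dB. Both assumptions, and the function name, are illustrative only.

```python
def path_attenuations_db(state, speech_detected, full_loss_db=40.0):
    """Return (transmit_path_attenuation, receive_path_attenuation) in dB.

    state: "receive" or "transmit"; it is kept unchanged during silence.
    speech_detected: result of the speech signal detection means.
    """
    if state not in ("receive", "transmit"):
        raise ValueError("state must be 'receive' or 'transmit'")
    if not speech_detected:
        half = full_loss_db / 2.0     # assumed intermediate attenuation
        return half, half
    if state == "receive":
        return full_loss_db, 0.0      # attenuate the transmit signal
    return 0.0, full_loss_db          # attenuate the receive signal


state = "receive"
print(path_attenuations_db(state, speech_detected=True))
print(path_attenuations_db(state, speech_detected=False))  # silence: state kept, loss shared
```

Because the state variable itself is not altered during silence, the full receive-state attenuation can be restored as soon as speech is detected again, without running a fresh state decision.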
Further, the speakerphone according to the second invention comprises: receive state setting means for setting a receive state to attenuate a transmit signal inputted from a microphone before transmitting the transmit signal into a communication line; transmit state setting means for setting a transmit state to attenuate a receive signal received from the communication line before outputting the receive signal from a speaker; state switching means for determining a state and setting the determined state, said state switching means comprising means for comparing the difference between the transmit signal and the receive signal with a line echo threshold set to a line echo generated by the transmit signal returning to a line receive side from a line transmit side; speech signal detection means for detecting a speech signal from the transmit signal and the receive signal; line echo canceling means, including an adaptive filter for sequentially estimating the characteristics of the line echo by varying the response on the basis of the line echo when the speech signal detection means detects the speech signal only in the transmit signal, for subtracting a quasi-line echo signal obtained by inputting the transmit signal to the adaptive filter from the receive signal; residual line echo estimation means for estimating a residual line echo signal remaining without being removed by the line echo canceling means on the basis of the history of the transmit signal outputted in the past from the microphone to the line; and line echo threshold variation means for varying the line echo threshold of the state switching means in accordance with the residual line echo signal estimated by the residual line echo estimation means.
The speakerphone relating to the foregoing second invention is able to estimate the residual line echo signal on the basis of the transmit signal outputted in the past from the microphone to the line, and to vary the line echo threshold for switching between the transmit state and the receive state in correspondence with the estimated residual line echo signal. Therefore, by making the line echo canceling means and the transmit/receive state setting means cooperate, the speakerphone is able to achieve a full-duplex communication system which has improved transmit/receive switching performance and is more comfortable to use than the conventional speakerphone.
In the foregoing speakerphone, the residual line echo signal can be estimated, for example, on the basis of an integrated power value obtained by integrating the power of the transmit signal while the foregoing speech signal detection means detects the speech signal only in the transmit signal. Alternatively, the residual line echo signal can be estimated on the basis of an integrated detection time obtained by integrating the time during which the foregoing speech signal detection means detects the speech signal only in the transmit signal. A simple process such as integrating the power of the transmit signal or integrating its detection time reduces the quantity of arithmetic operations needed to vary the line echo threshold, which provides a speakerphone system that is inexpensive and consumes less power than the conventional one.
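The time-integration alternative for the line echo might look like the sketch below. As with any such estimate, the relation between accumulated detection time and the threshold is not given in this section. The frame length, the learning time, the linear mapping, and the class name are hypothetical.

```python
class LineEchoThreshold:
    """Track learning progress by integrating transmit-only detection time.

    Assumption: the line echo threshold falls linearly from th_initial_db
    to th_floor_db as transmit-only speech time approaches learning_seconds.
    """

    def __init__(self, frame_seconds=0.02, learning_seconds=5.0,
                 th_initial_db=30.0, th_floor_db=6.0):
        self.frame_seconds = frame_seconds
        self.learning_seconds = learning_seconds
        self.th_initial_db = th_initial_db
        self.th_floor_db = th_floor_db
        self.detected_seconds = 0.0

    def update(self, tx_speech, rx_speech):
        """Count a frame only when speech is detected in the transmit signal alone."""
        if tx_speech and not rx_speech:
            self.detected_seconds += self.frame_seconds
        return self.threshold_db()

    def threshold_db(self):
        progress = min(self.detected_seconds / self.learning_seconds, 1.0)
        return self.th_initial_db - progress * (self.th_initial_db - self.th_floor_db)


timer = LineEchoThreshold(frame_seconds=0.5, learning_seconds=2.0,
                          th_initial_db=30.0, th_floor_db=10.0)
timer.update(tx_speech=True, rx_speech=False)
print(timer.update(tx_speech=True, rx_speech=False))
```

Counting frames needs no multiplication per sample, so it is the cheaper of the two alternatives, though it ignores how loud the transmit signal was.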
Further, in the foregoing speakerphone, the state switching means maintains the transmit state or the receive state when the speech signal detection means detects the speech signal from the transmit signal or the receive signal, and shifts the attenuation in the receive state setting means and the transmit state setting means into an intermediate attenuation when the speech signal detection means does not detect a speech signal.
Accordingly, in a circumstance where there is no great level difference between the transmit signal and the receive signal, for example, when the near-end talker lapses into silence for a while during communication, the speakerphone with this arrangement is able to return to the transmit state immediately when the near-end talker resumes speaking and the speech signal is detected, and to avoid cutting off the initial sound as happens with the conventional speakerphone. This is because the arrangement maintains the transmit state without changing the processing procedure and only shifts the attenuation in the transmit state and receive state setting means to an intermediate attenuation.
Further, the speakerphone according to the third invention is a combination of the first and the second invention. The third invention is able to reduce the effect of both an acoustic echo and a line echo and realize the smoother switching of the state.
Further scope of applicability of the present invention will become apparent from the detailed description given hereinafter. However, it should be understood that the detailed description and specific examples, while indicating preferred embodiments of the invention, are given by way of illustration only, since various changes and modifications within the spirit and scope of the invention will become apparent to those skilled in the art from this detailed description.