(i) Field of the Invention
The present invention relates to a voice switching system and a voice switching method for removing sound echoes and reducing the occurrence of howling in sound reinforced communication systems, such as hands-free telephones and teleconference systems, which have been becoming popular in recent years.
(ii) Description of the Related Art
There have so far been developed a wide variety of sound reinforced communication systems each of which makes it possible for many people to talk with one another each with one set of hands-free speaker and microphone.
The sound reinforced communication system of this type is in general equipped with at least one pair of hands-free telephone units, i.e., first and second hands-free telephone units positioned in respective closed near-end and far-end rooms far away from each other and each having one set of hands-free speaker and microphone, thereby making it possible for two operators to communicate with each other by the speakers and microphones of the first and second hands-free telephone units through a special transmission line electrically connecting the first and second hands-free telephone units with each other. The speaker and microphone forming parts of the hands-free telephone unit begin to operate when the operator inputs his or her voice to the microphone, so that the reinforced communication system is also called “a voice switching system”.
One of typical examples among those conventional voice switching systems is disclosed, for example, in Japanese Patent Application Laid-Open No. 62151/1990.
The conventional voice switching system disclosed in this Japanese publication is shown in FIG. 7 as comprising a reception voice attenuator 702 designed to attenuate the reception signal inputted from a signal input terminal 701 to output the attenuated signal to a speaker 703, a transmission voice attenuator 705 adapted to attenuate the transmission signal inputted from a microphone 704 to output the attenuated signal to an output terminal 706, a receiving-side voice detection processing section 707 operative to carry out voice detection processing on the inputted reception signal, a transmitting-side voice detection processing section 713 functioning to carry out voice detection processing on the inputted transmission signal, and a loss controlling section 719 serving to control the losses of the reception voice attenuator 702 and the transmission voice attenuator 705.
The receiving-side voice detection processing section 707 thus constructed comprises a signal level computing section 708 which computes the amplitude level of the reception signal inputted from the signal input terminal 701, a time constant selecting section 709 which selects a time constant used when a minimum reception signal level is computed, a minimum signal level computing section 710 which computes the minimum reception signal level, a threshold computing section 711 which computes a threshold from the minimum reception signal level computed in the minimum signal level computing section 710, and a voice detecting section 712 which carries out voice detection according to the reception signal level and the threshold. The amplitude level computed in the signal level computing section 708 is a reception signal level obtained by rectifying and smoothing the reception signal.
The transmitting-side voice detection processing section 713 likewise comprises a signal level computing section 714 which computes the amplitude level of the transmission signal inputted from the microphone 704, a time constant selecting section 715 which selects a time constant used when a minimum transmission signal level is computed, a minimum signal level computing section 716 which computes the minimum transmission signal level, a threshold computing section 717 which computes a threshold from the minimum transmission signal level computed in the minimum signal level computing section 716, and a voice detecting section 718 which carries out voice detection according to the transmission signal level and the threshold. The amplitude level computed in the signal level computing section 714 is a transmission signal level obtained by rectifying and smoothing the transmission signal.
The following description will be directed to the operation of the above conventional voice switching system with reference to FIG. 7.
When a reception signal, i.e., the voice signal of a far-end speaker is inputted to the signal input terminal 701, the reception signal is outputted to a near-end speaker from the speaker 703 by way of the reception voice attenuator 702. The microphone 704 is then operated to collect the voice outputted from the speaker 703 and the voice of the near-end speaker speaking to the microphone 704 to output a transmission signal. This transmission signal becomes a transmission output signal via the transmission voice attenuator 705, and the transmission output signal is outputted to the far-end speaker from the signal output terminal 706.
Description will then be given to the receiving-side voice detection processing section 707 and transmitting-side voice detection processing section 713 required to compute the losses to be inserted into the reception voice attenuator 702 and the transmission voice attenuator 705 by the loss controlling section 719. Only the receiving-side voice detection processing section 707 will be described because the receiving-side voice detection processing section 707 and the transmitting-side voice detection processing section 713 operate in the same manner.
In the signal level computing section 708, the amplitude level of a reception signal in each sample or frame (multiple samples) is computed to obtain a signal level Lri(k) in which the legend “k” represents a sample number or a frame number. In the time constant selecting section 709, a time constant “Tr” is determined according to the amplitude level of the reception signal. In the minimum signal level computing section 710, a minimum reception signal level Nr(k) is computed by the smoothing processing of the following equation 1 using this time constant.
Nr(k) = Nr(k−1) + Tr·(Lri(k) − Nr(k−1))  (equation 1)
In the threshold computing section 711, a threshold “Thr” for voice detection is computed by the following equation 2 based on the minimum reception signal level Nr(k),
Thr = α·Nr(k)  (equation 2)
wherein the legend “α” is indicative of a coefficient for computing the threshold.
In the voice detecting section 712, the reception signal level Lri(k) is compared with the threshold “Thr”, and when the reception signal level is higher than the threshold, it is determined that a voice is present, while when the reception signal level is lower than the threshold, it is determined that no voice is present.
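The conventional detection chain described above (signal level, equation 1, equation 2, comparison) can be sketched as follows. This is only an illustrative sketch: the function names, the default values, and in particular the rule used to select the time constant are assumptions, since the text states only that the time constant is determined according to the amplitude level of the reception signal.

```python
def select_time_constant(level, noise_min, tr_rise=0.001, tr_fall=0.1):
    """Hypothetical selection rule (not from the source): follow a falling
    level quickly and a rising level slowly, so the estimate traces the minimum."""
    return tr_fall if level < noise_min else tr_rise


def detect_voice(levels, alpha=2.0, initial_min=None):
    """Return one True/False voice decision per sample or frame level Lri(k)."""
    noise_min = levels[0] if initial_min is None else initial_min
    decisions = []
    for level in levels:
        tr = select_time_constant(level, noise_min)
        # Equation 1: Nr(k) = Nr(k-1) + Tr * (Lri(k) - Nr(k-1))
        noise_min = noise_min + tr * (level - noise_min)
        # Equation 2: Thr = alpha * Nr(k)
        threshold = alpha * noise_min
        # Voice is judged present when the level exceeds the threshold.
        decisions.append(level > threshold)
    return decisions


if __name__ == "__main__":
    frame_levels = [1.0] * 5 + [10.0] * 3 + [1.0] * 2
    print(detect_voice(frame_levels))
```

With a small rising coefficient the minimum level barely moves during a short burst, which is also why such an estimate is slow to follow a noise floor that rises.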
The methods of computing and controlling the losses in the loss controlling section 719 will then be described hereinafter.
The loss controlling section 719 is firstly operated to have a transmission signal level Lsi(k) compared with a reception output signal level Lro(k) obtained by multiplying the reception signal level Lri(k) by a receiving-side loss “Gr”, and a sound echo path gain “αh” is computed by the following equation.
αh = Lsi(k)/Lro(k)  (equation 3)
The loss controlling section 719 is similarly operated to have a reception signal level Lri(k) compared with a transmission output signal level Lso(k) obtained by multiplying the transmission signal level Lsi(k) by a transmitting-side loss “Gs”, and a circuit echo path gain “βh” is computed by the following equation.
βh = Lri(k)/Lso(k)  (equation 4)
An insertion loss “G” is computed by the following equation from the sound echo path gain “αh” and the circuit echo path gain “βh”,
G = Hm/(Mc·αh·βh)  (equation 5)
wherein “Mc” is a correction coefficient and “Hm” is a howling margin.
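Equations 3 to 5 can be worked through numerically as in the following sketch. The function name and the sample values are hypothetical, and all quantities are assumed here to be positive linear (not decibel) values; the text does not state the units.

```python
def insertion_loss(lsi, lri, gr, gs, hm, mc):
    """Compute the insertion loss G of equation 5 from levels and losses.

    lsi, lri: transmission and reception signal levels Lsi(k), Lri(k)
    gr, gs:   receiving-side and transmitting-side losses Gr, Gs
    hm, mc:   howling margin Hm and correction coefficient Mc
    All inputs are assumed to be positive linear values.
    """
    lro = lri * gr        # reception output signal level Lro(k)
    lso = lsi * gs        # transmission output signal level Lso(k)
    alpha_h = lsi / lro   # equation 3: sound echo path gain
    beta_h = lri / lso    # equation 4: circuit echo path gain
    return hm / (mc * alpha_h * beta_h)  # equation 5


if __name__ == "__main__":
    print(insertion_loss(lsi=0.5, lri=2.0, gr=0.5, gs=0.25, hm=0.5, mc=1.0))
```

For the sample values shown, the intermediate gains are 0.5 and 16, giving an insertion loss of 0.0625.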
In the loss controlling section 719, it is determined based on the results of the determinations in the voice detecting section 712 and the voice detecting section 718 whether the system is in a reception state or a transmission state. When the system is held in the reception state, the loss of the reception voice attenuator 702 is gradually decreased while the loss of the transmission voice attenuator 705 is gradually increased toward a target value which is the insertion loss “G” computed on the basis of the equation 5. When, on the other hand, the system is held in the transmission state, the loss of the reception voice attenuator 702 is gradually increased toward a target value, i.e., the insertion loss “G” computed on the basis of the equation 5 while the loss of the transmission voice attenuator 705 is gradually decreased.
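The gradual loss control described above can be sketched as follows. The step size, the state labels, and the assumption that a decreasing loss moves toward zero are illustrative choices that the text does not specify; how the state is derived from the two voice detectors is likewise left outside the sketch.

```python
def step_toward(value, target, step):
    """Move value toward target by at most step, without overshooting."""
    if value < target:
        return min(value + step, target)
    return max(value - step, target)


def update_losses(state, rx_loss, tx_loss, target_loss, step=1.0):
    """Advance both attenuator losses by one sample or frame.

    state is "reception" or "transmission" (hypothetical labels); any other
    value leaves both losses unchanged.
    """
    if state == "reception":
        rx_loss = step_toward(rx_loss, 0.0, step)          # gradually decrease
        tx_loss = step_toward(tx_loss, target_loss, step)  # gradually increase to G
    elif state == "transmission":
        rx_loss = step_toward(rx_loss, target_loss, step)  # gradually increase to G
        tx_loss = step_toward(tx_loss, 0.0, step)          # gradually decrease
    return rx_loss, tx_loss


if __name__ == "__main__":
    rx, tx = 6.0, 0.0
    for _ in range(4):
        rx, tx = update_losses("reception", rx, tx, target_loss=6.0, step=2.0)
        print(rx, tx)
```

Stepping by a bounded amount per frame, rather than switching at once, is what makes the change of direction gradual.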
The voice switching system constructed as above is operated to compare the voice of the near-end speaker with that of the far-end speaker and to control the loss of the transmission voice and that of the reception voice relative to each other in such a manner that the one of higher level is outputted without being attenuated and the one of lower level is outputted after being attenuated, thereby making it possible to lessen echoes from the speaker as well as to reduce to a minimum level the howling caused by the combined sounds from the near-end speaker and the far-end speaker.
In the voice detection process performed in the above conventional voice switching system, however, the time constant must be set at a relatively large value so that the minimum signal level computed by the smoothing processing expressed by the equation 1 is traced with small fluctuations.
The conventional voice switching system therefore encounters such a problem that although it can estimate the minimum signal level accurately when the signal-to-noise (S/N) ratio is high, it cannot track a change in the level of an unstationary noise, such as that produced in a vehicle when the vehicle accelerates or decelerates or at a platform when a train leaves or stops at the station, thereby causing a degradation in the performance of the voice switching system.
The conventional voice switching system has another problem that a low S/N ratio tends to cause noise to be detected as a voice, thereby deteriorating the operational performance of the voice switching system.
It is therefore an object of the present invention to provide a voice switching system and a voice switching method which can solve such problems inherent to the prior art voice switching system.
It is another object of the present invention to provide a voice switching system and a voice switching method which are capable of carrying out highly accurate voice detection and performing a switching action properly even with the levels of noises abruptly generated and fluctuated under circumstances where the S/N ratio is low.
The voice switching system according to the present invention comprises a receiving-side voice detection processing section which carries out the voice detection processing of a reception signal, a transmitting-side voice detection processing section which carries out the voice detection processing of a transmission signal, a reception voice attenuator which attenuates the above reception signal, a transmission voice attenuator which attenuates the above transmission signal, and a loss controlling section which controls the losses of the above reception voice attenuator and the above transmission voice attenuator according to the results of the voice detection processings of the above receiving-side voice detection processing section and the above transmitting-side voice detection processing section. The above receiving-side voice detection processing section and the above transmitting-side voice detection processing section each comprises a signal level computing section which computes the amplitude level of the above reception signal or the above transmission signal in each predetermined sample or frame, a noise level estimating section which computes a noise level estimate from the signal outputted from this signal level computing section, a threshold computing section which computes a threshold for detecting a voice from the above noise level estimate, and a voice detecting section which compares the above reception signal or the above transmission signal with the above threshold to detect a voice.
Therefore, according to the present invention, a noise level estimate having trackability to an unstationary noise whose level changes constantly can be computed, the occurrence of misdetection of a voice can be reduced even under circumstances where the S/N ratio is low, the presence or absence of a voice can be detected more distinctly, and more accurate voice detection than the conventional voice switching system can be carried out.
Further, the voice switching system of the present invention comprises a receiving-side voice detection processing section which carries out the voice detection processing of a reception signal, a transmitting-side voice detection processing section which carries out the voice detection processing of a transmission signal, a reception voice attenuator which attenuates the above reception signal, a transmission voice attenuator which attenuates the above transmission signal, and a loss controlling section which controls the losses of the above reception voice attenuator and the above transmission voice attenuator according to the results of the voice detection processings of the above receiving-side voice detection processing section and the above transmitting-side voice detection processing section. The above receiving-side voice detection processing section and the above transmitting-side voice detection processing section each comprises a signal level computing section which computes the amplitude level of the above reception signal or the above transmission signal in each predetermined sample or frame, a noise level estimating section which computes a noise level estimate from the signal outputted from this signal level computing section, a threshold updating section which computes a threshold for detecting a voice from the above noise level estimate and updates the above threshold according to the above noise level estimate and the above signal level, and a voice detecting section which compares the above reception signal or the above transmission signal with the above threshold to detect a voice.
Therefore, according to the present invention, a noise level estimate having trackability to an unstationary noise whose level changes constantly can be computed, the occurrence of misdetection of a voice can be reduced even under circumstances where the S/N ratio is low, the presence or absence of a voice can be detected more distinctly, and more accurate voice detection than the conventional voice switching system can be carried out by updating the threshold for voice detection according to the noise level estimate.
Further, the voice switching system of the present invention comprises a receiving-side voice detection processing section which carries out the voice detection processing of a reception signal, a transmitting-side voice detection processing section which carries out the voice detection processing of a transmission signal, a reception voice attenuator which attenuates the above reception signal, a transmission voice attenuator which attenuates the above transmission signal, and a loss controlling section which controls the losses of the above reception voice attenuator and the above transmission voice attenuator according to the results of the voice detection processings of the above receiving-side voice detection processing section and the above transmitting-side voice detection processing section. The above receiving-side voice detection processing section and the above transmitting-side voice detection processing section each comprises a signal level computing section which computes the amplitude level of the above reception signal or the above transmission signal in each predetermined sample or frame, a noise level estimating section which computes a noise level estimate from the signal outputted from this signal level computing section, a threshold computing section which computes a threshold for detecting a voice from the above noise level estimate, a voice detecting section which compares the above reception signal or the above transmission signal with the above threshold to detect a voice, and an updating amount setting section which sets the updating amount of the noise level estimate in the above noise level estimating section according to the result of the detection of this voice detecting section.
Therefore, according to the present invention, a noise level estimate having trackability to an unstationary noise whose level changes constantly can be computed, the occurrence of misdetection of a voice can be reduced even under circumstances where the S/N ratio is low, the presence or absence of a voice can be detected more distinctly, and more accurate voice detection having more trackability than the conventional voice switching system can be carried out by changing the updating amount of the noise level estimate according to the result of the voice detection.
Further, in the above threshold updating section of the voice switching system of the present invention, when the signal level computed in the above signal level computing section is higher than the value obtained by multiplying the noise level estimate computed in the above noise level estimating section by a predetermined constant, a judgment coefficient for setting the above threshold is set to be a predetermined small value; when the above signal level is lower than the value obtained by multiplying the above noise level estimate by the above predetermined constant, the above judgment coefficient is set to be a larger value progressively; and when the above judgment coefficient becomes larger than a predetermined judgment value, the above judgment coefficient is set to be the above predetermined judgment value.
Therefore, according to the present invention, highly accurate voice detection having trackability can be carried out by updating the threshold according to the signal level.
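The judgment coefficient rule of the threshold updating section can be sketched as follows. The names and numeric defaults are hypothetical, the multiplicative growth is one possible reading of "a larger value progressively", and the form threshold = coefficient × noise level estimate is an assumption, since this section states only that the threshold is computed from the noise level estimate.

```python
def update_judgment_coefficient(coeff, level, noise_est, *, ratio=2.0,
                                coeff_small=1.5, coeff_limit=4.0, growth=1.1):
    """Return the judgment coefficient for the next sample or frame.

    ratio:       the predetermined constant applied to the noise level estimate
    coeff_small: the predetermined small value
    coeff_limit: the predetermined judgment value (upper limit)
    growth:      assumed multiplicative step for the progressive increase
    """
    if level > ratio * noise_est:
        # Signal level well above the noise level: reset to the small value.
        return coeff_small
    # Otherwise (equality is treated like "lower", an assumption) grow the
    # coefficient and clamp it at the predetermined judgment value.
    return min(coeff * growth, coeff_limit)


def compute_threshold(coeff, noise_est):
    """Assumed threshold form: judgment coefficient times noise level estimate."""
    return coeff * noise_est


if __name__ == "__main__":
    c = 3.0
    for lvl in [10.0, 1.0, 1.0, 1.0]:
        c = update_judgment_coefficient(c, lvl, noise_est=1.0)
        print(c, compute_threshold(c, 1.0))
```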
In the above updating amount setting section of the voice switching system of the present invention, when the signal level computed in the above signal level computing section is higher than the threshold computed in the above threshold computing section, it is determined that a voice is present, while when the above signal level is lower than the above threshold, it is determined that no voice is present, and the updating amount of the noise level estimate in the above noise level estimating section is changed according to the result of this voice detection.
Therefore, according to the present invention, highly accurate voice detection having trackability can be carried out by changing the updating amount of the noise level estimate according to the result of the voice detection.
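The updating amount setting can be sketched as follows. The two update factors and the choice to raise the estimate more slowly while a voice is present are assumptions for illustration; the text states only that the updating amount is changed according to the result of the voice detection.

```python
def select_update_amount(level, threshold, up_voice=1.0001, up_noise=1.01):
    """Return (voice_present, update_factor) for one sample or frame.

    up_voice and up_noise are hypothetical factors by which the noise level
    estimate would be raised when a voice is, or is not, judged present.
    """
    voice_present = level > threshold
    return voice_present, (up_voice if voice_present else up_noise)


if __name__ == "__main__":
    print(select_update_amount(5.0, 2.0))
    print(select_update_amount(1.0, 2.0))
```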
The voice switching system of the present invention further comprises a counting section which counts the number of samples or frames or time after the activation of the system and an initialization performing section which performs the initialization of a noise level estimate for a predetermined time period, thereby performing the initialization of the noise level estimate for a predetermined time period after the activation of the system.
Therefore, according to the present invention, by performing the initialization of the noise level estimate for a predetermined time period after the activation of the system, not only the trackability to noise in the noise level estimate immediately after the activation of the system but also the performance of the voice switching system can be improved.
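The counting section and the initialization performing section can be sketched together as follows. The class name, the length of the initialization period, and the use of a running average as the initial estimate are all assumptions; this section does not state how the estimate is initialized.

```python
class NoiseInitializer:
    """Counts frames after activation and supplies an initial noise estimate."""

    def __init__(self, init_frames=50):
        self.init_frames = init_frames  # predetermined period, in frames
        self.count = 0                  # frames counted since activation
        self.total = 0.0                # sum of levels seen so far

    def update(self, level):
        """Return the initial estimate during the period, otherwise None."""
        if self.count >= self.init_frames:
            return None  # initialization finished; normal estimation takes over
        self.count += 1
        self.total += level
        # Assumed initialization: running average of the levels seen so far.
        return self.total / self.count


if __name__ == "__main__":
    init = NoiseInitializer(init_frames=3)
    for lvl in [1.0, 3.0, 2.0, 5.0]:
        print(init.update(lvl))
```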
In the above noise level estimating section of the voice switching system of the present invention, when the above signal level is lower than the above noise level estimate, the above noise level estimate is set to the above signal level, while when the above signal level is higher than the above noise level estimate, the above noise level estimate is set to be a larger value progressively.
Therefore, according to the present invention, when the signal level is higher than the noise level estimate, the noise level estimate is set to be a larger value progressively, thereby improving the performance of the voice switching system properly.
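The noise level estimating rule can be sketched as follows. The function name and the update factor are hypothetical, and multiplying by a factor slightly above one is only one possible reading of "a larger value progressively".

```python
def update_noise_estimate(noise_est, level, up=1.01):
    """Return the next noise level estimate for one sample or frame.

    up is an assumed multiplicative updating amount (slightly above 1).
    """
    if level < noise_est:
        # The level fell below the estimate: adopt the level immediately.
        return level
    # Otherwise (equality is treated like "higher", an assumption) raise the
    # estimate a little so that it can follow a rising noise floor.
    return noise_est * up


if __name__ == "__main__":
    est = 1.0
    for lvl in [1.0] * 3 + [4.0] * 200:
        est = update_noise_estimate(est, lvl)
    print(est)
```

In this sketch the estimate drops at once when the level falls and climbs step by step when the level stays high, so after a sustained rise of the noise floor it settles close to the new level.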
The voice switching method of the present invention performs a receiving-side voice detection processing step in which the voice detection processing of a reception signal is carried out, a transmitting-side voice detection processing step in which the voice detection processing of a transmission signal is carried out, a reception voice attenuating step in which the above reception signal is attenuated, a transmission voice attenuating step in which the above transmission signal is attenuated, and a loss controlling step in which the losses of the above reception voice attenuating step and the above transmission voice attenuating step are controlled according to the results of the voice detection processings of the above receiving-side voice detection processing step and the above transmitting-side voice detection processing step. The above receiving-side voice detection processing step and the above transmitting-side voice detection processing step each performs a signal level computing step in which the amplitude level of the above reception signal or the above transmission signal is computed in each predetermined sample or frame, a noise level estimating step in which a noise level estimate is computed from the signal outputted from this signal level computing step, a threshold computing step in which a threshold for detecting a voice is computed from the above noise level estimate, and a voice detecting step in which the above reception signal or the above transmission signal is compared with the above threshold to detect a voice.
Therefore, according to the present invention, the estimation of a noise level estimate having trackability to an unstationary noise whose level changes constantly can be made, the occurrence of misdetection of a voice can be reduced even under circumstances where the S/N ratio is low, the presence or absence of a voice can be detected more distinctly, and more accurate voice detection than the conventional voice switching system can be carried out.
Further, the voice switching method of the present invention performs a receiving-side voice detection processing step in which the voice detection processing of a reception signal is carried out, a transmitting-side voice detection processing step in which the voice detection processing of a transmission signal is carried out, a reception voice attenuating step in which the above reception signal is attenuated, a transmission voice attenuating step in which the above transmission signal is attenuated, and a loss controlling step in which the losses of the above reception voice attenuating step and the above transmission voice attenuating step are controlled according to the results of the voice detection processings of the above receiving-side voice detection processing step and the above transmitting-side voice detection processing step. The above receiving-side voice detection processing step and the above transmitting-side voice detection processing step each performs a signal level computing step in which the amplitude level of the above reception signal or the above transmission signal is computed in each predetermined sample or frame, a noise level estimating step in which a noise level estimate is computed from the signal outputted from this signal level computing step, a threshold updating step in which a threshold for detecting a voice is computed from the above noise level estimate and the above threshold is updated according to the above noise level estimate and the above signal level, and a voice detecting step in which the above reception signal or the above transmission signal is compared with the above threshold to detect a voice.
Therefore, according to the present invention, a noise level estimate having trackability to an unstationary noise whose level changes constantly can be computed, the occurrence of misdetection of a voice can be reduced even under circumstances where the S/N ratio is low, the presence or absence of a voice can be detected more distinctly, and more accurate voice detection than the conventional voice switching system can be carried out by updating the threshold for voice detection according to the noise level estimate.
Further, the voice switching method of the present invention performs a receiving-side voice detection processing step in which the voice detection processing of a reception signal is carried out, a transmitting-side voice detection processing step in which the voice detection processing of a transmission signal is carried out, a reception voice attenuating step in which the above reception signal is attenuated, a transmission voice attenuating step in which the above transmission signal is attenuated, and a loss controlling step in which the losses of the above reception voice attenuating step and the above transmission voice attenuating step are controlled according to the results of the voice detection processings of the above receiving-side voice detection processing step and the above transmitting-side voice detection processing step. The above receiving-side voice detection processing step and the above transmitting-side voice detection processing step each comprises a signal level computing step in which the amplitude level of the above reception signal or the above transmission signal is computed in each predetermined sample or frame, a noise level estimating step in which a noise level estimate is computed from the signal outputted from this signal level computing step, a threshold computing step in which a threshold for detecting a voice is computed from the above noise level estimate, a voice detecting step in which the above reception signal or the above transmission signal is compared with the above threshold to detect a voice, and an updating amount setting step in which the updating amount of the noise level estimate in the above noise level estimating step is set according to the result of the detection of this voice detecting step.
Therefore, according to the present invention, a noise level estimate having trackability to an unstationary noise whose level changes constantly can be computed, the occurrence of misdetection of a voice can be reduced even under circumstances where the S/N ratio is low, the presence or absence of a voice can be detected more distinctly, and more accurate voice detection having more trackability than the conventional voice switching system can be carried out by changing the updating amount of the noise level estimate according to the result of the voice detection.
Further, in the above threshold updating step of the voice switching method of the present invention, when the signal level computed in the above signal level computing step is higher than the value obtained by multiplying the noise level estimate computed in the above noise level estimating step by a predetermined constant, a judgment coefficient for setting the above threshold is set to be a predetermined small value; when the above signal level is lower than the value obtained by multiplying the above noise level estimate by the above predetermined constant, the above judgment coefficient is set to be a larger value progressively; and when the above judgment coefficient becomes larger than a predetermined judgment value, the above judgment coefficient is set to be the above predetermined judgment value.
Therefore, according to the present invention, highly accurate voice detection having trackability can be carried out by updating the threshold according to the signal level.
In the above updating amount setting step of the voice switching method of the present invention, when the signal level computed in the above signal level computing step is higher than the threshold computed in the above threshold computing step, it is determined that a voice is present, while when the above signal level is lower than the above threshold, it is determined that no voice is present, and the updating amount of the noise level estimate in the above noise level estimating step is changed according to the result of this voice detection.
Therefore, according to the present invention, highly accurate voice detection having trackability can be carried out by changing the updating amount of the noise level estimate according to the result of the voice detection.
The voice switching method of the present invention further performs a counting step in which the number of samples or frames or time after the activation of the system is counted and an initialization performing step in which the initialization of a noise level estimate is performed for a predetermined time period, thereby performing the initialization of the noise level estimate for a predetermined time period after the activation of the system.
Therefore, according to the present invention, by performing the initialization of the noise level estimate for a predetermined time period after the activation of the system, not only the trackability to noise in the noise level estimate immediately after the activation of the system but also the performance of the voice switching system can be improved.
In the above noise level estimating step of the voice switching method of the present invention, when the above signal level is lower than the above noise level estimate, the above noise level estimate is set to the above signal level, while when the above signal level is higher than the above noise level estimate, the above noise level estimate is set to be a larger value progressively.
Therefore, according to the present invention, when the above signal level is higher than the above noise level estimate, the noise level estimate is set to be a larger value progressively, thereby improving the performance of the voice switching system properly.