The present invention relates generally to the field of communications with possible uses in voice recognition, public address, and recording. The present invention relates more particularly to a signal expander system that can distinguish sounds from acoustic sources placed at different locations relative to a transducer.
Conventional voice communication systems typically incorporate some form of voice activated switch or signal expander for suppressing acoustic background noises. FIG. 1A illustrates a functional block diagram of a conventional bi-stable signal expander system 100 comprising a microphone 105, an expander control stage 110, and a variable gain block (stage) 115. The microphone 105 is coupled to the variable gain block 115, and the expander control stage 110 is coupled to the microphone 105 and the variable gain block 115. The expander control stage 110 includes a detector 120 coupled between the microphone 105 and a first input 125 of a comparator 130. The comparator 130 has a second input 135 coupled to a reference (threshold) voltage source 140 for generating a reference voltage level Vref.
When an acoustic source 145 becomes active, the emitted sounds from acoustic source 145 will cause changes in the air pressure. The microphone 105 detects the air pressure changes and translates them into corresponding voltage changes (i.e., microphone output signals 150) that are detected by the detector 120. The detector 120 outputs the detected microphone output signal 150 as a detector output signal 155. The comparator 130 compares the voltage level of the detector output signal 155 with the reference voltage level Vref from reference voltage source 140. If the voltage level of the detector output signal 155 is below the reference voltage level Vref, then the comparator 130 generates a control signal 160 with a logic state that does not activate the variable gain block 115. As a result, the variable gain block 115 does not add gain to the microphone output signal 150. When the acoustic source 145 activates, the voltage level of the microphone output signal 150 rises. The detector 120 detects the higher-level microphone output signal 150 and will, as a result, output a higher-level detector output signal 155. If the voltage level of detector output signal 155 rises above the reference voltage level Vref, the comparator 130 outputs the control signal 160 with a logic state that causes the variable gain block 115 to add gain to the microphone output signal 150. The variable gain block 115 then outputs the amplified microphone output signal as an audio output signal 165.
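The detector, comparator, and gain-switching behavior described above can be illustrated with a minimal digital sketch. It is not taken from the conventional system itself: the function name, the one-pole smoothing detector, and the parameters `gain_on`, `gain_off`, and `alpha` are assumptions chosen only for illustration.

```python
def bistable_expander(samples, v_ref, gain_on=4.0, gain_off=1.0, alpha=0.5):
    """Illustrative bi-stable expander: high gain only while the detected
    level exceeds v_ref. All parameter values are hypothetical."""
    level = 0.0
    out = []
    for x in samples:
        # Detector (assumed form): rectify, then one-pole smoothing.
        level = alpha * abs(x) + (1.0 - alpha) * level
        # Comparator: compare the detected level with the reference.
        active = level > v_ref
        # Variable gain block: two gain states selected by the comparator.
        out.append(x * (gain_on if active else gain_off))
    return out


quiet_then_loud = [0.01, -0.01, 0.5, -0.5]
print(bistable_expander(quiet_then_loud, v_ref=0.1))
```

In this sketch the two quiet samples pass at the low gain and the two loud samples receive the high gain, regardless of whether the loud samples are speech or noise.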
A disadvantage of the bi-stable signal expander system 100 is that a low-level sound (e.g., soft speech) from the acoustic source 145 may not be amplified if the magnitude of the low-level sound does not trigger the comparator 130. If the threshold of comparator 130 is set too low, then noise signals in the environment will easily trigger the comparator 130, thereby activating the variable gain block 115 and amplifying the noise signals (“noise pumping”). If the threshold of comparator 130 is set too high, then softer sounds from the acoustic source 145 may not trigger the comparator 130 and not activate the variable gain block 115, resulting in inadequate gain for the desired signal.
In order to reduce the consequence of low level speech failing to activate the comparator 130, some systems set the minimum gain of the variable gain block 115 to be only 12 dB to 20 dB below the maximum (fully activated) gain.
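To put the 12 dB to 20 dB figures in linear terms, the following sketch converts a gain difference in decibels to an amplitude ratio. It assumes the decibel values refer to amplitude (voltage) gain, which the text does not state explicitly; the function name is hypothetical.

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in dB to a linear amplitude ratio (assumes voltage gain)."""
    return 10.0 ** (db / 20.0)


# Minimum gain relative to the fully activated gain.
print(db_to_amplitude_ratio(-12.0))  # roughly one quarter of full amplitude
print(db_to_amplitude_ratio(-20.0))  # roughly one tenth of full amplitude
```

Under that assumption, un-triggered low-level speech is still passed at roughly one tenth to one quarter of its fully activated amplitude rather than being suppressed entirely.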
A disadvantage of the above conventional bi-stable signal expander systems is that ambient or undesired noises having magnitudes above the threshold level of comparator 130 are amplified. Additionally, if the bi-stable expander system 100 will be used in a noisy environment, the threshold of the comparator 130 must be set high enough that the external noises do not trigger the comparator 130, which increases the severity of noise modulation and the likelihood that the system will not respond properly to voice.
Additionally, if the microphone 105 is oriented away from the acoustic or speech source 145, the microphone 105 may not be able to properly detect the desired sound waves. As a result, the microphone output voltage level 150 will be low and may not trigger the comparator 130 to permit amplification of the desired detected sound.
The above-mentioned bi-stable signal expander systems have a fast attack and slow decay characteristic that causes the switches for controlling gain to respond quickly to a detected sound of a sufficient voltage level and to maintain the gain for a pre-defined time length (e.g., 150 ms to 200 ms) after the voltage level of the detected sound falls below the comparator 130 threshold. By maintaining the gain for the additional pre-defined time length, the quieter-sounding, trailing ends of the speech envelope are not cut off by the bi-stable signal expander system. These trailing ends are typically below the comparator threshold. However, noise is often amplified during the additional pre-defined time length when gain is maintained.
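The fast attack and hold behavior can be sketched as a gate that opens immediately and then stays open for a fixed number of samples. This is a simplified illustration: the function name and the sample-count hold are assumptions, and a real system would express the hold as a time (e.g., 150 ms to 200 ms at a given sample rate).

```python
def gate_with_hold(levels, threshold, hold_samples):
    """Return the gate state per sample (True means high gain is applied).
    Hypothetical sketch of a fast-attack gate with a fixed hold time."""
    remaining = 0
    states = []
    for level in levels:
        if level > threshold:
            # Fast attack: open at once and re-arm the hold timer.
            remaining = hold_samples
            states.append(True)
        elif remaining > 0:
            # Hold: keep the gain applied after the level has dropped.
            remaining -= 1
            states.append(True)
        else:
            states.append(False)
    return states


# A single loud sample followed by silence, with a two-sample hold.
print(gate_with_hold([0.0, 1.0, 0.0, 0.0, 0.0, 0.0], threshold=0.5, hold_samples=2))
# [False, True, True, True, False, False]
```

The two samples after the loud one are still passed at high gain, which is where trailing speech is preserved but also where noise is amplified.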
The above-mentioned bi-stable signal expander systems also encounter problems when a burst of background noise occurs. For example, the noise burst might be a typewriter key impact or other types of noises with impulse waveforms. The noise burst will trigger the comparator 130 in the bi-stable signal expander system, thereby adding gain to the undesired noise burst. In addition, since the gain is maintained for the above-mentioned pre-defined time length after the noise burst occurrence, subsequent undesired noises are also amplified.
FIG. 1B illustrates a conventional variable gain signal expander system 200 including an expander control stage 205 coupled between the microphone 105 and the variable gain block 115. The expander control stage 205 includes a detector 210 coupled to an amplifier 215. The microphone output signal 150 is detected by the detector 210 and amplified by the amplifier 215. As a result, the amplifier 215 generates a control signal 220 with a magnitude that depends on the initial magnitude of the microphone output signal 150. The amount of gain provided by the variable gain block 115 to the microphone output signal 150 depends on the magnitude of the control signal 220.
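By way of illustration, a variable gain expander of this kind can be sketched with a gain that rises continuously with the detected level. The detector form, the linear control law, and the parameters `k`, `min_gain`, `max_gain`, and `alpha` are assumptions for this sketch and are not specified in the description of system 200.

```python
def variable_gain_expander(samples, k=2.0, min_gain=1.0, max_gain=4.0, alpha=0.5):
    """Illustrative variable gain expander: gain grows with detected level.
    The linear control law and all parameter values are hypothetical."""
    level = 0.0
    out = []
    for x in samples:
        # Detector (assumed form): rectify, then one-pole smoothing.
        level = alpha * abs(x) + (1.0 - alpha) * level
        # Control signal: scaled level, limited to the maximum gain.
        gain = min(max_gain, min_gain + k * level)
        out.append(gain * x)
    return out


print(variable_gain_expander([0.5, 0.5]))  # [0.75, 0.875]
```

Because the control law sees only the detected level, any signal at the microphone, desired or not, raises the gain in the same way.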
The variable gain signal expander system 200 can be designed with a shorter time constant for reduced audibility of “noise pumping”. The shorter time constant reduces the amount of time that high gain is applied to the noise signal as the desired signal drops in amplitude. The effect of the combination of variable gain with reduced time constants on the desired signal is to modulate the envelope of the speech signal. This is not generally desirable but may be an acceptable compromise in noisy environments.
A further disadvantage of the variable gain signal expander system 200 is that both the signal from the desired acoustic source and the ambient acoustic noise are detected and used to increase the gain of the variable gain block 115. Thus as the ambient noise levels increase, the gain of the system for this noise can also increase, resulting in less overall noise reduction.
Accordingly, it is desirable to provide a method and system for signal expansion with improved noise rejection capability.
The present invention provides a desirable method and system for discriminating between desired sounds from a near-field acoustic source and sounds (noise) from far-field acoustic sources. The invention advantageously prevents gain from being added to the undesired (far-field) loud noises while allowing gain to be added to low-level sounds from the desired (nearby) acoustic source. As a result, the invention can reduce the “noise pumping” problem that is encountered by conventional systems and methods.
The present invention can be used to enhance the noise rejection capability of headsets. Additionally, the invention may be used in other applications, such as handsets, as long as two microphones can be placed at different distances from an acoustic source. The invention may also be used to enhance the noise rejection capability of voice recognition, public address, and recording systems.
In one embodiment of the present invention, the signal expander system comprises an input block, a proximity estimation block (e.g., a ratio detector) coupled to the input block, and a variable gain block coupled to the proximity estimation block. A speech activity detector may optionally be coupled to the proximity estimation block.
The input block has at least two inputs (such as two microphones) and three outputs (such as a signal output and the individual outputs of each microphone). In one embodiment, the signal output of the input block is the same as the output of the first microphone. In another embodiment, each microphone output is coupled to a corresponding sensitivity adjustment block for closely matching the sensitivities of the microphones. Thus, the outputs of the microphones are derived from their corresponding sensitivity adjustment blocks. The signal output of the input block is derived from the difference between the outputs of the first microphone and the second microphone. This second embodiment results in a composite directional microphone from the microphone pair in the input block. This embodiment is essentially independent of the proximity issue between the microphones and the acoustic source, and is a convenient by-product of the two-microphone topology described above. A delay applied to the output signal from the second microphone alters the polar pattern from bi-directional to cardioid, supercardioid, or hypercardioid, depending on the amount of delay.
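The dependence of the polar pattern on the delay can be illustrated with the standard low-frequency approximation for a two-element differential microphone. The sketch below is not taken from the embodiment: it assumes a distant (plane-wave) source, a microphone spacing much smaller than the wavelength, and a speed of sound of 343 m/s, and the function and parameter names are hypothetical.

```python
import math


def polar_response(theta_rad, delay_s, spacing_m, c=343.0):
    """Normalized low-frequency response of (front mic) minus (delayed rear mic)
    for a plane wave arriving at angle theta from the microphone axis.
    Assumes spacing much smaller than the wavelength; c is the speed of sound."""
    tau = spacing_m / c  # acoustic travel time between the two microphones
    return (delay_s + tau * math.cos(theta_rad)) / (delay_s + tau)


spacing = 0.02  # hypothetical 2 cm microphone spacing
tau = spacing / 343.0

# No delay: bi-directional pattern, with a null at 90 degrees.
print(polar_response(math.pi / 2, 0.0, spacing))
# Delay equal to the acoustic travel time: cardioid, with a null at 180 degrees.
print(polar_response(math.pi, tau, spacing))
```

Within this approximation, delays between zero and the acoustic travel time move the null between 90 and 180 degrees, which corresponds to the hypercardioid and supercardioid patterns.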
The proximity estimation block includes two inputs (from the microphones) and one output for generating the proximity estimation block output signal. As described below, differing degrees of signal conditioning may be applied in series with the input and output paths of the proximity estimation block. The proximity estimation (i.e., the distance between each microphone in the input block and the acoustic source) is made, for example, based on the ratio between the two inputs of the proximity estimation block. However, other methods may also be used to make the proximity estimation.
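As an illustration of why the ratio of the two microphone levels can indicate proximity, the sketch below assumes an idealized point source in free field whose sound pressure amplitude falls off as 1/r, with both microphones on a line through the source. These assumptions and the function name are not part of the description above.

```python
def level_ratio(r_near_m, mic_spacing_m):
    """Ratio of first-microphone to second-microphone amplitude for a point
    source under an assumed 1/r (inverse distance) law, with the second
    microphone farther from the source by mic_spacing_m."""
    return (r_near_m + mic_spacing_m) / r_near_m


# Hypothetical geometry: microphones 2 cm apart.
print(level_ratio(0.02, 0.02))  # source 2 cm from the first microphone
print(level_ratio(1.0, 0.02))   # source 1 m from the first microphone
```

Under these assumptions the nearby source produces a ratio of about 2 while the distant source produces a ratio close to 1, so the ratio separates near-field from far-field sources largely independently of loudness.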
In one embodiment of the proximity estimation block, the first input permits the output signal of the first microphone to be received by one input of the divider, while the second input permits the output signal of the second microphone to be received by the other input of the divider. The output signal of the proximity estimation block is derived from the divider output signal.
Two types of signal conditioning blocks may be interposed on the input side of the divider. For example, filters may be coupled to the divider input side to band-limit the signals received by the divider. Rectifying detectors may also be coupled to the divider input side to simplify the operation of the divider.
In one implementation, the divider is configured to take the difference of the logarithm of the divider input signals. This implementation is particularly suited to an analog implementation. The output of this implementation is the logarithm of the ratio. In some applications, the antilog of this ratio value may be taken. However, direct use of the log output frequently will result in better noise margins and permits use of simpler circuitry.
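The identity behind this implementation is log(a/b) = log(a) - log(b). A minimal digital sketch follows; the function name, the base-10 logarithm, and the small `eps` guard against taking the logarithm of zero are assumptions for illustration.

```python
import math


def log_ratio(level_1, level_2, eps=1e-12):
    """Logarithm of the ratio of two non-negative (rectified) levels,
    computed as a difference of logarithms. eps is a hypothetical guard
    that keeps the logarithm defined when a level is zero."""
    return math.log10(level_1 + eps) - math.log10(level_2 + eps)


print(log_ratio(2.0, 1.0))  # about 0.301, i.e. log10(2)
print(log_ratio(1.0, 1.0))  # 0.0 for equal levels
```

Working directly with the log output means equal levels map to zero and a fixed level ratio maps to a fixed offset, which keeps a later threshold comparison simple.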
On the output side of the divider, a clipping and smoothing function may be inserted to reduce the effects of phase induced transients. This function may be implemented in either analog or digital form.
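A digital form of such a clipping and smoothing function might look like the sketch below. The hard clip limits, the one-pole smoother, and the names `lo`, `hi`, and `alpha` are assumptions; the text does not specify the exact form of the function.

```python
def clip_and_smooth(values, lo, hi, alpha=0.5):
    """Clip each divider output to [lo, hi], then smooth with a one-pole
    low-pass filter. Hypothetical sketch; limits and alpha are illustrative."""
    smoothed = []
    state = None
    for v in values:
        clipped = max(lo, min(hi, v))
        # Initialize the filter with the first clipped value.
        state = clipped if state is None else alpha * clipped + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


# A large transient (100.0) is limited to 1.0 before smoothing.
print(clip_and_smooth([0.0, 100.0, 0.0], lo=-1.0, hi=1.0))  # [0.0, 0.5, 0.25]
```

Clipping before smoothing keeps a single large transient from dominating the smoothed estimate for many samples.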
The variable gain block includes two inputs and one output and may be implemented in three basic forms, as described below. The first input of the variable gain block is the output signal from the input block. The second input of the variable gain block is from either the direct output of the proximity estimation block or from a speech activity detector having an output derived from the proximity estimation. The output of the variable gain block is the system audio output. All implementations of the variable gain block typically include a variable gain amplifier or an attenuator, either of which is controlled directly or indirectly by the inputs to the proximity estimation block and may include components for controlling the timing of the gain changes. All implementations of the variable gain block may include a delay element on the block input to compensate for any delays introduced by the proximity estimation process.
One implementation of the variable gain block includes a conventional amplitude based expander where the gain of the variable gain block is determined by an input level when the proximity estimation indicates that the distance to the acoustic source is within a prescribed distance. Thus, the gain applied to the output of the first microphone is determined by both the level of the output signal itself and by the estimate of proximity (as determined by the proximity estimation block) exceeding a threshold value. The proximity estimate input to this implementation is binary regardless of the gain versus input level characteristics of the expander when activated.
A second implementation of the variable gain block includes a form of bi-stable expander that assumes a high or low gain state depending on the binary value of the proximity based speech activity detector. Thus, the output signal of the first microphone is amplified in response to the proximity estimate exceeding a threshold value.
A third implementation of the variable gain block includes a variable gain element providing a gain that is a function of the direct output of the proximity estimation block. The function may be: (i) linear, (ii) non-linear, (iii) logarithmic, or (iv) arbitrary. Thus, the gain provided to the output signal of the first microphone is a function of the direct output of the proximity estimation block. The second input of the variable gain block in this implementation is directly coupled to the output of the proximity estimation block.
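One possible gain function for this third implementation is sketched below: the gain in decibels varies linearly with a log-ratio proximity estimate between two limits. This is only one of the many functions the text allows, and the function name, the limits `lo` and `hi`, and the gain range are hypothetical values chosen for illustration.

```python
def gain_from_proximity(log_ratio, lo=0.0, hi=0.3, min_gain_db=-20.0, max_gain_db=0.0):
    """Map a log-ratio proximity estimate to a linear amplitude gain.
    The gain in dB is interpolated linearly between min_gain_db (at lo)
    and max_gain_db (at hi). All limits are hypothetical."""
    t = (log_ratio - lo) / (hi - lo)
    t = max(0.0, min(1.0, t))  # limit the estimate to the mapped range
    gain_db = min_gain_db + t * (max_gain_db - min_gain_db)
    return 10.0 ** (gain_db / 20.0)


print(gain_from_proximity(0.3))   # near source: full gain (1.0)
print(gain_from_proximity(0.0))   # far source: about 0.1 (-20 dB)
print(gain_from_proximity(0.15))  # intermediate: about 0.316 (-10 dB)
```

Unlike the binary implementations, the gain here changes gradually as the proximity estimate moves between the two limits.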
The speech activity detector includes one input and one output. The detector input is connected to the output of the proximity estimation block and permits transmission of the proximity estimation block output signal to one input of a comparator. The other input of the comparator is a proximity reference value that establishes the limit of the distance within which the desired acoustic source is expected to be located. All acoustic sources more distant than this value will be considered as noise. The output of the comparator is the output of the speech activity detector and provides a binary indication of whether the desired acoustic source is active.
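The comparator in the speech activity detector, and its use to select a high or low gain state, can be sketched as follows. The sketch assumes the proximity estimate is a level ratio that grows as the source gets closer; the function names, the reference value of 1.5, and the two gain values are hypothetical.

```python
def speech_activity(proximity_estimate, proximity_reference):
    """Binary speech activity decision: True when the proximity estimate
    exceeds the reference, i.e. the source is judged to be within range."""
    return proximity_estimate > proximity_reference


def bistable_gain(active, high_gain=1.0, low_gain=0.1):
    """Select the high or low gain state from the binary detector output."""
    return high_gain if active else low_gain


reference = 1.5  # hypothetical proximity reference value
print(bistable_gain(speech_activity(2.0, reference)))   # near source: 1.0
print(bistable_gain(speech_activity(1.02, reference)))  # far source: 0.1
```

Because the decision depends on the ratio rather than on absolute level, a loud distant noise with a ratio near 1 stays in the low gain state while a quiet nearby talker can still select the high gain state.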
All embodiments of the present invention typically incorporate all four blocks (input block, proximity estimation block, variable gain block, and speech activity detector) in one of two general configurations depending on whether the variable gain block accepts the proximity estimate directly or uses the speech activity detector output. Any of the variations of the input block or the proximity estimation block can be optionally used for any embodiment disclosed herein. The speech activity detector is typically the same form in all embodiments. The variable gain block can be any of the three implementations mentioned above.
Additionally, the present invention provides a signal expander using digital processing to implement the functions mentioned above. In this embodiment of the invention, the output signals from the microphones in the input block are first digitized. Analog processes are replaced by digital arithmetic blocks and algorithms. This digital implementation also enables more sophisticated processes to be utilized without requiring additional hardware.