The present invention relates to voice processing and, more particularly, to voice processing for telecommunication and other narrow dynamic range voice communication systems, where it is desired to maximize and stabilize the level of received or transmitted speech.
Speech signals are characterized by a large dynamic range and a high peak-to-RMS ratio. In telephony applications, especially cellular telephony, this dynamic range is further exaggerated due to variations in relative positioning of the talker and microphone, variations of voice level between different talkers, variations in the same talker's voice level, and similar variable factors.
On the other hand, the reproduction circuitry of common telephones is limited in its dynamic range, especially in cellular phones, where supply voltage, physical size, and cost are further limiting factors. An extreme example is the cellular/mobile speakerphone, where despite severe restrictions on the dynamic range, satisfactory listening requires a minimum loudness level from the speakerphone.
There are three major problems associated with dynamic range in telephony applications:
the high peak-to-RMS ratio naturally characteristic of speech signals;
the wide variations in RMS level of speech signals; and
the high level of background noise typically encountered in telephony applications.
Treating the speech signal by applying signal processing techniques can significantly improve the situation, but as will be explained, each of the above problems requires a different treatment in order to achieve optimal results.
Various methods and circuits are known in the prior art for automatic gain control (AGC), such as disclosed in U.S. Pat. No. 5,854,845 to Itani, U.S. Pat. No. 5,832,444 to Schmidt, and U.S. Pat. No. 5,838,194 to Khoury.
None of the prior art methods, however, fully addresses all of the above-mentioned problems; they provide only limited improvements. While such limited improvements may have been sufficient for traditional telephony, they are not adequate to deal with the more severe problems posed by cellular/mobile telephony.
The automatic gain control methods described in the prior art incorporate an input-output target response that allows no dynamic variation in the signal range (the target output level is constant except when noise is detected). This is a highly unnatural output target response, which forces using a slow response time in order to permit some reasonable dynamics of the processed speech signal. Another factor that slows down the response is the need to decrease the amount of audible distortion introduced into the signal.
On the other hand, controlling signal peaks demands a short response time, and therefore controlling signal peaks is not possible when a slow response time is used. For this reason, prior art methods fail to reduce the high peak-to-RMS ratio of speech, and sometimes even increase it, as can be seen for example in Khoury (U.S. Pat. No. 5,838,194), FIGS. 13 and 14.
Another limitation of prior art methods is that they deal with both the AGC of the speech signal and the muting or reduction of background noise traveling through the same signal path. This forces a single method of detecting the envelope, a single set of response times, and a single smoothing scheme to deal with significantly different problems. Because of this, the performance of such methods is degraded.
There is thus a widely recognized need for, and it would be highly advantageous to have, a means of automatically maximizing and stabilizing speech signals that can simultaneously handle a high peak-to-RMS ratio, wide variations in RMS level, and high levels of background noise. This goal is met by the present invention.
It is an object of the present invention to provide a method for processing speech signals that provides an optimal solution for all the previously-mentioned problems, and allows for maximizing and stabilizing the RMS voice level, significantly reducing the peak-to-RMS ratio, and avoiding excessive amplification of background noise.
The present invention discloses several novel concepts that significantly alleviate the above-mentioned problems. These novel concepts are:
1. Implementing more than one control path in a parallel fashion, where each path addresses a different problem. This permits a separate optimizing of the envelope estimation method and response time, control curves, and the control signal smoothing method and response time.
2. Utilizing a family of input-output control curves suitable for both peak limiting and dynamic range compression (herein denoted as compressor-limiter control curves).
3. Utilizing a family of input-output control curves suitable for preventing excessive amplification of background noise (herein denoted as low-level expander control curves).
4. Delaying the audio path to allow for look-ahead in the control path.
5. Utilizing finite impulse response (FIR) filter smoothing matched to the look-ahead.
6. Utilizing digital domain peak-interpolators for estimating the peaks of the corresponding input signal in the continuous time domain.
FIG. 1 shows a generic block diagram according to the invention and illustrates the use of multiple parallel control paths. Signal paths 22 and 24 each carry at least one signal and implement the parallel control path principle of the present invention. An input signal 10 is fed in parallel into a delay line 18 and an envelope extractor 12. Envelope extractor 12 extracts at least one envelope of input signal 10; more than one envelope is extracted when multiple parallel control paths are used. The envelopes are fed via signal path 22 to a control gain calculator 14. Control gain calculator 14 calculates a gain for each envelope, and the gains are fed via signal path 24 to a control smoother 16, which smoothes the gains and delivers a single consolidated output control gain. This control gain is in turn applied to the output of delay line 18 at an output multiplier 20 to produce the final output.
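The signal flow of FIG. 1 can be sketched in a few lines of Python for a single control path. This is a minimal illustration under stated assumptions, not the disclosed implementation: the sliding-peak envelope, the moving-average (FIR) smoother whose length is the look-ahead plus one, the unity initial gain, and the names `process`, `gain_curve`, and `lookahead` are all hypothetical choices made for this example. A multiple-path system would repeat the envelope and gain steps per path before consolidating the gains.

```python
from collections import deque
from typing import Callable, List, Sequence


def process(samples: Sequence[float],
            gain_curve: Callable[[float], float],
            lookahead: int) -> List[float]:
    """Single-path sketch of FIG. 1: delay line, envelope, gain, smoother, multiplier."""
    # Delay line (18): holds the audio back by `lookahead` samples.
    delay_line = deque([0.0] * lookahead, maxlen=lookahead + 1)
    # Envelope extractor (12): sliding peak of |x| (an assumed estimator).
    peak_window = deque([0.0] * (lookahead + 1), maxlen=lookahead + 1)
    # Control smoother (16): moving-average FIR, started at unity gain (assumed).
    gain_window = deque([1.0] * (lookahead + 1), maxlen=lookahead + 1)

    output = []
    for x in samples:
        delay_line.append(x)
        peak_window.append(abs(x))
        envelope = max(peak_window)
        # Control gain calculator (14): map the envelope to a gain.
        gain_window.append(gain_curve(envelope))
        control_gain = sum(gain_window) / len(gain_window)
        # Output multiplier (20): apply the smoothed gain to the delayed audio.
        output.append(delay_line[0] * control_gain)
    return output
```

Because the audio is delayed by the same number of samples that the envelope window spans, the control gain in this sketch already reflects a peak by the time that peak reaches the multiplier.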
According to the present invention there is provided a voice signal processing system receiving an input signal characterized by at least one envelope, the system having an output signal and a specified maximum gain, the system including: (a) an input for receiving the input signal; (b) an envelope extractor coupled to the input for extracting at least one envelope of the input signal; (c) a control gain calculator coupled to the envelope extractor, for calculating at least one gain, wherein the control gain calculator is operative to perform a calculation of the group containing: (i) a first gain calculation of $\dfrac{1}{\frac{1}{\mathrm{Gain}} + \left(1 - \frac{1}{\mathrm{Gain}}\right) \cdot \mathrm{Env}}$,
wherein Env is an envelope and Gain is the specified maximum gain; (ii) a second gain calculation of the minimum of 1.0 and Env/Th, wherein Th is a threshold below which the result of the second gain calculation decreases; and (iii) the product of the first gain calculation and the second gain calculation; (d) a control smoother coupled to the control gain calculator for smoothing the at least one gain, and for delivering as the output signal a control gain; (e) a delay line coupled to the input, to produce a delayed input; and (f) an output multiplier coupled to the delay line and to the control smoother for applying the control gain to the delayed input.
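The three gain calculations recited above can be written out directly. The following Python sketch is illustrative only and rests on assumptions not stated in the text: the envelope is taken to be normalized so that 1.0 is full scale, Gain is at least 1, Th is positive, and the function names are hypothetical.

```python
def compressor_limiter_gain(env: float, max_gain: float) -> float:
    """First gain calculation: 1 / (1/Gain + (1 - 1/Gain) * Env)."""
    inv_gain = 1.0 / max_gain
    return 1.0 / (inv_gain + (1.0 - inv_gain) * env)


def low_level_expander_gain(env: float, threshold: float) -> float:
    """Second gain calculation: the minimum of 1.0 and Env / Th."""
    return min(1.0, env / threshold)


def combined_gain(env: float, max_gain: float, threshold: float) -> float:
    """Third option: the product of the first and second gain calculations."""
    return (compressor_limiter_gain(env, max_gain)
            * low_level_expander_gain(env, threshold))
```

Under these assumptions, the first calculation returns the full maximum gain for a zero envelope and unity gain for a full-scale envelope, while the second calculation scales the gain down linearly for envelopes below Th, so the product avoids applying the maximum gain to very low-level input such as background noise.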