1. Field of the Invention
The present invention relates to the field of digital signal processing, and more specifically, to a spectral noise reduction method and apparatus that can be used to remove the noise typically associated with analog signal environments.
2. Description of the Related Art
When an analog signal contains unwanted additive noise, enhancement of the perceived signal-to-noise ratio before playback will produce a more coherent, and therefore more desirable, signal. An enhancement process that is single-ended, that is, one that operates with no information available at the receiver other than the noise-degraded signal itself, is preferable to other methods. Complementary noise reduction schemes, by contrast, require cooperation between the broadcaster and the receiver: both must be equipped with encoding and decoding gear, and the encoding and decoding levels must be carefully matched. These considerations are not present with single-ended enhancement processes.
A composite “noisy” signal contains features that are noise and features that are attributable to the desired signal. In order to boost the desired signal while attenuating the background noise, the features of the composite signal that are noise need to be distinguished from the features of the composite signal that are attributable to the desired signal. Next, the features that have been identified as noise need to be removed or reduced from the composite signal. Lastly, the detection and removal methods need to be adjusted to compensate for the expected time-variant behavior of the signal and noise.
Any single-ended enhancement method also needs to address the issue of signal gaps, or “dropouts,” which can occur if the signal is lost momentarily. These gaps can occur when the received signal is lost due to channel interference (for example, lightning, cross-talk, or weak signal) in a radio transmission, or due to decoding errors in the playback system. The signal enhancement process must detect the signal dropout and take appropriate action, either by muting the playback or by reconstructing an estimate of the missing part of the signal. Although muting the playback does not solve the problem, it is often used because it is inexpensive to implement, and if the gap is very short, it may be relatively inaudible.
Several single-ended methods of reducing the audibility of unwanted additive noise in analog signals have already been developed. These methods generally fall into two categories: time-domain level detectors and frequency-domain filters. Both of these methods are one-dimensional in the sense that they are based on either the signal waveform (amplitude) as a function of time or the signal's frequency content at a particular time. By contrast, and as explained more fully below in the Detailed Description of Invention section, the present invention is two-dimensional in that it takes into consideration how both the amplitude and frequency content change with time.
Accordingly, it is an object of the present invention to devise a process for improving the signal-to-noise ratio in audio signals. It is a further object of the present invention to develop an intelligent model for the desired signal that allows a substantially more effective separation of the noise and the desired signal than current single-ended processes. The one-dimensional (or single-ended) processes used in the prior art are described more fully below, as are the discrete Fourier transform and Fourier transform magnitude—two techniques that play a role in the present invention.
A. Time-Domain Level Detection
The time-domain method of noise elimination or reduction uses a specified signal level, or threshold, that indicates the likely presence of the desired signal. The threshold is set (usually manually) high enough so that when the desired signal is absent (for example, when there is a pause between sentences or messages), there is no hard hiss. The threshold, however, must not be set so high that the desired signal is affected when it is present. If the received signal is below the threshold, it is presumed to contain only noise, and the output signal level is reduced or “gated” accordingly. As used in this context, the term “gated” means that the signal is not allowed to pass through. This process can make the received signal sound somewhat less noisy because the hiss goes away during the pause between words or sentences, but it is not particularly effective. By continuously monitoring the input signal level as compared to the threshold level, the time-domain level detection method gates the output signal on and off as the input signal level varies. These time-domain level detection systems have been variously referred to as squelch control, dynamic range expander, and noise gate.
In simple terms, the noise gate method uses the amplitude of the signal as the primary indicator: if the input signal level is high, it is assumed to be dominated by the desired signal, and the input is passed to the output essentially unchanged. On the other hand, if the received signal level is low, it is assumed to be a segment without the desired signal, and the gain (or volume) is reduced to make the output signal even quieter.
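By way of illustration only, the gating decision described above can be sketched as follows. The function name `noise_gate`, the frame length, the use of the peak absolute value as the level measure, and the residual gain applied when the gate is closed are all assumptions made for this sketch and are not taken from any particular prior art device.

```python
def noise_gate(samples, threshold, frame_len=4, closed_gain=0.1):
    """Hard noise gate: attenuate frames whose peak level is below threshold.

    All parameter names and default values are hypothetical.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        level = max(abs(s) for s in frame)                  # level detector
        gain = 1.0 if level >= threshold else closed_gain   # comparator
        out.extend(gain * s for s in frame)                 # gain-controlled stage
    return out


quiet = [0.01, -0.02, 0.015, -0.01]   # low-level frame, presumed noise only
loud = [0.5, -0.6, 0.55, -0.4]        # high-level frame, presumed desired signal
gated = noise_gate(quiet + loud, threshold=0.1)
```

In this sketch the loud frame passes unchanged, including whatever noise it contains, while the quiet frame is attenuated. The decision rests on level alone.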
The difference between the time-domain methods and the present invention is that the time-domain methods do not remove the noise when the desired signal is present. Instead, if the noisy signal exceeds the threshold, the gate is opened, and the signal is allowed to pass through. Thus, the gate may open if there is a sudden burst of noise, a click, or some other loud sound that causes the signal level to exceed the threshold. In that case, the output signal quality is good only if the signal is sufficiently strong to mask the presence of the noise. For that reason, this method only works if the signal-to-noise ratio is high.
The time-domain method can be effective if the noisy input consists of a relatively constant background noise and a signal with a time-varying amplitude envelope (i.e., if the desired signal varies between loud and soft, as in human speech). Changing the gain between the “pass” (or open) mode and the “gate” (or closed) mode, however, can cause audible noise modulation, which is also called “gain pumping.” The term “gain pumping” is used by recording engineers and refers to the audible sound of the noise appearing when the gate opens and then disappearing when the gate closes. Furthermore, the “pass” mode simply allows the signal to pass but does not actually improve the signal-to-noise ratio when the desired signal is present.
The effectiveness of the time-domain detection methods can be improved by carefully controlling the attack and release times (i.e., how rapidly the circuitry responds to changes in the input signal) of the gate, causing the threshold to vary automatically if the noise level changes, and splitting the gating decision into two or more frequency bands. Making the attack and release times somewhat gradual will lessen the audibility of the gain pumping, but it does not completely solve the problem. Multiple frequency bands with individual gates means that the threshold can be set more optimally if the noise varies from one frequency band to another. For example, if the noise is mostly a low frequency hum, the threshold can be set high enough to remove the noise in the low frequency band while still maintaining a lower threshold in the high frequency ranges. Despite these improvements, the time-domain detection method is still limited as compared to the present invention because the noise gate cannot distinguish between noise and the desired signal, other than on the basis of absolute signal level.
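The effect of gradual attack and release behavior can be sketched as follows. This is a minimal illustration that assumes a simple one-pole smoothing of the gate gain, with per-frame `attack` and `release` coefficients standing in for the attack and release times. The function name and all coefficient values are hypothetical.

```python
def smooth_gate_gain(levels, threshold, attack=0.5, release=0.1, closed_gain=0.0):
    """Return per-frame gains that move gradually toward the gate target.

    One-pole smoothing is an assumption of this sketch, not a prior art design.
    """
    gains = []
    gain = closed_gain
    for level in levels:
        target = 1.0 if level >= threshold else closed_gain
        # The attack coefficient applies while opening, release while closing.
        coeff = attack if target > gain else release
        gain += coeff * (target - gain)
        gains.append(gain)
    return gains


# Per-frame input levels: silence, two loud frames, then silence again.
gains = smooth_gate_gain([0.0, 1.0, 1.0, 0.0, 0.0], threshold=0.5)
```

With these values the gain rises over the two loud frames and then decays slowly rather than switching abruptly, which softens the gain pumping but does not remove it.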
FIG. 1 is a flow diagram of the noise gate process. As shown in this figure, the noisy input 10 passes through a level detector 20 and then to a comparator 30, which compares the signal level of the noisy input 10 to a pre-set threshold 40. If the signal level of the noisy input 10 is greater than the threshold 40, then it is presumed to be a desired signal, the signal is passed through the gain-controlled amplifier (or gate) 50, and the gain is increased to make the output signal 60 even louder. If the signal level of the noisy input 10 is less than the threshold 40, then it is presumed to constitute noise, and the signal is passed to the gain-controlled amplifier 50, where the gain is decreased to make the output signal 60 even quieter. If the signal is below the threshold level, it does not pass through the gate.
B. Frequency-Domain Filtration
The other well-known procedure for signal enhancement involves the use of spectral subtraction in the frequency domain. The goal is to make an estimate of the noise power as a function of frequency, then subtract this noise spectrum from the input signal spectrum, presumably leaving the desired signal spectrum.
For example, consider the signal spectrum shown in FIG. 2. The graph shows the amplitude, or signal energy, as a function of frequency. This example spectrum is harmonic, which means that the energy is concentrated at a series of discrete frequencies that are integer multiples of a base frequency (also called a “fundamental”). In this example, the fundamental is 100 Hz; therefore, the energy consists of harmonic partials, or harmonic overtones, at 100, 200, 300, etc. Hz. A signal with a harmonic spectrum has a specific pitch, or musical tone, to the human ear.
The example signal of FIG. 2 is intended to represent the clean, noise-free original signal, which is then passed through a noisy radio channel. An example of the noise spectrum that could be added by a noisy radio channel is shown in FIG. 3. Note that unlike the discrete frequency components of the harmonic signal, the noise signal in FIG. 3 has a more uniform spread of signal energy across the entire frequency range. The noise is not harmonic, and it sounds like a hiss to the human ear. If the desired signal of FIG. 2 is now sent through a channel containing additive noise distributed as shown in FIG. 3, the resulting noisy signal that is received is shown in FIG. 4, where the dashed line indicates the noise level.
In a prior art spectral subtraction system, the receiver estimates the noise level as a function of frequency. The noise level estimate is usually obtained during a “quiet” section of the signal, such as a pause between spoken words in a speech signal. The spectral subtraction process involves subtracting the noise level estimate, or threshold, from the received signal so that any spectral energy that is below the threshold is removed. The noise-reduced output signal is then reconstructed from this subtracted spectrum.
An example of the noise-reduced output spectrum for the noisy signal of FIG. 4 is shown in FIG. 5. Note that because some of the desired signal spectral components were below the noise threshold, the spectral subtraction process inadvertently removes them. Nevertheless, the spectral subtraction method can conceivably improve the signal-to-noise ratio if the noise level is not too high.
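The subtraction step can be illustrated with a minimal sketch. It assumes the spectrum is held as a list of per-bin magnitudes and that negative results are floored at zero. The function name `spectral_subtract` and the numeric values are hypothetical and do not correspond to FIGS. 2 through 5.

```python
def spectral_subtract(magnitudes, noise_estimate):
    """Subtract a per-bin noise estimate, flooring each result at zero."""
    return [max(m - n, 0.0) for m, n in zip(magnitudes, noise_estimate)]


# Hypothetical noisy spectrum: strong components in bins 1 and 4,
# a weak component in bin 3 that lies below the noise estimate.
noisy = [1.0, 5.0, 1.2, 0.8, 3.0, 0.9]
noise = [1.0] * len(noisy)   # flat noise estimate taken from a "quiet" section
cleaned = spectral_subtract(noisy, noise)
```

Bins 1 and 4 survive with reduced level, while bin 3 is removed entirely, along with any desired signal energy it contained.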
The spectral subtraction process can cause various audible problems, especially when the actual noise level differs from the estimated noise spectrum. In this situation, the noise is not perfectly canceled, and the residual noise can take on a whistling, tinkling quality sometimes referred to as “musical noise” or “birdie noise.” Furthermore, spectral subtraction does not adequately deal with changes in the desired signal over time, or the fact that the noise itself will generally fluctuate rapidly from time to time. If some signal components are below the noise threshold at one instant in time but then peak above the noise threshold at a later instant in time, the abrupt change in those components can result in an annoying audible burble or gargle sound.
Some prior art improvements to the spectral subtraction method have been made, such as frequently updating the noise level estimate, switching off the subtraction in strong signal conditions, and attempting to detect and suppress the residual musical noise. None of these techniques, however, has been wholly successful at eliminating the audible problems.
C. Discrete Fourier Transform and Fourier Transform Magnitude
The discrete Fourier transform (“DFT”) is a computational method for representing a discrete-time (“sampled” or “digitized”) signal in terms of its frequency content. A short segment (or “data frame”) of an input signal, such as a noisy audio signal treated in this invention, is processed according to the well-known DFT analysis formula (1):
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi nk/N} \qquad (1)$$
where N is the length of the data frame, x[n] are the N digital samples comprising the input data frame, X[k] are the N Fourier transform values, j represents the mathematical imaginary quantity (square root of −1), e is the base of the natural logarithms, and $e^{j\theta} = \cos(\theta) + j\sin(\theta)$, which is the relationship known as Euler's formula.
The DFT analysis formula expressed in equation (1) can be interpreted as producing N equally spaced samples of the frequency spectrum of the signal x[n] between zero and the digital sampling frequency. Because the DFT formula involves the imaginary number j, the X[k] spectral samples will, in general, be mathematically complex numbers, meaning that they will have a “real” part and an “imaginary” part.
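As a small illustration of this spacing, spectral sample k corresponds to the frequency k·fs/N, where fs is the sampling frequency. The function name `bin_frequencies`, the symbol fs, and the 8000 Hz sampling frequency below are assumptions chosen only for the example.

```python
def bin_frequencies(N, fs):
    """Frequency in Hz of each of the N DFT samples for sampling frequency fs."""
    return [k * fs / N for k in range(N)]


# Hypothetical example: an 8-sample frame sampled at 8000 Hz.
freqs = bin_frequencies(8, 8000.0)
```

Here the eight spectral samples fall 1000 Hz apart, starting at zero and stopping one step short of the sampling frequency.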
The inverse DFT is computed using the standard inverse transform, or “Fourier synthesis” equation (2):
$$x[n] = \sum_{k=0}^{N-1} X[k]\, e^{+j 2\pi nk/N} \qquad (2)$$
Equation 2 shows that the data frame x[n] can be reconstructed, or synthesized, from the DFT data X[k] without any loss of information: the signal can be reconstructed from its Fourier transform, at least within the limits of numerical precision. This ability to reconstruct the signal from its Fourier transform allows the signal to be converted from the discrete-time domain to the frequency domain (Fourier) and vice versa.
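The analysis and synthesis steps can be checked numerically with a minimal sketch that evaluates the sums directly. The function names `dft` and `idft` are hypothetical. The sketch divides by N in the synthesis step, which is the usual convention when the analysis sum is unscaled. That scale factor is an assumption of the sketch and is not shown in equation (2).

```python
import cmath


def dft(x):
    """Direct evaluation of the analysis sum in equation (1)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Synthesis sum of equation (2), scaled by 1/N (assumed convention)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N
            for n in range(N)]


frame = [1.0, 2.0, 0.0, -1.0]   # hypothetical 4-sample data frame
spectrum = dft(frame)           # complex-valued X[k]
rebuilt = idft(spectrum)        # matches frame to within rounding error
```

A direct sum of this kind is suitable only for short frames. It is shown for clarity and is not an efficient implementation.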
The signal power in a particular range of frequencies, which is needed, for example, when attempting to distinguish between the background noise and the desired signal, can be estimated by calculating the spectral magnitude of the DFT using the standard Pythagorean formula (3):
$$\text{magnitude} = |X[k]| = \sqrt{\{\mathrm{Re}(X[k])\}^2 + \{\mathrm{Im}(X[k])\}^2} \qquad (3)$$
where Re( ) and Im( ) indicate taking the mathematical real part and imaginary part, respectively. Although the input signal x[n] cannot, in general, be reconstructed from the DFT magnitude, the magnitude information can be used to find the distribution of signal power as a function of frequency for that particular data frame.
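The magnitude calculation of formula (3) can be sketched as follows. The function name `magnitude` and the sample spectral values are hypothetical.

```python
import math


def magnitude(X_k):
    """Spectral magnitude of one complex DFT value, per formula (3)."""
    return math.sqrt(X_k.real ** 2 + X_k.imag ** 2)


# Hypothetical spectral values for a 4-sample data frame.
spectrum = [2 + 0j, 1 - 3j, 0j, 1 + 3j]
mags = [magnitude(v) for v in spectrum]

# Two different complex values can share one magnitude.
same_a = magnitude(3 + 4j)
same_b = magnitude(-4 + 3j)
```

Both `same_a` and `same_b` evaluate to 5.0 although the underlying complex values differ, which illustrates why the data frame cannot, in general, be recovered from the magnitude alone.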