The present invention relates generally to signal processing techniques, and more particularly, to methods and apparatus for detecting noise in signals, such as audio signals.
Noise detection schemes have many applications in signal processing and signal analysis. A good noise detection measure can improve noise reduction algorithms. In the study of the properties of a signal, the detection of noise-like signal components can be an important part of the analysis. In modeling and control engineering applications, for example, the identification of noisy signal components can help find an optimal model structure or the identified noisy signal components can be used as input parameters for the model. In audio or image compression schemes, noise-like signal components do not need to be encoded and thus the number of encoded bits can be reduced. Only the parameters that are necessary to generate similar noise-like components are transmitted to the decoder. The decoder artificially generates similar noise-like components during the synthesis of the signal.
Currently available noise detection methods, such as those used in speech coders described, for example, in W. B. Kleijn and K. K. Paliwal, “An Introduction to Speech Coding,” Speech Coding and Synthesis, Amsterdam: Elsevier, (1995), incorporated by reference herein, are typically based on a spectral flatness measure. In a general application, however, such a measure can fail by detecting the flat spectrum of an impulse signal as noise.
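The failure mode noted above can be illustrated with a minimal sketch. It assumes the common definition of spectral flatness as the ratio of the geometric mean to the arithmetic mean of the power spectrum; the function names, the block length N, and the test signals are illustrative choices and are not taken from the cited references.

```python
import cmath
import math


def power_spectrum(x):
    """One-sided power spectrum |X[k]|^2 computed with a direct O(N^2) DFT."""
    n = len(x)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spec.append(abs(acc) ** 2)
    return spec


def spectral_flatness(x, eps=1e-12):
    """Geometric mean divided by arithmetic mean of the power spectrum.

    Values near 1 indicate a flat spectrum, values near 0 a peaky one.
    eps is a hypothetical guard against log(0).
    """
    p = [v + eps for v in power_spectrum(x)]
    geometric = math.exp(sum(math.log(v) for v in p) / len(p))
    arithmetic = sum(p) / len(p)
    return geometric / arithmetic


N = 64  # illustrative block length

# A single impulse: every DFT bin has the same magnitude.
impulse = [0.0] * N
impulse[5] = 1.0

# A pure tone centered on DFT bin 8: one dominant bin.
tone = [math.cos(2 * math.pi * 8 * t / N) for t in range(N)]

sfm_impulse = spectral_flatness(impulse)
sfm_tone = spectral_flatness(tone)
```

In this sketch the impulse scores as maximally flat even though it is not noise, which is why a spectral flatness measure alone cannot separate pulses from noise.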
The MPEG-2 AAC audio encoder described, for example, in ISO/JTC1 SC29 WG11, Final Draft International Standard FDIS 14496-3: Coding of Audiovisual Objects, Part 3: Audio (October 1998), detects a range of spectral samples as noisy if the spectral samples are not tonal and there are no strong changes in energy over time. The tonality of the signal is estimated by using the tonality values calculated using a psychoacoustic model. The noise detection method of the MPEG-2 AAC is tightly linked to the infrastructure of a specific audio coder.
Thus, the noise detection method of the MPEG-2 AAC cannot be applied generally and its flexibility for use in other implementations of audio coding is limited.
A need therefore exists for an improved method and apparatus that detect noise-like signal components within arbitrary regions of the time-frequency plane. A further need exists for a method and apparatus that detect noise-like signal components without detecting pulses as noise. Yet another need exists for a method and apparatus that detect noise-like signal components having a non-flat spectral or temporal envelope.
Generally, a method and apparatus are disclosed for detecting noise-like signal components within arbitrary regions of the time-frequency plane. G time domain samples are processed to determine whether they are noise-like. Various transforms, such as a discrete cosine transform (DCT), with different spectral/temporal resolutions are applied. A flatness measure of the time domain samples, such as an estimate of the entropy, is compared to the same flatness measure of the samples produced by each transform. If the computed flatness measures are about the same, the subband samples {t0, t1, . . . , tG−1} are assumed to be noisy.
According to one aspect of the invention, noise-like signal components can be detected within a limited time interval and frequency range by decomposing the signal into N (possibly non-uniformly spaced) subbands using a general filterbank. In each of the N subbands, the samples are grouped into blocks of a specific length G. To each of these groups of G subband samples in time {t0, t1, . . . , tG−1}, a linear orthogonal transform is applied to obtain the frequency domain samples {f0, f1, . . . , fG−1}. Then, the flatness of the time domain samples is compared to the flatness of the samples of the linear orthogonal transform. If the computed flatness measures are about the same, the time domain samples are assumed to be noisy.
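The block-wise comparison described above can be sketched as follows. This is a minimal illustration and not the disclosed implementation: it assumes an orthonormal DCT-II as the linear orthogonal transform, a normalized entropy of the energy distribution as the flatness measure, and a hypothetical tolerance of 0.1 for "about the same". The names dct_orthonormal, entropy_flatness, is_noise_like and detect_blocks are introduced here for illustration only.

```python
import math
import random


def dct_orthonormal(x):
    """Orthonormal DCT-II (a linear orthogonal transform), direct O(G^2) form."""
    g = len(x)
    out = []
    for k in range(g):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / g)
        acc = sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * g))
                  for n in range(g))
        out.append(scale * acc)
    return out


def entropy_flatness(x):
    """Entropy of the normalized energy distribution, scaled to [0, 1].

    1 means the energy is spread evenly, 0 means it sits in a single sample.
    """
    energy = [v * v for v in x]
    total = sum(energy)
    if total == 0.0:
        return 0.0
    h = 0.0
    for e in energy:
        p = e / total
        if p > 0.0:
            h -= p * math.log(p)
    return h / math.log(len(x))


def is_noise_like(t, tolerance=0.1):
    """Compare flatness of {t_0..t_(G-1)} with flatness of its transform {f_k}."""
    f = dct_orthonormal(t)
    return abs(entropy_flatness(t) - entropy_flatness(f)) <= tolerance


def detect_blocks(subband, g, tolerance=0.1):
    """Group one subband into consecutive blocks of length g and test each."""
    return [is_noise_like(subband[s:s + g], tolerance)
            for s in range(0, len(subband) - g + 1, g)]


G = 256  # illustrative block length
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(G)]

impulse = [0.0] * G
impulse[100] = 1.0  # flat in the transform domain, peaky in time

# A DCT basis function: flat in time, a single peak in the transform domain.
tone = [math.cos(math.pi * (2 * n + 1) * 32 / (2 * G)) for n in range(G)]

decisions = detect_blocks(noise + tone, G)
```

Under these assumptions, an impulse and a tone each produce a large flatness difference between the two domains and are rejected, whereas white noise keeps a similar energy distribution under an orthogonal transform and is accepted.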
According to another aspect of the invention, a filterbank with uniform frequency tiling, such as an MDCT filterbank deployed in a perceptual audio coder (PAC), is used to detect noise-like signal components. Within the discrete representation of the time-frequency plane by the filterbank coefficients, noise detection partitions with appropriate time-frequency ranges can be chosen. A given noise detection partition with the size F over frequency and T over time contains the samples Sk,i (0 ≤ k < F, 0 ≤ i < T). To detect noise with the bandwidth of a given noise detection partition, two linear transforms are applied to the coefficients within the partition. A linear orthogonal synthesis transform, such as an inverse DCT, is applied over frequency in a noise detection partition to yield coefficients with maximum time resolution {t0, t1, . . . , tFT−1}. A linear orthogonal analysis transform, such as a DCT, is applied within the noise detection partition over time to yield the highest possible frequency resolution coefficients {f0, f1, . . . , fFT−1}. The mapping from t0, t1, . . . , tFT−1 to f0, f1, . . . , fFT−1 provides the longest possible time-frequency transform within the noise detection partition. The flatness of the time domain samples {tk} is compared to the flatness of the frequency domain samples {fi} to decide whether the frequency noise detection partition is noise-like.
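The two transforms applied within a noise detection partition can be sketched as below. The sketch makes several assumptions not stated in the text: the partition is held as a nested list s[k][i], the synthesis transform is an orthonormal inverse DCT applied once per time slot, the analysis transform is an orthonormal DCT applied once per frequency line, the flatness measure is a normalized entropy, and the tolerance of 0.1 is hypothetical. The ordering of the resulting FT samples is also an assumption; it does not affect the entropy.

```python
import math
import random


def dct(x):
    """Orthonormal DCT-II."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(x[j] * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                  for j in range(n))
            for k in range(n)]


def idct(c):
    """Orthonormal DCT-III, the inverse of dct()."""
    n = len(c)
    return [sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c[k]
                * math.cos(math.pi * (2 * j + 1) * k / (2 * n))
                for k in range(n))
            for j in range(n)]


def entropy_flatness(x):
    """Entropy of the normalized energy distribution, scaled to [0, 1]."""
    energy = [v * v for v in x]
    total = sum(energy)
    if total == 0.0:
        return 0.0
    h = 0.0
    for e in energy:
        p = e / total
        if p > 0.0:
            h -= p * math.log(p)
    return h / math.log(len(x))


def partition_is_noise_like(s, tolerance=0.1):
    """s[k][i] is the coefficient at frequency index k and time index i."""
    f_size, t_size = len(s), len(s[0])
    # Synthesis over frequency: F*T samples with maximum time resolution.
    t = []
    for i in range(t_size):
        t.extend(idct([s[k][i] for k in range(f_size)]))
    # Analysis over time: F*T samples with maximum frequency resolution.
    f = []
    for k in range(f_size):
        f.extend(dct(s[k]))
    return abs(entropy_flatness(t) - entropy_flatness(f)) <= tolerance


F, T = 16, 16  # illustrative partition size
rng = random.Random(1)
noise_partition = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(F)]

# A stationary tone: constant energy in one frequency line over all time slots.
tone_partition = [[1.0 if k == 3 else 0.0 for _ in range(T)] for k in range(F)]
```

In this sketch the stationary tone collapses to a single coefficient after the analysis over time but stays spread out after the synthesis over frequency, so the two flatness values diverge and the partition is not classified as noise-like.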
According to another aspect of the invention, noise with a non-flat spectrum can also be detected. The signal is pre-processed according to its inverse spectral envelope before detecting noise-like signal components with a non-flat spectral or temporal envelope. The spectral coefficients of the filterbank are scaled before applying the noise-detection measure. By scaling the coefficients with a coarse approximation of their spectral envelope {Sk} prior to the detection, noise-like signal components with a non-flat spectral/temporal envelope can be detected using the condition for noise with a flat spectral/temporal envelope. In an audio coder implementation, for example, this feature can be implemented by scaling the spectral coefficients according to the perceptual model prior to the noise detection, since the masked threshold is roughly proportional to the spectral envelope of the signal.
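The scaling step can be sketched as follows. As an assumption made only for illustration, the coarse spectral envelope is approximated here by the RMS value over bands of adjacent coefficients; in an audio coder the text suggests the perceptual model would supply the scaling instead. The band width, the floor value, and the function names are hypothetical.

```python
import math
import random


def coarse_envelope(coeffs, band):
    """RMS per band of `band` adjacent coefficients, held constant in the band."""
    env = []
    for start in range(0, len(coeffs), band):
        chunk = coeffs[start:start + band]
        rms = math.sqrt(sum(c * c for c in chunk) / len(chunk))
        env.extend([rms] * len(chunk))
    return env


def scale_by_inverse_envelope(coeffs, band=16, floor=1e-12):
    """Divide each coefficient by the coarse envelope (floor avoids 0-division)."""
    env = coarse_envelope(coeffs, band)
    return [c / max(e, floor) for c, e in zip(coeffs, env)]


def entropy_flatness(x):
    """Entropy of the normalized energy distribution, scaled to [0, 1]."""
    energy = [v * v for v in x]
    total = sum(energy)
    if total == 0.0:
        return 0.0
    h = 0.0
    for e in energy:
        p = e / total
        if p > 0.0:
            h -= p * math.log(p)
    return h / math.log(len(x))


# Noise with a strongly tilted (non-flat) spectral envelope.
rng = random.Random(2)
K = 256
colored = [rng.gauss(0.0, 1.0) * 10.0 ** (-k / 64.0) for k in range(K)]

whitened = scale_by_inverse_envelope(colored, band=16)

flatness_before = entropy_flatness(colored)
flatness_after = entropy_flatness(whitened)
```

After the scaling, each band has unit RMS, so the coefficients can be passed to a detector that tests only for noise with a flat envelope.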