The present invention relates to audio signal processing and, in particular, to an apparatus and method for improving the perceived quality of sound reproduction by combining Active Noise Cancellation and Perceptual Noise Compensation, e.g., by improving the perceived quality of reproduction of sound over headphones.
Audio signal processing is becoming increasingly important. In many listening scenarios, e.g., in the cabin of a vehicle, audio signals are presented in a noisy environment, which degrades their sound quality and intelligibility. One approach to reducing the impact of environmental noise on the listening experience is Active Noise Cancellation (ANC), also known as Active Noise Control, see, e.g., [1], [2]. ANC reduces the interfering noise at the receiver side to a varying degree. In general, low-frequency noise components can be canceled more successfully than high-frequency components, stationary noise can be canceled better than non-stationary noise, and pure tones can be canceled better than random noise.
Active Noise Cancellation is a technique for suppressing acoustic noise based on the principle of acoustic interference. The basic idea of canceling the interfering noise by using a phase-inverted copy of it was first described in Paul Lueg's patent in 1936, see [7].
The principles of ANC are summarized in [1] and [2]. The sound field emitted by the noise source (primary source) is measured using a transducer. This reference signal is used to generate a secondary signal which is fed into a secondary loudspeaker. If the acoustic wave emitted by the secondary source (the so-called “anti-noise”) is exactly out of phase with the acoustic wave of the noise, the noise is canceled by destructive interference in the region behind the loudspeaker and opposite the noise source, the so-called “zone of quiet”. Ideally, plane wave transducers are used for both the microphone and the loudspeaker.
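As a numerical illustration of the interference principle described above, the following minimal Python sketch computes the level of the residual tone when the anti-noise deviates from the ideal phase-inverted copy. It assumes a single pure tone and models the mismatch by a gain error and a phase error; the function names and the 50 microsecond timing error are hypothetical and are not taken from [1] or [2].

```python
import cmath
import math


def residual_level_db(gain_error_db, phase_error_deg):
    """Residual tone level relative to the primary noise, in dB.

    Assumption: the primary noise is a unit phasor and the anti-noise is its
    inverted copy with the given gain error and phase error.
    """
    gain = 10.0 ** (gain_error_db / 20.0)
    phase = math.radians(phase_error_deg)
    residual = abs(1.0 - gain * cmath.exp(1j * phase))
    if residual == 0.0:
        return float("-inf")  # perfect cancellation
    return 20.0 * math.log10(residual)


def phase_error_from_delay_deg(frequency_hz, delay_error_s):
    """Phase error caused by a fixed timing error of the anti-noise."""
    return 360.0 * frequency_hz * delay_error_s


if __name__ == "__main__":
    for frequency in (100.0, 500.0, 2000.0):
        # Hypothetical timing error of 50 microseconds.
        error_deg = phase_error_from_delay_deg(frequency, 50e-6)
        print(frequency, round(residual_level_db(0.0, error_deg), 1))
```

Under these assumptions, a fixed timing error produces a phase error that grows with frequency, which is one plausible reason why cancellation tends to be more effective for low-frequency noise components.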
Although the anti-noise can be generated by delaying and scaling the measurement of the primary noise, the anti-noise is often computed adaptively to cope with possible variations in the acoustic path between the noise source and the anti-noise transducer. Such implementations are based on adaptive filters whose filter coefficients are computed by minimizing an error signal using the least mean squares (LMS) algorithm, the filtered-x LMS (FXLMS) algorithm, the leaky FXLMS algorithm, or other optimization algorithms.
ANC can be implemented as either feedforward control or feedback control.
FIG. 3 illustrates a block diagram of an ANC implementation with feedforward structure. A noise source 310 emits primary noise 320. The primary noise 320 is recorded by a reference microphone 330 as an environmental audio signal d(t). The environmental audio signal is fed into an adaptive filter 340. The adaptive filter is configured to filter the environmental audio signal d(t) to obtain a filtered signal. The filtered signal is employed to steer a loudspeaker 350.
As already stated, the structure illustrated by FIG. 3 is a feedforward structure. In a feedforward structure, the reference microphone may, e.g., be placed such that the primary noise is picked up before it reaches the secondary source, as shown in FIG. 3.
Often, a second microphone is mounted after the secondary source to measure the residual noise signal. In such a structure, the second microphone represents a residual noise microphone or an error microphone. Such a structure is shown in FIG. 4.
FIG. 4 illustrates a block diagram of an ANC implementation with feedforward structure with an additional error microphone 460. An adaptive algorithm computes the filter coefficients for generating the anti-noise using the reference microphone signal such that the residual noise is minimized.
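The adaptation described above can be sketched, under simplifying assumptions, as a small FXLMS simulation in Python. The primary path and secondary path coefficients, the white-noise reference signal, the step size, and the perfectly known secondary path estimate are all hypothetical choices made for illustration; they are not taken from the cited references.

```python
import random


def fir(coeffs, history):
    """Apply FIR coefficients to a newest-first list of past samples."""
    return sum(c * h for c, h in zip(coeffs, history))


def simulate_fxlms(num_samples=5000, taps=8, mu=0.01, seed=0):
    """Feedforward ANC with an error microphone, adapted by FXLMS."""
    rng = random.Random(seed)
    primary_path = [0.0, 0.0, 0.8, 0.3]        # assumed: reference mic -> error mic
    secondary_path = [0.0, 0.9]                # assumed: loudspeaker -> error mic
    secondary_estimate = list(secondary_path)  # assumed to be known exactly

    w = [0.0] * taps                           # adaptive filter coefficients
    x_hist = [0.0] * max(taps, len(primary_path))
    y_hist = [0.0] * len(secondary_path)
    fx_hist = [0.0] * taps
    primary, residual = [], []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)                # reference microphone sample
        x_hist = [x] + x_hist[:-1]
        d = fir(primary_path, x_hist)          # primary noise at the error mic
        y = fir(w, x_hist)                     # anti-noise fed to the loudspeaker
        y_hist = [y] + y_hist[:-1]
        e = d + fir(secondary_path, y_hist)    # residual noise (error mic signal)
        fx = fir(secondary_estimate, x_hist)   # filtered reference signal
        fx_hist = [fx] + fx_hist[:-1]
        # Gradient step that reduces the squared residual.
        w = [wi - mu * e * fxi for wi, fxi in zip(w, fx_hist)]
        primary.append(d)
        residual.append(e)
    return primary, residual, w


if __name__ == "__main__":
    primary, residual, w = simulate_fxlms()
    print(sum(v * v for v in primary[-500:]), sum(v * v for v in residual[-500:]))
```

In this idealized setting the residual noise decays towards zero because the required filter is causal and no measurement noise is present; in a real system the achievable attenuation would be limited by these and other factors.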
FIG. 5 illustrates a block diagram of an ANC implementation with feedback structure. Implementations in feedback structures, as shown in FIG. 5, use only one microphone for measuring the error and generating the secondary signal. A feedback ANC system for headphone application is described in [8].
The effect of the cancellation depends on the accuracy of the superposition of the sound fields of the noise source and the secondary source. In practice, the interfering noise signal is not removed completely. ANC is especially suitable for low-frequency noise signal components and stationary signals, but fails to remove high-frequency and non-stationary noise signal components.
Perceptual Noise Compensation (PNC) is a signal processing method to compensate for the perceptual effects of interfering noise by using psychoacoustic knowledge. The basic principle behind PNC is to apply time-varying equalization such that those spectral components of the input audio signal that are masked by the interfering noise are amplified. The main idea has been referred to as, e.g., Noise Compensation, see, e.g., [3], Masking Compensation, see, e.g., [4], Sound Equalization in Noisy Environments, see, e.g., [5], or Dynamic Sound Control, see, e.g., [6].
Perceptual Noise Compensation processes an audio signal such that its timbre and loudness, when presented in environmental noise, are perceived as similar or close to those when presented unprocessed in quiet. The additive noise leads to a decrease of the loudness of the desired signal due to partial or total masking effects. The resulting sensation is known as partial loudness. Due to the frequency selective processing in the human auditory system, the interfering noise affects the perceived spectral balance of the desired signal and thereby its timbre.
The basic principles of PNC have been applied, e.g. in [3]. Recent developments have, for example, been described in [9], [10], [11] and [6]. The rationale of the method is to apply time-varying spectral weighting factors to the desired signal such that the sensation of loudness and timbre is restored.
The spectral weighting method of the PNC splits the input audio signal into M frequency bands, advantageously according to a perceptually motivated frequency scale, e.g., the Bark or ERB scale, with each band having the bandwidth of a critical band. The derived sub-band signals s_m[k] are scaled with time-varying gain factors g_m[k], with sub-band index m = 1, ..., M and time index k. The gains are computed such that the partial specific loudness N′, e.g., the loudness evoked at each auditory frequency band, of the processed signal in noise is equivalent to the specific loudness of the unprocessed audio signal in quiet or a fraction β thereof, as shown in Equation (1), with e_m[k] being the sub-band signals of the additive noise:

$$\beta \, N'_q[m,k] = N'_p[m,k] \qquad (1)$$

wherein

$$N'_q[m,k] = f(s_m[k])$$

is the loudness in quiet, and wherein

$$N'_p[m,k] = f(g_m[k]\, s_m[k],\; e_m[k])$$

is the partial loudness of the processed signal in noise e[k].
Loudness models compute the partial specific loudness N′ [m, k] of a signal s[k] when presented simultaneously with a masking signal e[k].
The gains g_m[k] can be computed using a model of partial loudness, see, for example, [10].
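Because a partial loudness model generally cannot be inverted in closed form, the gain that restores a target loudness can be found numerically. The following Python sketch illustrates this with a bisection search per sub-band. The function partial_specific_loudness is a deliberately simple toy stand-in (a compressive power law applied to sub-band powers) and is an assumption made for illustration only; it is not the model of [10] or [12], and all names are hypothetical.

```python
def partial_specific_loudness(signal_power, noise_power, exponent=0.2):
    """Toy stand-in for f(s, e): loudness of the mixture minus that of the noise.

    Assumption: this power law only mimics the compressive and monotonic
    behavior of a real partial loudness model.
    """
    return (signal_power + noise_power) ** exponent - noise_power ** exponent


def pnc_gain(signal_power, noise_power, beta=1.0, iterations=60):
    """Amplitude gain g such that the partial loudness of the scaled signal in
    noise equals beta times the loudness of the unscaled signal in quiet."""
    if signal_power <= 0.0:
        return 1.0  # nothing to compensate in a silent band
    target = beta * partial_specific_loudness(signal_power, 0.0)

    def loudness_with_gain(gain):
        return partial_specific_loudness(gain * gain * signal_power, noise_power)

    low, high = 0.0, 1.0
    while loudness_with_gain(high) < target:   # expand until the target is bracketed
        high *= 2.0
    for _ in range(iterations):                # bisection on a monotonic function
        mid = 0.5 * (low + high)
        if loudness_with_gain(mid) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


if __name__ == "__main__":
    print(pnc_gain(1.0, 0.0), pnc_gain(1.0, 1.0), pnc_gain(1.0, 1.0, beta=0.5))
```

With this toy model, a band without noise receives a gain of approximately one, and the gain grows with the noise power. A practical system would presumably substitute a validated loudness model and limit the resulting gains.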
In the following, reference is made to computational models of partial loudness. Loudness models compute the partial specific loudness N′[m,k] of a signal s[k] when presented simultaneously with a masking signal e[k]:

$$N'[m,k] = f(s_m[k],\; e_m[k]) \qquad (2)$$
A particular implementation of a perceptual model of partial loudness is shown in FIG. 6. It is derived from the models presented in [12] and [13], which themselves drew on earlier research by Fletcher, Munson, Stevens, and Zwicker, with some modifications. Alternative methods for the calculation of the specific loudness have been developed in the past, as described, e.g., in [14].
The input signals are processed in the frequency domain using a Short-time Fourier transform (STFT), for example, with a frame length of 21 ms, 50% overlap and a Hann window function. Mimicking the frequency resolution and the temporal resolution of the human auditory system, sub-band signals are obtained by grouping the spectral coefficients. The transfer through the outer and middle ear is simulated with a fixed filter. Additionally, the transfer function of the reproduction system can be incorporated optionally, but is neglected here for simplicity.
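The analysis step described above can be sketched in plain Python as follows. The sketch uses a naive DFT for self-containment, a periodic Hann window, a 21 ms frame and 50% overlap; the 16 kHz sampling rate used in the example, the band edges, and the function names are assumptions chosen for illustration, and the outer and middle ear filter is omitted.

```python
import cmath
import math


def hann_window(length):
    """Periodic Hann window, suited to 50% overlap."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def stft_power(signal, sample_rate, frame_seconds=0.021):
    """Power spectra of Hann-windowed frames with 50% overlap (naive DFT)."""
    length = round(frame_seconds * sample_rate)
    hop = length // 2
    window = hann_window(length)
    twiddle = [cmath.exp(-2j * math.pi * i / length) for i in range(length)]
    frames = []
    for start in range(0, len(signal) - length + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(length)]
        spectrum = []
        for k in range(length // 2 + 1):
            value = sum(frame[n] * twiddle[(k * n) % length] for n in range(length))
            spectrum.append(abs(value) ** 2)
        frames.append(spectrum)
    return frames, length


def group_into_bands(power_spectrum, frame_length, sample_rate, band_edges_hz):
    """Sum the bin powers whose frequency lies in [lower edge, upper edge)."""
    bands = [0.0] * (len(band_edges_hz) - 1)
    for k, power in enumerate(power_spectrum):
        frequency = k * sample_rate / frame_length
        for b in range(len(bands)):
            if band_edges_hz[b] <= frequency < band_edges_hz[b + 1]:
                bands[b] += power
                break
    return bands


if __name__ == "__main__":
    fs = 16000  # assumed sampling rate
    tone = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(800)]
    frames, length = stft_power(tone, fs)
    print(group_into_bands(frames[0], length, fs, [0, 500, 1500, 4000, 8001]))
```

A practical implementation would likely use an FFT routine and band edges derived from a perceptual scale rather than the coarse hypothetical edges shown here.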
FIG. 7 illustrates the transfer function modeling the path through the outer and middle ear.
The excitation function is computed for auditory filter bands spaced on the equivalent rectangular bandwidth (ERB) scale or the Bark scale.
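As an illustration of a spacing on the ERB scale, the following Python sketch places band center frequencies uniformly on the ERB-number scale. It assumes the widely used formulas of Glasberg and Moore, which the present text does not itself specify; the frequency range and the number of bands are hypothetical.

```python
import math


def hz_to_erb_number(frequency_hz):
    """ERB-number (in Cams) for a frequency in Hz (Glasberg and Moore formula)."""
    return 21.4 * math.log10(4.37 * frequency_hz / 1000.0 + 1.0)


def erb_number_to_hz(erb_number):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (erb_number / 21.4) - 1.0) * 1000.0 / 4.37


def erb_bandwidth_hz(frequency_hz):
    """Equivalent rectangular bandwidth of the auditory filter at a frequency."""
    return 24.7 * (4.37 * frequency_hz / 1000.0 + 1.0)


def erb_spaced_centers(low_hz, high_hz, num_bands):
    """Center frequencies spaced uniformly on the ERB-number scale."""
    low = hz_to_erb_number(low_hz)
    high = hz_to_erb_number(high_hz)
    step = (high - low) / (num_bands - 1)
    return [erb_number_to_hz(low + i * step) for i in range(num_bands)]


if __name__ == "__main__":
    # Hypothetical range and band count.
    for center in erb_spaced_centers(50.0, 15000.0, 40)[:5]:
        print(round(center, 1), round(erb_bandwidth_hz(center), 1))
```

Bands spaced in this way are narrow at low frequencies and become progressively wider towards high frequencies.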
FIG. 8 illustrates a simplified spacing of auditory filter bands as an example for a perceptually motivated spacing of the frequency bands.
In addition to the temporal integration due to the windowing of the STFT, a recursive integration can be used, with different time constants during attack and decay. The specific partial loudness, e.g., the partial loudness evoked in each of the auditory filter bands, is computed from the excitation levels of the signal of interest (the stimulus) and of the interfering noise according to Equations (17)-(20) in [12]. These equations cover the four cases where the signal is above the hearing threshold in noise or not, and where the excitation of the mixture signal is less than 100 dB SPL or not. If no interfering signal is fed into the model, i.e., e[k]=0, the result equals the total loudness N[k] of the stimulus s[k] and should predict the information represented in the equal loudness contours (ELC), as shown in FIG. 9. FIG. 9 illustrates equal loudness contours according to ISO 226:2003, from [15].
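The recursive integration with different attack and decay time constants mentioned above can be sketched as a first-order smoother applied to a per-frame quantity in one band. The time constants of 20 ms and 200 ms and the function name are hypothetical values chosen for illustration and are not taken from [12].

```python
import math


def smooth_attack_decay(values, frame_period_s, attack_s=0.02, decay_s=0.2):
    """First-order recursive smoothing with separate attack and decay constants.

    Assumption: one value per STFT frame, frames spaced frame_period_s apart.
    """
    attack_coef = math.exp(-frame_period_s / attack_s)
    decay_coef = math.exp(-frame_period_s / decay_s)
    state = 0.0
    smoothed = []
    for value in values:
        # Rising input uses the (faster) attack constant, falling input the decay one.
        coef = attack_coef if value > state else decay_coef
        state = coef * state + (1.0 - coef) * value
        smoothed.append(state)
    return smoothed


if __name__ == "__main__":
    # A 21 ms frame with 50% overlap gives a frame period of 10.5 ms.
    step = [1.0] * 10 + [0.0] * 10
    print([round(v, 3) for v in smooth_attack_decay(step, 0.0105)])
```

With these hypothetical settings the smoothed value follows an onset within a few frames but releases slowly after the input drops.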
Examples of outputs of the model are shown in FIGS. 10 and 11.
FIG. 10 illustrates specific partial loudness, exemplarily for frequency band 4, wherein the noise excitation ranges from 0 to 100 dB.
FIG. 11 illustrates specific partial loudness in noise with 40 dB noise excitation.
U.S. Pat. No. 7,050,966 (see [16]) describes a method for enhancing the intelligibility of speech in noise and mentions the combination of ANC and PNC; however, no teaching is given of how ANC and PNC can be advantageously combined.