The present invention relates generally to noise reduction in perceived audio signals. A well understood problem in the field of audio playback systems is the time variation of noise level and spectral characteristics. When listening to an audio signal in a noisy environment such as a busy public place, outdoors on a windy day, or in a moving vehicle, the noise level can change frequently, for example with the passing of traffic or groups of people in conversation. It is inconvenient for the user to have to manually change the volume of the audio playback as these changes occur to achieve acceptable levels of audibility and intelligibility.
One method of addressing this problem is to measure the noise level with a microphone and automatically increase the volume when the noise level increases and decrease the volume when the noise level decreases.
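A minimal sketch of such a broadband scheme follows, assuming the microphone delivers blocks of samples normalised to the range -1 to 1. The function names and the values of `reference_db`, `slope` and `max_boost_db` are illustrative assumptions and are not taken from any particular system.

```python
import math


def rms_db(samples, floor=1e-12):
    """Mean-square level of a block of samples, in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_square, floor))


def broadband_gain_db(noise_block, reference_db=-40.0, slope=0.5, max_boost_db=12.0):
    """Single playback gain (dB) that rises and falls with the measured noise level.

    reference_db, slope and max_boost_db are hypothetical tuning values.
    """
    excess = rms_db(noise_block) - reference_db
    # No boost below the reference level, and never more than max_boost_db.
    return min(max(slope * excess, 0.0), max_boost_db)
```

Because a single gain is applied to the whole signal, this approach takes no account of how the noise is distributed across frequency.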
However, noise is rarely perfectly described by a white noise model, spread uniformly across the frequency spectrum. In a moving car the ambient noise is largely at low frequencies, so a uniform volume increase will make the audio seem higher pitched than it should, as the noise masks the low frequency components of the audio signal. The spectrum of the noise can, like the noise level, change frequently; again using the example of a motor vehicle, many variables are involved, including speed, road surface and passing traffic.
Therefore it is preferable to continuously monitor both the noise level and its frequency characteristics and apply dynamic frequency-specific gains to the audio signal with the aim of ensuring it is audible and intelligible over the noise. The output of such a dynamic audibility enhancement system should be a version of the primary audio input signal, processed in such a way as to improve the listening experience for a typical listener in a given noise environment.
FIG. 1 depicts a dynamic frequency-specific audibility enhancement system at one moment in time. The user 1 is trying to listen to primary audio signal input x(n) from audio source 2. This could for example be a telephone conversation using a hands-free kit or a car radio playing music. However the audio is partially masked by noise from noise source 3. The system employs microphone 4 to measure the sound pressure levels near the user's head. The signal measured by microphone 4, d(n), is input to signal processor 5. Signal processor 5 calculates frequency-specific gain profile G(n). Primary audio signal input x(n) is multiplied with frequency-specific gain G(n) to produce a noise compensated signal. This noise compensated signal is then played through loudspeaker 6.
In an ideal system, the frequency-specific gain could be
$$G(n) = \frac{d(n)}{x(n)} \qquad (1)$$
However, the sound the user hears depends on the variation in sound pressure levels at the listener's ear, not the signals inside the signal processor; these are not equivalent in a real world system. Therefore G(n) should be compensated by an equalisation factor. The value of the equalisation factor may depend on many variables. These could include analogue gains within the system, the loudspeaker and microphone frequency responses and the distances between the user's ear, microphone, loudspeaker and noise source. This equalisation factor may be determined by calibration of each individual system, as is the case in, for example, Kib, Sergey; Budkin, Alexey; Goldin, Alexander A., “Automatic Volume and Equalization Control in Mobile Devices”, Proc. of the 121st AES Convention, 2006. However, calibration procedures are cumbersome, time and power consuming, must be updated frequently to remain accurate as the relevant distances change, and are not always feasible in practice.
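The ideal gain of equation (1), compensated by an equalisation factor, might be sketched per frequency band as follows. This is an illustration only: it assumes that d(n) and x(n) have already been reduced to one magnitude per band, that the equalisation factor acts as a simple per-band multiplier, and that the gain is capped at a hypothetical `max_gain` to avoid runaway amplification when the primary audio is nearly silent. None of these choices is specified above.

```python
def band_gains(noise_mag, audio_mag, eq=None, eps=1e-9, max_gain=8.0):
    """Per-band gain G = eq * d / x, following equation (1).

    noise_mag : per-band magnitudes of the microphone signal d(n)
    audio_mag : per-band magnitudes of the primary audio signal x(n)
    eq        : assumed per-band equalisation factors (defaults to 1.0)
    eps, max_gain : hypothetical safeguards against division by zero
                    and excessive amplification
    """
    if eq is None:
        eq = [1.0] * len(noise_mag)
    gains = []
    for d, x, e in zip(noise_mag, audio_mag, eq):
        gains.append(min(e * d / max(x, eps), max_gain))
    return gains


# Low-frequency-heavy noise, as in a moving car, with a flat audio spectrum:
# the lowest band receives the largest gain.
car_gains = band_gains([0.4, 0.1, 0.05], [0.1, 0.1, 0.1])
```

In practice the values of `eq` are exactly what a calibration procedure would have to supply.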
In U.S. Pat. No. 6,529,605 the calibration problem is avoided. The signal picked up by the microphone is split into a desired signal and a noise signal by an adaptive filter. The desired signal is extracted and utilised to form a control signal which is subsequently used to control the loudspeaker signal. However, the problem remains that this system does not consider that the user may be speaking: an important consideration especially for implementations in hands-free kits and mobile telephones. Therefore the loudspeaker signal will be amplified whenever the user speaks, drowning them out. This effect will be intensely irritating to the user and make it very difficult for them to continue a conversation with the device switched on. In implementations such as headphones for listening to music from a personal audio device or car radio this will reduce user enjoyment and in telephone related applications this will defeat the object of the device entirely.
Another problem with audio playback systems, in particular in confined spaces such as vehicles, is the interference of the currently playing sound from the loudspeaker with echoes of the recently played sound from the loudspeaker. To cancel the echo signal an adaptive filter can be used which identifies the acoustic echo path so that future echoes may be calculated and subtracted from the loudspeaker signal. However when user speech is present at the same time as a loudspeaker signal the adaptive filter can diverge. Thus a double talk detector can be used to slow down or halt adaptation of the filter in the presence of user speech.
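One common way to realise such an adaptive filter is the normalised least mean squares (NLMS) algorithm, sketched below. NLMS is named here as an assumption, since no particular algorithm is specified above. The sketch subtracts the echo estimate from the microphone signal, as is usual for acoustic echo cancellers. The `double_talk` flags are assumed to come from a separate detector that is not shown, and `taps`, `mu` and `eps` are illustrative values.

```python
def nlms_echo_canceller(far_end, mic, taps=4, mu=0.5, eps=1e-8, double_talk=None):
    """Estimate the echo path and return (weights, residual signal).

    far_end     : loudspeaker samples
    mic         : microphone samples containing the echo
    double_talk : optional list of booleans; where True, adaptation is halted
    """
    w = [0.0] * taps            # current estimate of the echo path
    history = [0.0] * taps      # most recent loudspeaker samples, newest first
    residual = []
    for n, (x, d) in enumerate(zip(far_end, mic)):
        history = [x] + history[:-1]
        echo_estimate = sum(wi * xi for wi, xi in zip(w, history))
        e = d - echo_estimate   # microphone signal with the estimated echo removed
        residual.append(e)
        if double_talk is None or not double_talk[n]:
            # Normalised update; skipped while user speech is flagged.
            norm = sum(xi * xi for xi in history) + eps
            w = [wi + mu * e * xi / norm for wi, xi in zip(w, history)]
    return w, residual
```

Halting the update entirely is the simplest policy; reducing `mu` while double talk is flagged would correspond to slowing adaptation rather than stopping it.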
Finally, most dynamic audibility enhancement systems simply raise the magnitude of the loudspeaker signal such that the magnitude of the signal reaching the user's ear is above that of the noise signal. This does not fully take into account auditory masking effects such as those of tone-like noise signals, e.g. the distinct narrow frequency peaks, or formants, commonly found in speech and music. In quiet conditions the absolute threshold of hearing for a normal human ear lies along curve A, shown in FIG. 2. Thus in quiet conditions signal D would be audible. However, when tone C is present the threshold of hearing at frequencies surrounding the tone is altered, gaining a “hump” around the frequency of the tone as shown by curve B. This masks signals not only at the frequency of the tone but also at nearby frequencies. In this case signal D becomes inaudible in the presence of tone C. In order to make D audible, it is necessary to raise the level of D above the level of the altered threshold of hearing B evaluated at the frequency of signal D. Note that, as shown in FIG. 2, it is possible for the maximum in the altered threshold of hearing B to be at a lower sound pressure level than the level of tone C; thus, for the play-out signal to be audible, it is not always necessary to raise the level of the loudspeaker signal such that the level of the echo signal is higher than the level of the noise.
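The relationship between curves A and B and signal D can be illustrated with a simplified masking model. The sketch below uses Terhardt's approximation for the threshold in quiet and the Zwicker and Terhardt approximation for the Bark scale. The triangular shape of the hump, the values of `offset_db`, `lower_slope` and `upper_slope`, and the use of a simple maximum to combine the two thresholds are illustrative assumptions and do not describe any particular system.

```python
import math


def quiet_threshold_db(f_hz):
    """Approximate absolute threshold of hearing in dB SPL (curve A)."""
    k = f_hz / 1000.0
    return 3.64 * k ** -0.8 - 6.5 * math.exp(-0.6 * (k - 3.3) ** 2) + 1e-3 * k ** 4


def bark(f_hz):
    """Approximate critical-band rate in Bark."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def masked_threshold_db(f_hz, tone_hz, tone_db,
                        offset_db=15.0, lower_slope=27.0, upper_slope=10.0):
    """Threshold altered by a masking tone (curve B), assumed triangular in Bark."""
    dz = bark(f_hz) - bark(tone_hz)
    slope = upper_slope if dz >= 0 else lower_slope
    hump = tone_db - offset_db - slope * abs(dz)
    return max(quiet_threshold_db(f_hz), hump)


def boost_needed_db(signal_hz, signal_db, tone_hz, tone_db):
    """Gain in dB needed to lift the signal to the altered threshold."""
    return max(0.0, masked_threshold_db(signal_hz, tone_hz, tone_db) - signal_db)
```

Under these assumed parameters, a 20 dB SPL signal at 1100 Hz is audible in quiet but is masked by a 60 dB SPL tone at 1000 Hz. The boost required to reach the altered threshold is nonetheless smaller than the 40 dB that would be needed to match the level of the tone itself.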
In M. Tzur (Zibulski) and A. A. Goldin, “Sound equalization in a noisy environment”, Proc. of the 110th AES Convention, 2001, the auditory masking threshold profile of the loudspeaker signal is estimated and the final gain profile is determined empirically based on this threshold profile such that the loudspeaker signal always masks the noise. However, total noise masking is not always desirable. Consider, for example, listening to music in a car: while the music must not be masked completely by the noise if it is to be enjoyed, it is unsafe to have all traffic noise masked by the music. The driver should be able to hear and react to noises such as the sound of a motorbike overtaking or the siren of an approaching emergency service vehicle.
Another psychoacoustic effect that basic systems fail to take into account is the human ear's varying sensitivity to different frequencies. FIG. 3 shows equal loudness contours as perceived by a normal human, demonstrating that the ear becomes relatively more sensitive to low frequencies at high intensities. Therefore tonal balance should be considered.
What is needed is a dynamic frequency dependent audibility enhancement system with no calibration or divergence of adaptive filter algorithms due to user speech, which takes into account psychoacoustic effects so that a user is able to hear an audio signal as intended without all environmental noise being totally drowned out.