Automatic gain control (AGC) aims to equalize level variations in recorded speech so that the speech signal has a constant level. Analysis of speech signals recorded by microphones reveals that level variations in recorded speech have two independent causes, namely intentional speech level variations and unintentional variations due, for example, to changes in the distance between a speaker and a microphone.
Several AGC systems have been developed in order to equalize level variations. Existing AGC solutions are described in U.S. Pat. No. 8,121,835 and in “Automatic Spatial Gain Control for an Informed Spatial Filter”, Braun, S. et al., in Acoustics, Speech and Signal Processing (ICASSP), 2014 Institute of Electrical and Electronics Engineers (IEEE) International Conference on (pp. 830-834). These solutions do not, however, distinguish between the two causes: they act on both intentional and unintentional signal level fluctuations.
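By way of illustration only, the behavior of a conventional level-based AGC can be sketched as follows. The sketch is not taken from the cited references; the function names `rms` and `simple_agc` and the parameters `target_rms` and `smoothing` are assumptions chosen for this example. It shows that a gain derived solely from the measured signal level drives every frame toward the same target level, regardless of whether the frame was quiet because the talker spoke softly or because the talker moved away from the microphone.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def simple_agc(frames, target_rms=0.1, smoothing=0.9, eps=1e-12):
    """Scale each frame toward target_rms (hypothetical, simplified AGC).

    The level estimate is a recursively smoothed frame RMS. The gain depends
    only on that estimate, so the cause of a level change is not considered.
    """
    output = []
    level = None
    for frame in frames:
        current = rms(frame)
        # The first frame initializes the level estimate; later frames smooth it.
        if level is None:
            level = current
        else:
            level = smoothing * level + (1.0 - smoothing) * current
        gain = target_rms / max(level, eps)
        output.append([gain * x for x in frame])
    return output


# A loud frame followed by a frame that is 20 dB quieter. With smoothing
# disabled, both frames leave the AGC at the same level.
loud = [0.5, -0.5] * 4
quiet = [0.05, -0.05] * 4
equalized = simple_agc([loud, quiet], target_rms=0.1, smoothing=0.0)
```

In this sketch the quieter frame is amplified to the target level whether its lower level was intended by the talker or not, which is the limitation noted above.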
Therefore, in order to convey a realistic sound field impression, e.g. for immersive teleconferencing systems, it is of paramount importance to detect the cause of level variations. Doing so would allow unintentional variations due to distance fluctuations to be fully equalized while preserving intentional (natural) dynamic changes of the speech signals.
The idea of distinguishing between intentional and unintentional signal level variations, and equalizing only the unintentional ones, has recently been investigated from numerous perspectives. One of the solutions that has been proposed is to estimate the talker-microphone distances by acoustic source localization (ASL). Several ASL methods have been developed to equalize level variations arising from distance fluctuations between a source and a microphone for systems with synchronized microphones at known positions and without simultaneously active talkers. Such systems are described in, for example, U.S. Pat. No. 7,924,655 and “Energy-based sound source localization and gain normalization for ad hoc microphone arrays”, Liu, Z. et al., in Acoustics, Speech and Signal Processing, ICASSP 2007, IEEE International Conference on, Vol. 2, IEEE, 2007.
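As a hedged illustration of distance-based equalization, the following sketch assumes that a talker-microphone distance estimate is already available (for example from an ASL method) and that the sound pressure amplitude falls off in inverse proportion to distance, as for a point source in a free field. Reverberation and estimation errors are ignored. The function names and the `reference_m` parameter are assumptions introduced for this example and do not come from the cited references.

```python
import math


def distance_gain(distance_m, reference_m=1.0):
    """Linear gain that undoes 1/r amplitude decay relative to reference_m.

    Assumes free-field spherical spreading (hypothetical simplification).
    """
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    return distance_m / reference_m


def distance_gain_db(distance_m, reference_m=1.0):
    """The same compensation gain expressed in decibels."""
    return 20.0 * math.log10(distance_gain(distance_m, reference_m))


def equalize_for_distance(frame, distance_m, reference_m=1.0):
    """Apply the distance compensation gain to one frame of samples."""
    gain = distance_gain(distance_m, reference_m)
    return [gain * x for x in frame]


# A talker who moves from 1 m to 2 m is received at half the amplitude;
# the compensation gain of 2 (about 6 dB) restores the reference level.
compensated = equalize_for_distance([0.25, -0.25], distance_m=2.0)
```

Because the gain here depends only on the estimated distance and not on the measured signal level, intentional changes in speech level pass through unchanged. The quality of the equalization is, however, only as good as the distance estimate.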
However, conventional ASL methods exhibit at least one of the following deficiencies. In some conventional ASL methods, the microphones have to be synchronized and/or their positions have to be known. Some conventional ASL methods cannot handle simultaneously active talkers. In some conventional ASL methods, large estimation errors make it impossible to equalize close-talk level variations. Some conventional ASL methods are computationally complex.
Thus, there is a need for an improved sound signal processing apparatus and method allowing, in particular, for AGC.