Acoustic signal processing is widely used today to improve the quality of sound signals, such as those from microphones. As one example, many devices such as handsets operate in the presence of sources of echoes, e.g., loudspeakers. Furthermore, microphone signals may be captured in a noisy environment, e.g., in a car or in the presence of other noise. There may also be sounds from interfering locations, e.g., out-of-location conversation by others, or out-of-location interference, wind, etc. Acoustic signal processing is therefore an important area for invention.
Much of the prior art on acoustic noise reduction and echo suppression is concerned with the numerical estimation of parameters and with statistically optimal suppression rules based on criteria such as minimum mean squared error (MMSE). Such approaches neglect the complexities of auditory perception, and thus implicitly assume that the MMSE criterion is well matched to the preference of a human listener.
Known processing methods and systems for dealing with noise, echo and spatial selectivity often concatenate different suppression systems based on different features. Each suppression system is in some way optimized for its task or suppression function and acts directly on the signal passing through it before that signal is passed to the subsequent suppression system. Whilst this may reduce the design complexity, it produces results that leave much to be desired in terms of performance. For example, a spatial suppression system is likely to cause some level of modulation of the unwanted noise signal due to spatial uncertainties. If such a spatial suppression system is cascaded with a noise reduction system, the fluctuations in noise will increase uncertainty in the noise estimate and thus lower the performance. In such a simplistic concatenation, the spatial information is not available to the noise suppression, and thus some noise-like signals from the desired spatial location may be needlessly attenuated. Similar problems arise should the noise suppression occur first. This sort of problem is particularly prevalent with any two-input (two-channel) spatial suppression system. With only two sensors, as soon as there is more than one spatially discrete source present at a similar level, the estimation of spatial location becomes very noisy.
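The effect of cascading on the noise estimate can be illustrated with a minimal numerical sketch. It is not taken from any particular prior-art system. It assumes stationary white Gaussian noise, a hypothetical spatial suppressor whose per-frame gain wanders uniformly between 0.2 and 1.0, and frame power as the quantity a downstream noise estimator would track. The function names, the gain range and the frame length are all assumptions made for illustration.

```python
import random
import statistics


def frame_powers(samples, frame_len):
    """Mean power of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def relative_spread(values):
    """Coefficient of variation: standard deviation divided by mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


def simulate(num_frames=400, frame_len=256, seed=0):
    """Return the relative spread of frame power before and after modulation."""
    rng = random.Random(seed)
    # Stationary white Gaussian noise standing in for the unwanted noise.
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_frames * frame_len)]
    # Hypothetical spatial suppressor (an assumption): one gain per frame,
    # fluctuating because the spatial estimate is uncertain.
    gains = [rng.uniform(0.2, 1.0) for _ in range(num_frames)]
    modulated = [x * gains[i // frame_len] for i, x in enumerate(noise)]
    return (
        relative_spread(frame_powers(noise, frame_len)),
        relative_spread(frame_powers(modulated, frame_len)),
    )


if __name__ == "__main__":
    before, after = simulate()
    print(f"relative spread of frame power, raw noise:        {before:.3f}")
    print(f"relative spread of frame power, after modulation: {after:.3f}")
```

Under these assumptions the frame power of the modulated noise varies noticeably more from frame to frame than that of the raw noise, which is the kind of fluctuation that makes a subsequent noise estimate less certain.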
When the requirement for echo control is added, further problems arise. A dynamic suppression element prior to echo control can destabilize echo estimation. The alternative of having echo control first adds computational complexity. It is desirable to create a system that can retain stable operation and avoid unnatural sounding output in the presence of voice, noise and echo, especially when the power in the desired signal becomes low or comparable to that of the undesired signals.
In practice, a substantial amount of the performance, robustness and perceived quality of an audio processing system comes from heuristics, interrelated components and tuning.