Embodiments of the inventive concept described herein relate to technology for suppressing a nonlinear acoustic echo signal by estimating the filter factors of a Volterra filter through a Multi-Tap Least Squares (MTLS) estimator and by estimating an a priori near-end speech presence probability ratio (Q, the ratio of the a priori probability of near-end speech presence to that of near-end speech absence) through a data-driven algorithm.
The power of a nonlinear acoustic echo signal is generally estimated using cascade structures, power filters, or Volterra filters.
The cascade structure estimates a nonlinear acoustic echo signal based on a raised-cosine function. It adaptively modifies the function factors so that the raised-cosine function fits the nonlinearity of the system. The modified function factors are then used to estimate the optimum power of the nonlinear acoustic echo signal.
The power filter models a nonlinear acoustic echo signal as a power series and adaptively modifies the power series factors so that they properly represent the nonlinear acoustic echo signal from an output signal of a linear speaker. The modified power series factors are then used to estimate the optimum power of the nonlinear acoustic echo signal. The cascade structure and the power filter are known to be inferior to the Volterra filter in performance.
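As a minimal illustrative sketch of the power series model described above, the following assumes that each power of the far-end signal is passed through its own finite impulse response branch and that the branch outputs are summed. The function name `power_filter_output` and the `kernels` layout are hypothetical and are used only for illustration; the adaptation of the power series factors is not shown.

```python
import numpy as np

def power_filter_output(x, kernels):
    """Echo estimate of a power-series (power filter) model.

    kernels[p - 1] holds the hypothetical FIR factors applied to x**p,
    so the output is y(n) = sum_p sum_k kernels[p-1][k] * x(n-k)**p.
    """
    x = np.asarray(x, dtype=float)
    y = np.zeros_like(x)
    for p, h in enumerate(kernels, start=1):
        # Causal FIR filtering of the p-th power of the far-end signal,
        # truncated to the length of the input.
        y += np.convolve(x ** p, h)[: len(x)]
    return y

# Example: first-order branch with gain 1.0, second-order branch with gain 0.5.
echo = power_filter_output([1.0, 2.0, 3.0], [[1.0], [0.5]])
```

In this example each branch has a single tap, so the estimate reduces to x + 0.5 x^2 evaluated sample by sample.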
The Volterra filter models a nonlinear acoustic echo signal as a Volterra series. With the Volterra filter, the Volterra series factors that properly represent the nonlinear acoustic echo signal from an output signal of a nonlinear speaker are adaptively found and used to estimate the optimum power of the nonlinear acoustic echo signal.
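For illustration only, a Volterra series truncated at the second order may be evaluated as in the following sketch. The truncation order, the memory length M, and the names `volterra2_output`, `h1`, and `h2` are assumptions made for this example and are not taken from the description above.

```python
import numpy as np

def volterra2_output(x, h1, h2):
    """Second-order truncated Volterra series (illustrative).

    y(n) = sum_i h1[i] x(n-i) + sum_i sum_j h2[i, j] x(n-i) x(n-j)
    """
    x = np.asarray(x, dtype=float)
    h1 = np.asarray(h1, dtype=float)   # linear kernel, shape (M,)
    h2 = np.asarray(h2, dtype=float)   # quadratic kernel, shape (M, M)
    M = len(h1)
    # Zero-pad the past so that x(n-i) is defined for the first samples.
    padded = np.concatenate([np.zeros(M - 1), x])
    y = np.empty(len(x))
    for n in range(len(x)):
        # u = [x(n), x(n-1), ..., x(n-M+1)]
        u = padded[n : n + M][::-1]
        y[n] = h1 @ u + u @ h2 @ u
    return y

# Example: memory M = 2, linear path plus a cross term x(n) * x(n-1).
echo = volterra2_output([1.0, 2.0], [1.0, 0.0], [[0.0, 1.0], [0.0, 0.0]])
```

The quadratic kernel has M * M entries, so the number of factors to be found grows quickly with the memory length.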
However, because the Volterra filter factors are updated by an adaptive algorithm such as Normalized Least Mean Square (NLMS), it is difficult for the Volterra filter to adapt quickly to abrupt variations of the environment and of the nonlinearity. For example, because the Volterra filter uses fixed constants, it is difficult for it to adapt to the environment surrounding the speaker and the microphone through which a speech signal output from the speaker passes before being input into the microphone.
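The conventional update discussed above can be sketched as follows, assuming a second-order Volterra filter whose factors are stacked into one vector and updated sample by sample with NLMS. The fixed step size `mu` and the regularization constant `eps` are taken here as examples of the fixed constants mentioned above; this interpretation, the function name `nlms_volterra2`, and the upper-triangular storage of the quadratic kernel are assumptions of this sketch.

```python
import numpy as np

def nlms_volterra2(x, d, M, mu=0.5, eps=1e-8):
    """NLMS adaptation of a second-order Volterra filter (illustrative).

    x: far-end signal, d: microphone signal, M: memory length.
    mu and eps are fixed constants chosen in advance (assumed values).
    Returns the final factor vector and the a priori error signal.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    iu = np.triu_indices(M)             # unique quadratic terms only
    w = np.zeros(M + len(iu[0]))        # stacked linear + quadratic factors
    padded = np.concatenate([np.zeros(M - 1), x])
    e = np.empty(len(x))
    for n in range(len(x)):
        u = padded[n : n + M][::-1]     # [x(n), ..., x(n-M+1)]
        phi = np.concatenate([u, np.outer(u, u)[iu]])
        e[n] = d[n] - w @ phi           # a priori estimation error
        # Normalized gradient step with a fixed step size.
        w += mu * e[n] * phi / (phi @ phi + eps)
    return w, e
```

Because each step moves the factors only a fixed fraction of the way along the current regressor, many samples may be needed after an abrupt change of the echo path before the error becomes small again.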
Therefore, a solution is needed that can adapt quickly to abrupt variations of the environment and of the nonlinearity.