1. Technical Field
The present disclosure relates to audio signal processing and more specifically to speech isolation.
2. Introduction
The quest to extract a desired speech signal from a mixture of signals that includes a number of directional interferers has led to a vast body of literature that has grown rapidly over the last four decades.
Early signal extraction methods include algorithmically simple fixed beamforming techniques such as delay-and-sum beamforming (DSB), filter-and-sum beamforming (FSB), and superdirective beamforming (SDB). These methods typically achieve only low to moderate signal extraction performance. Performance improves with the number of microphones used, but additional microphones add cost and, in mobile applications, may add an impractical amount of bulk and/or weight. In particular, these techniques tend to fail in moderately to highly reverberant acoustic environments.
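As an illustration of the simplest of these fixed techniques, the following is a minimal sketch of delay-and-sum beamforming, assuming integer-sample delays that are already known from the array geometry and the look direction. The function name, the argument layout, and the restriction to whole-sample delays are assumptions made for this example and are not taken from any particular implementation.

```python
def delay_and_sum(channels, delays):
    """Time-align the microphone channels and average them.

    channels: list of equal-length sample lists, one per microphone.
    delays:   hypothetical per-channel arrival delays in whole samples for
              the look direction (assumed known in advance).
    """
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay  # read ahead by `delay` samples to undo the arrival delay
            if 0 <= src < n:
                out[t] += channel[src]
    # Averaging keeps the aligned source at unit gain.
    return [value / len(channels) for value in out]


# A pulse reaches microphone 0 at sample 2 and microphone 1 at sample 4.
mics = [[0, 0, 1, 0, 0, 0],
        [0, 0, 0, 0, 1, 0]]
steered = delay_and_sum(mics, [0, 2])    # delays match the source direction
unsteered = delay_and_sum(mics, [0, 0])  # delays do not match
```

With matching delays the pulse adds coherently, while with mismatched delays it is smeared over two samples at half amplitude. With only two microphones the attenuation of the mismatched direction is modest, which is consistent with the limited performance noted above.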
Adaptive methods, such as the generalized sidelobe canceller (GSC), can improve spatial separation performance significantly, but introduce some drawbacks. Adaptive filtering can deal with changing parameters within the acoustic space, such as moving sources. However, because adaptation cannot happen instantaneously, adaptive filters must be carefully controlled to prevent instability. Thus, adaptive filtering can require tuning to be useful for a wide range of applications.
Another, more recent adaptive beamforming method is based on blind source separation (BSS) techniques. Modern implementations can very effectively extract a desired source signal from a mixture of sources. However, this technique typically requires as many microphones as there are distinct sources in order to work well. Also, these systems are algorithmically fairly complex and are based on adaptive filtering techniques that may suffer from the same disadvantages mentioned in the context of the generalized sidelobe canceller.
Spatial noise suppression based on magnitude (SNS-M) requires as few as two microphones, is fairly effective, and is algorithmically very cheap. SNS-M compares magnitude measurements of an omnidirectional component and a dipole component, both of which can be derived from two closely-spaced microphones. A disadvantage of this method is that the two microphones should ideally be perfectly calibrated for maximum performance.
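The magnitude comparison can be sketched as follows. This is only an illustration under stated assumptions: the omnidirectional and dipole components are formed here as the sum and the difference of the two microphone signals in a single frequency bin, which is one common way to derive them, and the function name and the ratio it returns are hypothetical. How such a ratio would be mapped to a suppression gain is not specified above and is not shown.

```python
import cmath
import math


def omni_dipole_ratio(x1, x2, eps=1e-12):
    """Return |dipole| / |omni| for one frequency bin of two microphones.

    x1, x2: complex spectral values of the two microphone signals.
    Assumption: omni = x1 + x2 and dipole = x1 - x2.
    """
    omni = x1 + x2
    dipole = x1 - x2
    return abs(dipole) / (abs(omni) + eps)


# Hypothetical setup: 2 cm spacing, 1 kHz tone, speed of sound 343 m/s.
phase = 2 * math.pi * 1000.0 * 0.02 / 343.0

# Broadside source: identical signal at both microphones.
broadside = omni_dipole_ratio(1.0 + 0j, 1.0 + 0j)

# Endfire source: the second microphone sees a pure phase delay.
endfire = omni_dipole_ratio(1.0 + 0j, cmath.exp(-1j * phase))

# Broadside source again, but with a 10 % gain mismatch between microphones.
mismatched = omni_dipole_ratio(1.0 + 0j, 1.1 + 0j)
```

In this sketch a broadside source gives a ratio of zero and an endfire source gives a ratio of tan(phase / 2). The mismatched case gives a nonzero ratio for a broadside source, which illustrates why the method is sensitive to microphone calibration.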
TABLE 1
                      FSB/DSB   SDB        GSC      BSS      SNS-M
Algorithm complexity  Medium    Low        High     High     Low
Hardware cost         High      Low        Medium   Low      Low
Effectiveness         Low       Medium     High     High     High
Robustness            High      Very low   Low      Medium   Medium
Versatility           Medium    Medium     Medium   Low      High
Table 1 succinctly illustrates the strengths and weaknesses of each of these five prior art methods, and highlights favorable characteristics in bold. As can be seen, each of these approaches has at least one weakness or area for potential improvement.