Current noise suppression techniques, such as Wiener filtering, attempt to improve the global signal-to-noise ratio (SNR) and attenuate low-SNR regions, thus introducing distortion into the speech signal. It is common practice to perform such filtering as a magnitude modification in a transform domain. Typically, the corrupted signal is used to reconstruct the signal with the modified magnitude. This approach may miss signal components dominated by noise, thereby resulting in undesirable and unnatural spectro-temporal modulations.
When the target signal is dominated by noise, a system that synthesizes a clean speech signal instead of enhancing the corrupted audio via modifications is advantageous for achieving high signal-to noise ratio improvement (SNRI) values and low signal distortion.