1. Field of the Invention
The present invention relates to speech enhancement and, more particularly, to a method for enhancing a speech spectrum by estimating a noise spectrum, based on a speech absence probability, in speech presence intervals as well as in speech absence intervals.
2. Description of the Related Art
A conventional approach to speech enhancement is to estimate a noise spectrum in noise intervals, where speech signals are not present, and then to improve the speech spectrum in a predetermined speech interval based on the noise spectrum estimate. A voice activity detector (VAD) has been used as the algorithm that classifies a predetermined input signal into speech presence and speech absence intervals. However, the VAD operates in a different manner from the speech enhancement technique, so noise interval detection, and the noise spectrum estimation based on the detected noise intervals, bear no relationship to the models and assumptions used in the actual speech enhancement. This degrades the performance of the speech enhancement technique. In addition, when the VAD is used, the noise spectrum is estimated only in speech absence intervals. Because the noise spectrum actually varies in speech presence intervals as well as in speech absence intervals, the accuracy of noise spectrum estimation using the VAD is limited.
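The limitation described above can be illustrated with a minimal Python sketch in which a noise power estimate is updated only in frames that a simple energy-based detector labels as speech-absent. The detector rule, the function name `vad_gated_noise_estimate`, and the parameters `threshold` and `smoothing` are assumptions made for illustration and do not describe any particular VAD.

```python
def vad_gated_noise_estimate(frame_powers, threshold=3.0, smoothing=0.9):
    """Track noise power, updating only in frames judged speech-absent.

    Assumed (hypothetical) detector: a frame is labelled speech-present
    when its power exceeds `threshold` times the current noise estimate.
    """
    noise = frame_powers[0]  # assumption: the first frame contains noise only
    track = []
    for power in frame_powers:
        speech_present = power > threshold * noise
        if not speech_present:
            # First-order recursive update, applied in speech absence only.
            noise = smoothing * noise + (1.0 - smoothing) * power
        # In speech-present frames the estimate is simply held.
        track.append(noise)
    return track
```

In this sketch the estimate is frozen for as long as the detector reports speech, so any change in the noise during a speech presence interval goes unobserved until the next speech absence interval.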
To solve the above problems, it is an object of the present invention to provide a method for enhancing a speech spectrum in which a signal-to-noise ratio (SNR) and a gain of each frame of an input speech signal are updated based on a speech absence probability, without using a separate voice activity detector (VAD).
The above object is achieved by the method according to the present invention for enhancing speech quality, comprising: (a) segmenting an input speech signal into a plurality of frames and transforming each frame signal into a signal of the frequency domain; (b) computing the signal-to-noise ratio of a current frame, and computing the signal-to-noise ratio of a frame immediately preceding the current frame; (c) computing the predicted signal-to-noise ratio of the current frame, which is predicted based on the preceding frame, and computing the speech absence probability using the signal-to-noise ratio and the predicted signal-to-noise ratio of the current frame; (d) correcting the two signal-to-noise ratios obtained in the step (b) based on the speech absence probability computed in the step (c); (e) computing the gain of the current frame from the two corrected signal-to-noise ratios obtained in the step (d), and multiplying the speech spectrum of the current frame by the computed gain; (f) estimating the noise power and the speech power for the next frame to calculate the predicted signal-to-noise ratio for the next frame, and providing the predicted signal-to-noise ratio for the next frame as the predicted signal-to-noise ratio of the current frame for the step (c); and (g) transforming the resulting spectrum of the step (e) into a signal of the time domain.
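The flow of the steps (a) through (g) can be sketched in Python as follows. This passage does not give the formulas for the speech absence probability, the correction of the signal-to-noise ratios, the gain, or the power estimates, so the sketch substitutes assumed ones: a Gaussian likelihood ratio with equal priors, scaling by the speech presence probability, a Wiener-type gain weighted by `alpha`, and first-order recursive smoothing with `zeta`. The names `dft` and `enhance`, all parameters, and the use of non-overlapping rectangular frames are likewise hypothetical simplifications, not the method as claimed.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) DFT, adequate for short illustrative frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def enhance(signal, frame_len=8, init_noise=1.0, alpha=0.9, zeta=0.9, snr_min=1e-3):
    """Sketch of steps (a)-(g); every formula below is an assumed stand-in."""
    noise = [init_noise] * frame_len   # per-bin noise power estimate
    speech = [0.0] * frame_len         # per-bin speech power estimate
    snr_prev = [snr_min] * frame_len   # SNR of the preceding frame
    snr_pred = [snr_min] * frame_len   # predicted SNR of the current frame
    output, absence = [], []
    # (a) segment into non-overlapping frames and transform each frame
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        spec = dft(signal[start:start + frame_len])
        enhanced, p_frame = [], []
        for k, y in enumerate(spec):
            power = abs(y) ** 2
            # (b) SNR of the current frame; snr_prev[k] is carried over
            snr_cur = power / noise[k]
            # (c) speech absence probability from current and predicted SNR
            xi = snr_pred[k]
            exponent = min(snr_cur * xi / (1.0 + xi), 50.0)  # avoid overflow
            likelihood = math.exp(exponent) / (1.0 + xi)
            p_abs = 1.0 / (1.0 + likelihood)  # assumes equal priors
            # (d) correct both SNRs using the speech presence probability
            cur_c = max((1.0 - p_abs) * snr_cur, snr_min)
            prev_c = max((1.0 - p_abs) * snr_prev[k], snr_min)
            # (e) gain from the two corrected SNRs, applied to the spectrum
            prior = alpha * prev_c + (1.0 - alpha) * max(cur_c - 1.0, 0.0)
            gain = prior / (1.0 + prior)
            s = gain * y
            enhanced.append(s)
            # (f) update noise and speech power, then predict the next SNR
            noise[k] = zeta * noise[k] + (1.0 - zeta) * (
                p_abs * power + (1.0 - p_abs) * noise[k])
            speech[k] = zeta * speech[k] + (1.0 - zeta) * abs(s) ** 2
            snr_pred[k] = max(speech[k] / noise[k], snr_min)
            snr_prev[k] = abs(s) ** 2 / noise[k]
            p_frame.append(p_abs)
        # (g) transform the enhanced spectrum back to the time domain
        output.extend(v.real for v in dft(enhanced, inverse=True))
        absence.append(sum(p_frame) / frame_len)
    return output, absence
```

In this sketch the noise power is updated in every frame, weighted by the speech absence probability, so no separate hard speech/non-speech decision is made. The function returns the enhanced samples together with a per-frame average speech absence probability, which is included only to make the behaviour easy to inspect.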