1. Field of the Invention
The present invention relates to voice activity detection, and more particularly to an apparatus and method for detecting a speech signal period from an input signal by using spectral subtraction and a probability distribution model.
2. Description of Related Art
With the development of technology, various devices have been developed to make people's daily lives more convenient. In particular, devices have been provided that can recognize speech and react to it appropriately. This capability is known as speech recognition.
The principal technologies of such speech recognition include a technology that detects a period where a speech signal is present in an input signal, and a technology that captures the content included in the detected speech signal.
Voice activity detection technology is required in both speech recognition and speech compression. The core of this technology is to distinguish speech from noise in an input signal.
A representative example of this technology is the “Extended Advanced Front-end Feature Extraction Algorithm” (hereinafter, referred to as “first conventional art”), which was selected by the European Telecommunications Standards Institute (ETSI) in November of 2003. According to this algorithm, a voice activity period is detected based on energy information in a speech frequency band, by using a temporal change of a feature parameter of a speech signal from which noise has been removed. However, when the noise level is high, performance may deteriorate.
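For illustration only, the general idea of energy-based detection can be sketched as follows. This is a generic full-band energy-threshold detector, not the ETSI algorithm itself. The function names, the `margin_db` parameter, and the assumption that the first `noise_frames` frames contain only noise are all hypothetical choices made for this sketch.

```python
import math


def frame_energies_db(samples, frame_len):
    """Return the mean power of each non-overlapping frame, in dB."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        # Floor the power so that an all-zero frame does not cause log10(0).
        energies.append(10.0 * math.log10(max(power, 1e-12)))
    return energies


def detect_voice_frames(samples, frame_len, noise_frames, margin_db):
    """Flag frames whose energy exceeds the estimated noise floor by margin_db.

    Assumption (not from the source): the first `noise_frames` frames
    contain noise only and are used to estimate the noise floor.
    """
    energies = frame_energies_db(samples, frame_len)
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    return [e > noise_floor + margin_db for e in energies]
```

A detector of this kind depends on the gap between the speech-frame energy and the noise floor. As the noise level rises, that gap shrinks below any fixed margin, which is consistent with the degradation noted above.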
Also, Korean Patent No. 10-304666 (hereinafter, referred to as “second conventional art”) discloses a method for detecting a voice activity period by estimating, in real time, the noise signal component and the speech signal component of a speech signal having noise, using statistical modeling such as a complex Gaussian distribution. However, even in this case, when the magnitude of the noise signal becomes greater than the magnitude of the speech signal, a voice activity period may not be detected.
According to the above-described conventional art, as the signal-to-noise ratio (hereinafter, referred to as “SNR”) decreases, that is, as the magnitude of noise increases relative to that of speech, it may not be easy to distinguish a speech period from a noise period, as shown in FIGS. 1A to 1D.
FIGS. 1A to 1D are histograms illustrating the distributions of a speech signal 110 having noise and a noise signal 120 as the SNR changes. In FIGS. 1A to 1D, the x-axis represents the magnitude of band energy in a frequency band between 1 kHz and 1.03 kHz, and the y-axis represents the corresponding probability.
Also, FIG. 1A illustrates a histogram when an SNR is 20 dB, FIG. 1B illustrates a histogram when an SNR is 10 dB, FIG. 1C illustrates a histogram when an SNR is 5 dB, and FIG. 1D illustrates a histogram when an SNR is 0 dB.
Referring to FIGS. 1A to 1D, as the SNR value decreases, the speech signal 110 having noise is more concealed by the noise signal 120. Accordingly, the speech signal 110 having noise may not be distinguished from the noise signal 120.
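The trend described above can be reproduced with a small simulation. The following is a minimal sketch under stated assumptions, and it does not use the data behind FIGS. 1A to 1D. It assumes that the spectral coefficient in a single band is a zero-mean complex Gaussian for both speech and noise, and it measures how much the two band-energy histograms overlap at each SNR. All function names, the bin range, and the frame count are hypothetical.

```python
import math
import random


def band_energy_db(variance, rng):
    """Draw one band energy (in dB) for a zero-mean complex Gaussian coefficient."""
    sigma = math.sqrt(variance / 2.0)  # standard deviation per real/imaginary part
    re = rng.gauss(0.0, sigma)
    im = rng.gauss(0.0, sigma)
    return 10.0 * math.log10(max(re * re + im * im, 1e-12))


def histogram(values, lo, hi, bins):
    """Return a normalized histogram; out-of-range values go to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return [c / len(values) for c in counts]


def overlap_at_snr(snr_db, frames=20000, bins=40, seed=0):
    """Estimate the overlap (0 to 1) of noise and noisy-speech energy histograms."""
    rng = random.Random(seed)
    noise_var = 1.0
    speech_var = noise_var * 10.0 ** (snr_db / 10.0)
    noise = [band_energy_db(noise_var, rng) for _ in range(frames)]
    # Independent speech and noise: the variances of the two components add.
    noisy_speech = [band_energy_db(noise_var + speech_var, rng)
                    for _ in range(frames)]
    h_noise = histogram(noise, -30.0, 40.0, bins)
    h_speech = histogram(noisy_speech, -30.0, 40.0, bins)
    return sum(min(a, b) for a, b in zip(h_noise, h_speech))


if __name__ == "__main__":
    for snr_db in (20, 10, 5, 0):
        print(snr_db, "dB SNR -> overlap", round(overlap_at_snr(snr_db), 3))
```

Under these assumptions, the overlap grows steadily as the SNR falls from 20 dB to 0 dB, so a threshold on band energy alone separates the two distributions less and less reliably.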
That is, according to the conventional methods, a speech period and a noise period may not be easily distinguished from each other in an input signal having a low SNR value.