This application claims the priority of Korean Patent Application No. 10-2002-0075650 filed on Nov. 30, 2002, in the Korean Intellectual Property Office, the disclosure of which is incorporated herein in its entirety by reference.
1. Field of the Invention
The present invention relates to a voice region detection apparatus and method for detecting a voice region in an input voice signal, and more particularly, to a voice region detection apparatus and method capable of accurately detecting a voice region even in a voice signal with color noise.
2. Description of the Related Art
Voice region detection isolates the pure voice region of an externally input voice signal, excluding silent and noise regions. A typical voice region detection method detects a voice region by using the energy of the voice signal and its zero crossing rate.
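The typical energy and zero crossing rate approach can be illustrated with a minimal sketch. The frame length, hop size, threshold values, and the decision rule (high energy, or moderate energy combined with a high zero crossing rate) are assumptions chosen for illustration only and are not taken from any particular prior art method.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing samples are dropped)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

def short_time_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame whose signs differ."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive so silence yields no crossings
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def detect_voice_frames(x, frame_len=160, hop=80,
                        energy_thresh=1e-3, zcr_thresh=0.25):
    """Return one boolean per frame: True where the frame is judged to be voice.

    Assumed rule (illustrative): a frame is voice if its energy exceeds
    energy_thresh, or if its energy exceeds a tenth of that threshold and
    its zero crossing rate is high (a rough cue for voiceless sounds).
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    energy = short_time_energy(frames)
    zcr = zero_crossing_rate(frames)
    return (energy > energy_thresh) | ((zcr > zcr_thresh) & (energy > 0.1 * energy_thresh))

if __name__ == "__main__":
    fs = 8000                                   # assumed sampling rate in Hz
    t = np.arange(800) / fs
    tone = 0.5 * np.sin(2 * np.pi * 200 * t)    # stand-in for a voiced segment
    signal = np.concatenate([np.zeros(800), tone])
    print(detect_voice_frames(signal).astype(int))
```

Because both thresholds are fixed constants in this sketch, the result depends directly on the absolute level of the input signal and of the surrounding noise.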
However, the aforementioned voice region detection method has a problem in that, when the energy of the surrounding noise is large, it is very difficult to distinguish voice and noise regions from each other, since a voice signal with low energy, such as in a voiceless sound region, becomes buried in the surrounding noise.
Further, in the above voice region detection method, the input level of a voice signal varies if the voice is input close to a microphone or the volume level of the microphone is arbitrarily adjusted. To accurately detect a voice region under these circumstances, a threshold must be manually set on a case-by-case basis according to the input apparatus and usage environment. Thus, there is another problem in that manually setting a proper threshold is very cumbersome.
To solve these problems in the voice region detection methods, Korean Patent Laid-Open Publication No. 2002-0030693 entitled “Voice region determination method of a speech recognition system” discloses a method capable of detecting a voice region regardless of the surrounding noise and the input apparatus by changing the threshold according to the input level of the voice upon detection of the voice region, as shown in FIG. 1(a).
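The general idea of a threshold that follows the input level can be sketched as below. This is not the procedure of the cited publication, whose details are not given here; the function name, the use of percentiles to estimate the noise floor and peak level, and the `ratio` parameter are all assumptions made for illustration.

```python
import numpy as np

def adaptive_energy_threshold(frame_energy, ratio=0.1):
    """Place the threshold between an estimated noise floor and peak level.

    Assumption (illustrative): the 10th percentile of the frame energies
    approximates the noise floor and the 95th percentile approximates the
    voice peak. The threshold therefore scales with the input level.
    """
    e = np.asarray(frame_energy, dtype=float)
    floor = np.percentile(e, 10)
    peak = np.percentile(e, 95)
    return floor + ratio * (peak - floor)

if __name__ == "__main__":
    # Hypothetical frame energies: five noise-only frames, then five voice frames.
    quiet = np.array([0.01] * 5 + [1.0] * 5)
    loud = 100.0 * quiet  # same recording at a higher microphone gain
    for e in (quiet, loud):
        thresh = adaptive_energy_threshold(e)
        print(thresh, (e > thresh).astype(int))
```

In this sketch, multiplying the input level by a constant multiplies the threshold by the same constant, so the frame decisions stay the same without manual retuning.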
The voice region determination method of the above publication can clearly distinguish voice and noise regions from each other in a case where the surrounding noise is white noise, as shown in FIG. 1(b). However, if the surrounding noise is color noise whose energy is high and whose shape varies with time, as shown in FIG. 1(c), voice and noise regions may not be clearly distinguished from each other. Thus, there is a risk that the surrounding noise may be erroneously detected as a voice region.
Furthermore, since the voice region determination method requires repeated calculation and comparison processes, the amount of calculation increases accordingly, so that the method cannot be used in real time. Moreover, since the shape of the spectrum of a fricative is similar to that of noise, a fricative region cannot be accurately detected. Thus, there is a disadvantage in that the voice region determination method is not appropriate when more accurate detection of a voice region is required, such as in the case of speech recognition.