1. Field of the Invention
The present invention relates to a method and a system for segmenting phonemes from voice signals, and more particularly to a method and a system for segmenting phonemes from voice signals for a sound recognition system.
2. Description of the Related Art
Techniques using voice signals have been applied to various systems, including voice recognition systems. An important issue is how accurately these techniques detect the starting point and the ending point of a voice signal when the signal is input, so that an accurate voice is supplied to the corresponding system.
In particular, a method capable of dividing a voice signal into phonemes must be developed in order to recognize the voice signal. According to conventional methods, the position of each phoneme is first identified so as to segment the phonemes, and each segmented phoneme is then classified and its content identified. These methods are generally performed by complicated processes that typically combine statistical analysis with methods for extracting various measurements.
These methods involve a large amount of calculation. They also detect non-voice noise whose level is similar to that of voice as if it were voice, and are therefore sensitive to noise mixed with the voice. In addition, since these methods rely on a large amount of stochastic calculation, their accuracy may be reduced.
One of the most widely used of these methods combines cepstral coefficients. However, this method inherits the underlying limitations of linear prediction methods.
Linear prediction methods, which are mainly used for voice signal analysis, are affected by the order of the linear prediction, and only minimal improvements in calculation amount and performance have been achieved. Such linear prediction methods can operate only under the assumption that the signal does not change over a short time and that the vocal tract transfer function can be modeled by a linear all-pole model.
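As an illustration of the linear prediction analysis described above, the following is a minimal sketch of the textbook autocorrelation method with the Levinson-Durbin recursion. It is not taken from the conventional systems discussed here. The function names, the synthetic second-order test frame, and the chosen order are all assumptions made for the example. The frame is treated as stationary, which is exactly the short-time assumption the text mentions.

```python
import random

def autocorrelation(x, max_lag):
    """Return r[0..max_lag] for one analysis frame (assumed stationary)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Fit the all-pole model 1 / A(z), with A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0] + [0.0] * order
    err = r[0]                      # prediction error energy at order 0
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err              # reflection coefficient for this order
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k          # error energy shrinks as the order grows
    return a, err

def synth_frame(n=4000, seed=0):
    """Hypothetical test signal: noise passed through a known two-pole filter."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(1.3 * x[-1] - 0.8 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]

frame = synth_frame()
coeffs, residual_energy = levinson_durbin(autocorrelation(frame, 2), 2)
print(coeffs, residual_energy)
```

The result depends on the chosen order: with too low an order the model misses resonances, and with too high an order the cost grows and the fit starts to track fine spectral detail.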
Further, in linear prediction methods, determining a formant center frequency involves a large amount of calculation because the roots of a Linear Predictive Coding (LPC) polynomial must be computed, and weak peaks of the spectrum envelope are encountered when peaks are selected.
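The root calculation mentioned above can be sketched as follows. This is an illustrative example only: the Durand-Kerner iteration is used here as a dependency-free root finder and is not claimed to be what any particular conventional system uses. The sampling rate, pole radii, and helper names are hypothetical. Each complex pole of the LPC polynomial in the upper half-plane is converted to a candidate formant frequency from its angle and to a bandwidth from its radius.

```python
import cmath
import math

def poly_roots(coeffs, iterations=200):
    """Durand-Kerner roots of z^p + a1 z^(p-1) + ... + ap (coeffs[0] must be 1)."""
    p = len(coeffs) - 1
    roots = [complex(0.4, 0.9) ** k for k in range(p)]   # distinct starting points
    for _ in range(iterations):
        updated = []
        for i, z in enumerate(roots):
            value = 0j
            for c in coeffs:                 # Horner evaluation of the polynomial
                value = value * z + c
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            updated.append(z - value / denom)
        roots = updated
    return roots

def formants_from_lpc(a, sample_rate):
    """Return sorted (frequency_hz, bandwidth_hz) pairs, one per upper-half-plane pole."""
    result = []
    for z in poly_roots(a):
        if z.imag > 1e-9:
            freq = cmath.phase(z) * sample_rate / (2.0 * math.pi)
            bandwidth = -math.log(abs(z)) * sample_rate / math.pi
            result.append((freq, bandwidth))
    return sorted(result)

def resonator(freq_hz, radius, sample_rate):
    """Second-order section with a conjugate pole pair at the given frequency."""
    theta = 2.0 * math.pi * freq_hz / sample_rate
    return [1.0, -2.0 * radius * math.cos(theta), radius * radius]

def poly_multiply(p, q):
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pc in enumerate(p):
        for j, qc in enumerate(q):
            out[i + j] += pc * qc
    return out

FS = 8000.0  # assumed sampling rate in Hz
lpc = poly_multiply(resonator(500.0, 0.97, FS), resonator(1500.0, 0.95, FS))
formants = formants_from_lpc(lpc, FS)
print(formants)
```

Even in this small case the root finder iterates over every pair of roots many times per frame, which gives a sense of why root-based formant estimation is costly at the orders typically used for speech.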
In addition, linear prediction methods apply data windowing. When the data window is selected, it is difficult to detect the spectrum envelope unless a balance between the resolution on the time axis and the resolution on the frequency axis is maintained. For example, in the case of a voice having a very high pitch, linear prediction methods follow individual harmonics because the spacing between the harmonics is wide. Therefore, when linear prediction methods are applied to the voices of women or children, performance may deteriorate.
In conventional methods, it is difficult to accurately detect the starting point and the ending point of phonemes. Further, conventional methods are inconvenient to use because of the large amount of calculation involved. Therefore, a need exists for a method that accurately defines the starting point and the ending point of phonemes while reducing the amount of calculation.
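The time and frequency trade-off described above can be made concrete with a rough calculation. The sketch below assumes a Hamming window, whose null-to-null main-lobe width is approximately 4 divided by the window duration, and uses the simple, conservative criterion that adjacent harmonics are clearly separated when their spacing (the pitch) exceeds that width. The window length and the two pitch values are hypothetical examples, not figures from the methods discussed here.

```python
def hamming_mainlobe_width_hz(window_seconds):
    """Approximate null-to-null main-lobe width of a Hamming window, in Hz."""
    return 4.0 / window_seconds

def harmonics_resolved(pitch_hz, window_seconds):
    """True when harmonic spacing (equal to the pitch) exceeds the main-lobe width."""
    return pitch_hz > hamming_mainlobe_width_hz(window_seconds)

WINDOW_SECONDS = 0.020  # assumed 20 ms analysis window

for label, pitch_hz in (("low-pitched voice", 100.0), ("high-pitched voice", 300.0)):
    resolved = harmonics_resolved(pitch_hz, WINDOW_SECONDS)
    print(label, pitch_hz, "harmonics resolved:", resolved)
```

Under these assumptions the low-pitched voice yields a smeared spectrum from which an envelope is easier to read, whereas the high-pitched voice yields individually visible harmonics that an envelope estimate may lock onto. A shorter window widens the main lobe and hides the harmonics again, but at the cost of frequency resolution.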