1. Field of the Invention
The present invention generally relates to a method and system for recognizing a phoneme in a speech signal, and more particularly, to a method of recognizing a phoneme in a speech signal for use in a speech recognition system, and to a system using the method.
2. Description of the Related Art
Techniques that process speech signals are applied in various systems, including speech recognition systems. In such systems, it is important to detect the start-point and end-point of an input speech signal correctly so that the correct speech is input to the system.
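Start-point and end-point detection is often illustrated with a short-time energy threshold. The sketch below is a minimal example of that common baseline, not the method of the invention. The frame length of 160 samples (20 ms at an assumed 8 kHz sampling rate), the non-overlapping frames and the 10% energy threshold are illustrative assumptions.

```python
import math


def detect_endpoints(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices spanning all frames whose short-time
    energy is at least threshold_ratio times the largest frame energy.

    `end` is exclusive. Returns None if the signal has no energy.
    Assumption: non-overlapping frames and a single fixed threshold.
    """
    # Short-time energy of each complete, non-overlapping frame.
    energies = [
        sum(x * x for x in samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not energies or max(energies) == 0:
        return None
    threshold = threshold_ratio * max(energies)
    active = [k for k, e in enumerate(energies) if e >= threshold]
    return active[0] * frame_len, (active[-1] + 1) * frame_len


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    # 100 ms of silence, 200 ms of a 440 Hz tone standing in for speech, 100 ms of silence.
    tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(1600)]
    signal = [0.0] * 800 + tone + [0.0] * 800
    print(detect_endpoints(signal))  # (800, 2400)
```

A fixed energy threshold like this is easily fooled by background noise and weak consonants, which is one reason endpoint detection matters so much for the accuracy of the downstream recognizer.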
In particular, recognizing a speech signal requires a method of distinguishing phonemes. In the prior art, once phonemes are segmented by locating their positions, the segmented phonemes are distinguished and their contents identified through very complex processes, most of which combine a statistical method with a plurality of measure extraction methods.
One of the most frequently used methods combines a plurality of cepstral or perceptual linear predictive coding (LPC) coefficients. However, this method inherits the intrinsic limitations of linear prediction.
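The standard recursion from LPC coefficients to cepstral coefficients shows why cepstral features obtained this way share the limits of the underlying all-pole model: every cepstral coefficient c_n is computed from a_1, ..., a_p alone. The sketch below assumes the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p and omits the gain term; cepstra computed directly from an FFT log-spectrum are a different technique and are not shown.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c[1..n_ceps] of the all-pole model 1 / A(z).

    Convention (assumed): A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p with a[0] == 1.
    The gain term c[0] = ln G is omitted.
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)  # index 0 is a placeholder
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        # Only terms whose LPC index n - k lies in 1..p contribute.
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]


if __name__ == "__main__":
    # Single pole at z = 0.9, for which the exact cepstrum is 0.9**n / n.
    print(lpc_to_cepstrum([1.0, -0.9], 4))
```

The single-pole case is a convenient check because -ln(1 - r z^-1) expands to the series r^n / n, so the recursion's output can be compared term by term with a closed form.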
The performance of linear prediction, which is frequently used in speech signal analysis, depends on the order of the linear predictor. However, increasing the order to improve performance increases the amount of computation, and beyond a certain level the performance no longer improves. In addition, linear prediction is valid only under two assumptions: short-time stationarity, that is, that the signal does not vary over a short interval, and that the vocal tract transfer function can be modeled by a linear all-pole model.
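The dependence on the predictor order can be seen in the usual autocorrelation method: computing a frame's autocorrelation costs on the order of N·p multiply-adds and the Levinson-Durbin recursion about p² more, while the model itself stays all-pole and treats the frame as stationary. The following is a minimal pure-Python sketch, assuming a pre-windowed frame and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one (already windowed) frame: about N * p multiply-adds."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the order-p normal equations by the Levinson-Durbin recursion.

    Returns (a, err): a[0] == 1, a[1..p] are the coefficients of
    A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, and err is the residual energy.
    The nested loops cost about p**2 multiply-adds.
    """
    if r[0] <= 0:
        raise ValueError("frame has no energy")
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient of stage i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        prev = a[:]  # coefficients of the order-(i-1) predictor
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


if __name__ == "__main__":
    # Impulse response of a single pole at 0.9: order 1 already recovers a[1] close to -0.9.
    frame = [0.9 ** n for n in range(200)]
    for p in (1, 2, 4):
        a, err = levinson_durbin(autocorrelation(frame, p), p)
        print(p, [round(x, 4) for x in a], round(err, 6))
```

For this one-pole test frame, raising the order from 1 to 4 leaves the residual energy essentially unchanged while the cost of the recursion grows, which mirrors the saturation described above on a toy scale.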
Moreover, estimating formant center frequencies with linear prediction requires a large amount of computation because the roots of the LPC polynomial must be calculated, and peaks may not be picked robustly from the spectral envelope.
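Formant estimation from LPC roots works by factoring A(z): each complex-conjugate root pair z = r·e^(±jθ) corresponds to a resonance at F = θ·f_s / (2π) with bandwidth B = -(f_s / π)·ln r. The sketch below makes the root-finding step explicit with a Durand-Kerner iteration, chosen only because it fits in a few lines of standard-library Python; `numpy.roots`, which computes companion-matrix eigenvalues, is the more usual choice. The helpers `resonator` and `poly_mul` are hypothetical and serve only to build a test polynomial with known formants.

```python
import cmath
import math


def poly_roots(coeffs, iterations=500, tol=1e-12):
    """All roots of z^n + coeffs[1] z^(n-1) + ... + coeffs[n] (coeffs[0] must be 1),
    found with the Durand-Kerner (Weierstrass) iteration, updating roots in place."""
    n = len(coeffs) - 1

    def p(z):
        acc = 0j
        for c in coeffs:  # Horner evaluation
            acc = acc * z + c
        return acc

    roots = [(0.4 + 0.9j) ** k for k in range(n)]  # distinct starting points
    for _ in range(iterations):
        largest_step = 0.0
        for i in range(n):
            denom = 1 + 0j
            for j in range(n):
                if j != i:
                    denom *= roots[i] - roots[j]
            step = p(roots[i]) / denom
            roots[i] -= step
            largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break
    return roots


def formants_from_lpc(a, fs):
    """(frequency, bandwidth) pairs in Hz, sorted by frequency, from A(z) with a[0] == 1.
    Multiplying A(z) by z^p gives a monic polynomial with the same coefficient list."""
    pairs = []
    for z in poly_roots(a):
        if z.imag > 0:  # one root per conjugate pair; real roots are not formants
            freq = cmath.phase(z) * fs / (2 * math.pi)
            bandwidth = -math.log(abs(z)) * fs / math.pi
            pairs.append((freq, bandwidth))
    return sorted(pairs)


def resonator(freq, bandwidth, fs):
    """Hypothetical helper: second-order section 1 - 2 r cos(theta) z^-1 + r^2 z^-2."""
    r = math.exp(-math.pi * bandwidth / fs)
    theta = 2 * math.pi * freq / fs
    return [1.0, -2 * r * math.cos(theta), r * r]


def poly_mul(p, q):
    """Hypothetical helper: product of two polynomials given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


if __name__ == "__main__":
    fs = 8000  # assumed sampling rate in Hz
    a = poly_mul(resonator(500, 80, fs), resonator(1500, 120, fs))
    for freq, bw in formants_from_lpc(a, fs):
        print(f"{freq:.1f} Hz, bandwidth {bw:.1f} Hz")
```

Every iteration evaluates the polynomial and a product over all other roots, so the cost grows roughly with the square of the order per sweep, and this work is repeated for every analysis frame.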
Linear prediction also relies on data windowing. If the window is not chosen to balance resolution along the time axis and the frequency axis, the spectral envelope is difficult to detect. For example, for speech with a very high pitch, the harmonics are widely spaced, so linear prediction tends to follow individual harmonics rather than the envelope. As a result, the performance of linear prediction decreases for female and child speakers, whose pitch is typically higher.
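The effect of a high pitch can be quantified roughly: a voiced spectrum samples the vocal-tract envelope only at the harmonics k·F0, so the number of harmonics below the Nyquist frequency limits how densely the envelope is sampled. The sketch compares that count with the number of LPC coefficients. The 8 kHz sampling rate, the order of 10, and the F0 values of 120 Hz, 220 Hz and 350 Hz (loosely standing for an adult male, an adult female and a child) are illustrative assumptions, and the ratio is a heuristic rather than a measure taken from this document.

```python
def harmonic_frequencies(f0_hz, band_hz):
    """Harmonics k * f0 (k = 1, 2, ...) at or below band_hz."""
    count = int(band_hz // f0_hz)
    return [k * f0_hz for k in range(1, count + 1)]


def harmonics_per_coefficient(f0_hz, fs_hz=8000, lpc_order=10):
    """Heuristic (assumed): envelope samples (harmonics below Nyquist) per LPC coefficient.
    Values near 1 leave the all-pole fit free to track individual harmonics."""
    return len(harmonic_frequencies(f0_hz, fs_hz / 2)) / lpc_order


if __name__ == "__main__":
    for f0 in (120, 220, 350):  # illustrative pitch values, assumed
        n = len(harmonic_frequencies(f0, 4000))
        print(f"F0 = {f0} Hz: {n} harmonics below 4 kHz, "
              f"{harmonics_per_coefficient(f0):.1f} per coefficient")
```

With these assumed values the count drops from 33 harmonics (3.3 per coefficient) at 120 Hz to 11 harmonics (1.1 per coefficient) at 350 Hz, so at high pitch the model has nearly one coefficient per spectral sample.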
As described above, distinguishing and recognizing phonemes with conventional methods is burdensome because of the large amount of computation required. Thus, a method of recognizing phonemes more accurately with less computation is desired.