1. Field of the Invention
The present invention relates generally to speech signal processing, and in particular, to a method and apparatus for detecting peaks from a speech signal, and detecting harmonic information, spectral envelope information, and voicing rate information (a degree of voicing) using the detected peaks.
2. Description of the Related Art
All systems that process a speech signal in the frequency domain use spectral estimation information. However, since the entire spectrum of a speech signal cannot be coded or transmitted for various reasons, spectral envelope information is coded and transmitted instead. This information summarizes the major harmonic components of the spectrum, and a decoder analyzes and uses the received spectral envelope information. Extracting harmonic information from a speech signal is therefore very important, because the extracted harmonic information significantly affects all speech systems. Spectral estimation information is essential for processing a speech signal. In speech coding in particular, the sound quality of the synthesized speech depends heavily on the performance of spectral coding, in which the spectral envelope is estimated and encoded. Voiced/unvoiced information is likewise essential in speech signal analysis.
Linear prediction analysis methods are the most widely used methods for harmonic component analysis and spectral estimation of a speech signal. They reduce the amount of computation by representing the properties of the speech signal with only a small set of parameters. Linear prediction analysis methods used for speech analysis, synthesis, and compression can represent the waveform and the spectrum of a speech signal with a small number of parameters, and those parameters can be extracted with simple calculations. Linear prediction analysis is based on the principle that a current sample can be approximated by a linear combination of past samples, so that the current value can be estimated from previous sample values.
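To make the principle concrete, the following is a minimal sketch of the conventional autocorrelation method of linear prediction. It is a textbook illustration, not the method of the present invention. It computes predictor coefficients a_1..a_p with the Levinson-Durbin recursion, so that a sample x[n] is approximated by the sum of a_k * x[n - k]. The function names, the plain-list representation, and the synthetic test frame are illustrative assumptions.

```python
import math


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n - k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for predictor coefficients a_1..a_p.

    Model: x[n] ~= sum_{k=1..p} a_k * x[n - k].
    Returns (coeffs, err) where coeffs[k - 1] = a_k and err is the
    residual prediction-error energy at order p.
    """
    a = [0.0] * (order + 1)  # a[0] is unused; a[k] holds a_k
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= 1.0 - k * k  # error energy shrinks (or stays) at every stage
    return a[1:], err


def predict(x, coeffs, n):
    """Estimate x[n] as a linear combination of the len(coeffs) samples before it."""
    if n < len(coeffs):
        raise ValueError("not enough past samples for this predictor order")
    return sum(c * x[n - 1 - j] for j, c in enumerate(coeffs))


if __name__ == "__main__":
    # Hypothetical synthetic frame: an arbitrary mixture chosen only for illustration.
    frame = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) + 0.1 * ((7 * n) % 5)
             for n in range(240)]
    coeffs, err = levinson_durbin(autocorrelation(frame, 4), 4)
    print("a_k:", [round(c, 3) for c in coeffs])
    print("x[100] =", round(frame[100], 4),
          "predicted:", round(predict(frame, coeffs, 100), 4))
```

The recursion solves the order-p system in O(p^2) operations by exploiting its Toeplitz structure, which is one reason the autocorrelation method is regarded as computationally cheap.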
The performance of linear prediction analysis methods depends on the order of linear prediction. However, increasing the order increases the amount of computation while yielding only limited gains in performance. A particular disadvantage of linear prediction analysis methods is that they assume the signal is stationary over a predetermined short interval. That is, since linear predictive coding assumes that the vocal tract transfer function can be modeled by a linear all-pole model, linear prediction analysis methods cannot track a signal that fluctuates abruptly in a transition region of speech. Linear prediction analysis methods also tend to perform worse for female and child speakers.
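For reference, the linear all-pole model mentioned above is conventionally written as follows. Here p is the prediction order, a_k are the predictor coefficients, and G is a gain term. This is the standard textbook form, supplied for illustration; these symbols are not defined elsewhere in this document.

```latex
H(z) = \frac{G}{1 - \sum_{k=1}^{p} a_k z^{-k}},
\qquad
\hat{s}[n] = \sum_{k=1}^{p} a_k \, s[n-k]
```

The model has poles but no zeros, and the coefficients a_k are held fixed over each analysis frame. That fixed-coefficient frame is where the short-time stationarity assumption enters.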
In addition, linear prediction analysis methods have a problem when data windowing is used. The choice of data window always involves a trade-off between resolution on the time axis and resolution on the frequency axis. For example, in very high-pitched speech the harmonics are widely spaced. As a result, linear prediction analysis methods (typically the autocorrelation method and the covariance method) tend to follow individual harmonics rather than the spectral envelope.
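The arithmetic behind both points can be sketched as follows, assuming a narrowband sampling rate of 8 kHz and illustrative F0 values of 100 Hz and 400 Hz. The helper names are hypothetical. The DFT bin spacing fs/N is only a rough proxy for frequency resolution, because the actual main-lobe width also depends on the window shape.

```python
def window_tradeoff(fs_hz, window_len):
    """Return (window duration in ms, DFT bin spacing in Hz) for an N-sample window."""
    duration_ms = 1000.0 * window_len / fs_hz
    bin_spacing_hz = fs_hz / window_len
    return duration_ms, bin_spacing_hz


def harmonics_in_band(f0_hz, band_hz):
    """Count harmonics k * f0 (k = 1, 2, ...) that lie at or below band_hz."""
    return int(band_hz // f0_hz)


if __name__ == "__main__":
    fs = 8000  # assumed narrowband sampling rate in Hz
    for n in (160, 320):
        ms, df = window_tradeoff(fs, n)
        print(f"{n} samples: {ms:.0f} ms window, {df:.0f} Hz bin spacing")
    for f0 in (100, 400):  # assumed low-pitched and very high-pitched F0 values
        print(f"F0 = {f0} Hz: {harmonics_in_band(f0, fs / 2)} harmonics "
              f"below {fs / 2:.0f} Hz")
```

Doubling the window from 160 to 320 samples halves the bin spacing from 50 Hz to 25 Hz but doubles the span from 20 ms to 40 ms. At F0 = 400 Hz only 10 harmonics sample the envelope below 4 kHz, compared with 40 at F0 = 100 Hz. This is roughly as many points as a 10th-order predictor, commonly used at 8 kHz, has coefficients. That gives one informal way to see why the predictor can lock onto individual harmonic peaks.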