1. Field of the Invention
This invention relates to a method and device for extracting the pitch from an input speech signal.
2. Description of the Related Art
Speech is classified into voiced speech and unvoiced speech. Voiced speech is accompanied by vibrations of the vocal cords and is observed as periodic vibrations. Unvoiced speech is not accompanied by vibrations of the vocal cords and is observed as non-periodic noise. In ordinary speech, voiced speech accounts for the majority of the speech, while unvoiced speech consists only of special consonants termed unvoiced consonants. The period of voiced speech is determined by the period of the vibrations of the vocal cords and is termed the pitch period, while its reciprocal is termed the pitch frequency. The pitch period and the pitch frequency are the main factors governing the pitch or the intonation of the speech. Therefore, accurate extraction of the pitch period from the original speech waveform (pitch extraction) is crucial throughout the process of analyzing and synthesizing speech.
As a method for pitch extraction, there is known a correlation processing method, which exploits the fact that correlation processing is robust against waveform phase distortion. An example of the correlation processing method is the autocorrelation method, according to which, in general, the input speech signal is limited to a pre-set frequency range and the autocorrelation of a pre-set number of samples of the band-limited signal is subsequently found in order to obtain the pitch. For band-limiting the input speech signal, a low-pass filter (LPF) is generally employed.
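The conventional autocorrelation method described above can be sketched roughly as follows. This is only an illustrative sketch and not the method of the invention. The function names, the moving-average filter standing in for the LPF, the tap count, and the 50 to 400 Hz search range are all assumptions chosen for illustration; none of them is specified in the text above.

```python
import math


def moving_average_lpf(signal, taps=5):
    """Crude low-pass filter (assumed stand-in for the LPF): causal moving average."""
    out = []
    acc = 0.0
    for i, x in enumerate(signal):
        acc += x
        if i >= taps:
            acc -= signal[i - taps]  # drop the sample that left the window
        out.append(acc / taps)
    return out


def autocorr_pitch_period(frame, sample_rate, f_min=50.0, f_max=400.0):
    """Estimate the pitch period, in samples, as the lag of the autocorrelation peak.

    The search range f_min..f_max (Hz) is an illustrative assumption.
    """
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    if lag_max < lag_min:
        raise ValueError("frame too short for the requested pitch range")

    best_lag, best_val = lag_min, -math.inf
    for lag in range(lag_min, lag_max + 1):
        # Autocorrelation (unnormalized) of the frame at this lag.
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_val:
            best_lag, best_val = lag, r
    return best_lag


def pitch_frequency(period_samples, sample_rate):
    """Pitch frequency is the reciprocal of the pitch period."""
    return sample_rate / period_samples
```

In use, one frame of the input speech signal would first be passed through `moving_average_lpf`, and the filtered frame would then be handed to `autocorr_pitch_period`; the resulting lag is converted to a pitch frequency by `pitch_frequency`.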
If, in the above-mentioned autocorrelation method, a speech signal containing pulsed pitch in its low-frequency components is used, the pulsed components are removed when the speech signal is passed through the LPF. It is therefore difficult to obtain the correct pitch of such a speech signal from the signal that has been passed through the LPF.
A further problem arises if the speech signal containing the pulsed pitch in the low-frequency components is instead passed only through a high-pass filter (HPF), so that the pulsed low-frequency components are not removed. If the speech signal waveform then contains a large quantity of noise, the pitch and noise components become hardly distinguishable from each other, so that the correct pitch again cannot be obtained.