1. Field of the Invention
This invention relates to a speech analysis method in which an input speech signal is divided into blocks or frames serving as encoding units, the pitch corresponding to the fundamental period of the speech signal is detected for each encoding unit, and the speech signal is analyzed, from one encoding unit to another, on the basis of the detected pitch. The invention also relates to a speech encoding method and apparatus employing this speech analysis method.
2. Description of the Related Art
A variety of encoding methods are known for compressing audio signals (inclusive of speech and acoustic signals) by exploiting the statistical properties of the signals in the time domain and in the frequency domain, together with the psychoacoustic characteristics of human hearing. These encoding methods may be roughly classified into time-domain encoding, frequency-domain encoding and analysis/synthesis encoding.
Examples of high-efficiency encoding of speech signals include sinusoidal analytic encoding, such as harmonic encoding or multi-band excitation (MBE) encoding, sub-band coding (SBC), linear predictive coding (LPC), discrete cosine transform (DCT), modified DCT (MDCT) and fast Fourier transform (FFT).
In conventional harmonic encoding of LPC residuals, MBE encoding, sinusoidal transform coding (STC) or harmonic encoding, a rough pitch is first found by an open-loop pitch search, which is followed by a high-precision pitch search for a finer pitch. During the latter search, the fractional pitch search (a search with a resolution finer than one sample) and the evaluation of the amplitudes of the waveform in the frequency domain are carried out simultaneously. This high-precision pitch search is carried out so as to minimize the distortion between the synthesized spectrum, that is the spectrum synthesized over the frequency spectrum in its entirety, and the original spectrum, such as the spectrum of the LPC residuals.
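The two-stage search described above can be illustrated with a minimal Python sketch. It is not taken from any particular codec: the function names (coarse_pitch, spectral_error, fine_pitch), the Hamming window, the 256-sample frame, the quarter-sample step and the one-sample search span are all assumptions made for illustration. The coarse stage picks an integer lag by normalized autocorrelation; the fine stage tries fractional lags around it, fits one complex amplitude per harmonic band in the MBE style, and keeps the lag whose synthesized spectrum is closest to the original spectrum.

```python
import cmath
import math


def hamming(length):
    """Hamming analysis window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def dtft(seq, omega):
    """DTFT of seq at angular frequency omega (radians per sample)."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(seq))


def coarse_pitch(frame, min_lag, max_lag):
    """Open-loop search: integer lag with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        cur, past = frame[lag:], frame[:-lag]
        num = sum(a * b for a, b in zip(cur, past))
        den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past)) + 1e-12
        if num / den > best_score:
            best_lag, best_score = lag, num / den
    return best_lag


def spectral_error(spectrum, window, lag):
    """Fit one complex amplitude per harmonic band; return (error, amplitudes)."""
    n_fft = 2 * (len(spectrum) - 1)      # assumes an even frame length
    w0 = 2 * math.pi / lag               # candidate fundamental (rad/sample)
    n_harm = int(lag // 2)               # harmonics below the Nyquist frequency
    bands = {}
    for k, s_k in enumerate(spectrum):
        omega = 2 * math.pi * k / n_fft
        m = min(max(round(omega / w0), 1), n_harm)   # nearest harmonic index
        e_k = dtft(window, omega - m * w0)           # window spectrum centred on harmonic m
        bands.setdefault(m, []).append((s_k, e_k))
    error, amps = 0.0, {}
    for m, pairs in bands.items():
        num = sum(s * e.conjugate() for s, e in pairs)
        den = sum(abs(e) ** 2 for _, e in pairs) + 1e-12
        amps[m] = num / den                          # least-squares amplitude for band m
        error += sum(abs(s - amps[m] * e) ** 2 for s, e in pairs)
    return error, amps


def fine_pitch(frame, coarse_lag, step=0.25, span=1.0):
    """Fractional-lag search around coarse_lag, minimizing the spectral error."""
    window = hamming(len(frame))
    windowed = [x * w for x, w in zip(frame, window)]
    n = len(frame)
    spectrum = [dtft(windowed, 2 * math.pi * k / n) for k in range(n // 2 + 1)]
    steps = int(round(span / step))
    candidates = [coarse_lag + i * step for i in range(-steps, steps + 1)]
    results = [(spectral_error(spectrum, window, lag), lag) for lag in candidates]
    (_, amps), lag = min(results, key=lambda r: r[0][0])
    return lag, amps


# Synthetic test frame (an assumption): eight equal harmonics, period 40.25 samples.
period = 40.25
frame = [sum(math.cos(2 * math.pi * m * n / period) for m in range(1, 9))
         for n in range(256)]
coarse = coarse_pitch(frame, 20, 60)
fine, amps = fine_pitch(frame, coarse)
print("coarse lag:", coarse, "fine lag:", fine)
```

In this sketch the amplitudes come out as a by-product of the error evaluation for each candidate lag, which is one way to read the statement that the fractional pitch search and the amplitude evaluation are carried out simultaneously. The direct DTFT is used only to keep the example dependency-free; a practical implementation would use an FFT.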
However, in the frequency spectrum of human speech, spectral components are not necessarily present at frequencies corresponding to integer multiples of the fundamental frequency. Rather, these spectral components may be slightly shifted along the frequency axis. In such cases, the amplitudes of the frequency spectrum may not be evaluated correctly even if the high-precision pitch search is carried out, because a single fundamental frequency or pitch is used over the entire frequency spectrum of the speech signal.
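The effect of such a shift on amplitude evaluation can be shown with a short Python sketch. The figures used here are assumptions chosen for illustration only: a pitch lag of 40 samples, a 256-sample Hamming-windowed frame, and an eighth harmonic displaced by 2% from its nominal position. The hypothetical helper harmonic_amplitude estimates the amplitude of a sinusoid at a given frequency from the windowed spectrum.

```python
import cmath
import math


def harmonic_amplitude(frame, omega):
    """Estimate the amplitude of a sinusoid at angular frequency omega
    (radians per sample) from the Hamming-windowed DTFT of the frame."""
    n_samples = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
              for n in range(n_samples)]
    value = sum(x * w * cmath.exp(-1j * omega * n)
                for n, (x, w) in enumerate(zip(frame, window)))
    return 2 * abs(value) / sum(window)


pitch_lag = 40.0                      # assumed pitch period in samples
w0 = 2 * math.pi / pitch_lag          # fundamental frequency (rad/sample)
harmonic = 8
shift = 0.02                          # assumed 2% deviation from the harmonic grid

nominal = harmonic * w0               # where a single-pitch model expects the component
actual = nominal * (1 + shift)        # where the component actually lies

# Unit-amplitude component at the shifted frequency.
frame = [math.cos(actual * n) for n in range(256)]

at_nominal = harmonic_amplitude(frame, nominal)
at_actual = harmonic_amplitude(frame, actual)
print("estimate on the harmonic grid:", at_nominal)
print("estimate at the true frequency:", at_actual)
```

Under these assumptions the estimate taken on the harmonic grid falls well short of the true unit amplitude, while the estimate taken at the actual frequency recovers it closely. The size of the error depends on the window, the frame length and the amount of shift.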