1. Field of the Invention
This invention relates to a speech encoding method in which an input speech signal is divided into blocks or frames serving as encoding units and is encoded unit by unit, a decoding method for decoding the encoded signal, and a speech encoding/decoding method.
2. Description of the Related Art
A variety of encoding methods are conventionally known for compressing audio signals (inclusive of speech and acoustic signals) by exploiting the statistical properties of the signals in the time domain and in the frequency domain and the psychoacoustic characteristics of the human ear. These encoding methods may roughly be classified into time-domain encoding, frequency-domain encoding and analysis/synthesis encoding.
Examples of high-efficiency encoding of speech signals include sinusoidal analytic encoding, such as harmonic encoding or multi-band excitation (MBE) encoding, sub-band coding (SBC), linear predictive coding (LPC), discrete cosine transform (DCT), modified DCT (MDCT), and fast Fourier transform (FFT).
In conventional MBE encoding or harmonic encoding, unvoiced speech portions are generated by a noise generating circuit. However, this method has the drawback that plosive consonants, such as p, k or t, and fricative consonants cannot be reproduced correctly.
Moreover, if encoded parameters having totally different properties, such as line spectrum pairs (LSPs), are interpolated at a transient portion between a voiced (V) portion and an unvoiced (UV) portion, extraneous or foreign sounds tend to be produced. Here, voiced sounds are those that have a discernible spectral distribution, and unvoiced sounds are those whose spectrum resembles noise.
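The interpolation problem described above can be illustrated by a minimal sketch, which is not taken from any particular codec. It assumes simple linear interpolation of LSP vectors between two adjacent frames; the function name interpolate_lsp and the numerical LSP values are hypothetical and chosen for illustration only.

```python
def interpolate_lsp(lsp_prev, lsp_next, steps):
    """Linearly interpolate between two LSP vectors.

    Returns `steps` vectors; the last one equals `lsp_next`.
    This is an illustrative sketch, not the scheme of any specific codec.
    """
    if len(lsp_prev) != len(lsp_next):
        raise ValueError("LSP vectors must have the same order")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    result = []
    for k in range(1, steps + 1):
        w = k / steps  # weight of the next frame, rising from 1/steps to 1
        result.append([(1.0 - w) * a + w * b
                       for a, b in zip(lsp_prev, lsp_next)])
    return result


# Hypothetical LSP frequencies in radians (illustration only).
voiced_lsp = [0.1, 0.3, 0.9, 1.2]    # assumed voiced (V) frame
unvoiced_lsp = [0.4, 0.8, 1.6, 2.4]  # assumed unvoiced (UV) frame

subframes = interpolate_lsp(voiced_lsp, unvoiced_lsp, steps=2)
midpoint = subframes[0]  # parameters used halfway through the transition
```

In this sketch the midpoint vector describes a spectral envelope that belongs to neither the voiced nor the unvoiced frame, which is one way to picture how interpolation across a V/UV transition may give rise to foreign sounds.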
In addition, with conventional sinusoidal synthetic coding, low-pitch speech, particularly male speech, tends to sound unnatural and “stuffed”.