1. Field of the Invention
The present invention relates to the encoding and decoding of a signal representing speech or musical tones (hereinafter generically referred to as a "speech signal"). More particularly, it relates to compression encoding of a speech signal by orthogonally transforming the speech signal from the time domain into the frequency domain and vector-quantizing the resulting orthogonal transform coefficients, and to decoding of the compression-encoded speech signal.
2. Prior Art
Conventionally, vector quantization is widely known as a method of compression encoding a speech signal that achieves high-quality encoding at a low bit rate. Vector quantization divides the speech signal into blocks of a given size and quantizes its waveform block by block, and therefore has the advantage that the amount of information required can be greatly reduced. It is accordingly widely used in fields such as the communication of speech information. The code vectors of a code book used in vector quantization are updated by training, according to the generalized Lloyd algorithm or the like, on a large amount of sample data. The contents of a code book trained in this way, however, depend heavily on the characteristics of the training data. To prevent the code book from being biased toward particular characteristics, training must be carried out on a very large number of samples. It is, however, impossible to provide so many samples for every pattern that is to be stored in the code book. In practice, therefore, the code book is trained on data that are as random as possible.
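The training and encoding steps described above can be sketched as follows. This is a minimal illustration, not the method of this document: it assumes squared Euclidean distortion and seeds the generalized Lloyd algorithm with the Linde-Buzo-Gray splitting procedure (a common choice, assumed here). All function names and the toy training data are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distortion between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(vector, codebook):
    """Encoder: index of the code vector with the smallest distortion."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

def lloyd(training, codebook, iterations):
    """Generalized Lloyd iterations: nearest-neighbour partition, then centroid update."""
    for _ in range(iterations):
        cells = [[] for _ in codebook]
        for v in training:
            cells[nearest_index(v, codebook)].append(v)
        # An empty cell keeps its previous code vector.
        codebook = [centroid(cell) if cell else code for cell, code in zip(cells, codebook)]
    return codebook

def train_codebook(training, bits, iterations=10, eps=0.01):
    """Grow a 2**bits-entry code book by repeated splitting (LBG-style seeding)."""
    codebook = [centroid(training)]
    for _ in range(bits):
        # Split every code vector into two slightly perturbed copies, then refine.
        codebook = [[x + s for x in code] for code in codebook for s in (eps, -eps)]
        codebook = lloyd(training, codebook, iterations)
    return codebook

# Toy training set with two well-separated clusters (hypothetical data).
training = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
codebook = train_codebook(training, bits=1)
index = nearest_index([0.2, 0.9], codebook)  # only this index is transmitted
reconstruction = codebook[index]             # the decoder looks the vector up
```

Here the one-bit code book converges to the two cluster centroids, [0.5, 0.5] and [10.5, 10.5]. The sketch also shows why the code book mirrors its training data: every code vector is literally a mean of training vectors.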
On the other hand, when a speech signal is compression-encoded, it is common practice to first subject it to an orthogonal transform (e.g. FFT, DCT, or MDCT) in order to achieve higher compression efficiency by exploiting the uneven distribution of the power spectrum of the speech signal. When a speech signal is orthogonally transformed before vector quantization, it is desirable that the resulting orthogonal transform coefficients have their amplitudes normalized to a substantially fixed level before they are vector-quantized: if the coefficients have widely varying amplitudes, many code bits are required, and the number of corresponding code vectors becomes very large. To this end, before the orthogonal transform coefficients are vector-quantized, the frequency spectrum (i.e. the orthogonal transform coefficients) of the speech signal is smoothed into data suitable for vector quantization by one or more of the following methods (i) to (iv), and the code book is then trained on the smoothed data (see, e.g., Iwagami et al., "Audio Coding by Frequency Region-Weighted Interleaved Vector Quantization (TwinVQ)", The Acoustical Society of Japan, Lecture Collection, October 1994, p. 339):
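As a rough sketch of the transform-then-normalize idea, the following computes an orthonormal DCT-II of one frame and then divides each fixed-width band of coefficients by its RMS amplitude, so that every band reaches unit RMS before vector quantization. The band-RMS normalization is a deliberately simple stand-in for methods (i) to (iv), not the TwinVQ procedure; the function names, the band width and the test frame are assumptions for illustration.

```python
import math
import random

def dct_ii(frame):
    """Orthonormal DCT-II, evaluated directly in O(N^2) for clarity."""
    n = len(frame)
    coeffs = []
    for k in range(n):
        acc = sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * acc)
    return coeffs

def normalize_by_bands(coeffs, band_size):
    """Divide each band by its RMS amplitude.

    Returns the flattened coefficients (the data handed to the vector
    quantizer) and the per-band scale factors, which would have to be
    quantized and sent as auxiliary information."""
    flat, scales = [], []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        rms = math.sqrt(sum(c * c for c in band) / len(band)) or 1.0  # guard against an all-zero band
        scales.append(rms)
        flat.extend(c / rms for c in band)
    return flat, scales

def denormalize(flat, scales, band_size):
    """Decoder side: multiply each band back by its scale factor."""
    out = []
    for b, scale in enumerate(scales):
        out.extend(c * scale for c in flat[b * band_size:(b + 1) * band_size])
    return out

# Hypothetical frame: a low-frequency sinusoid plus a little noise.
rng = random.Random(1)
frame = [math.sin(2 * math.pi * i / 32) + 0.01 * rng.gauss(0.0, 1.0) for i in range(64)]
coeffs = dct_ii(frame)
flat, scales = normalize_by_bands(coeffs, band_size=16)
restored = denormalize(flat, scales, band_size=16)
```

Because the DCT is orthonormal, the frame energy is preserved, while `scales` records how unevenly that energy is spread over the bands (large in the lowest band, small in the highest).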
(i) the speech signal is subjected to linear predictive coding (LPC) to estimate its spectral envelope;
(ii) a moving-average prediction method or the like is used to remove the correlation between frames;
(iii) pitch prediction is carried out; and
(iv) frequency-band-dependent redundancy is removed by exploiting the psychoacoustic characteristics of human hearing.
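Method (i) can be sketched with the standard autocorrelation method of LPC analysis: the Levinson-Durbin recursion solves for the predictor polynomial A(z), and the spectral envelope is sampled as sqrt(E)/|A(e^{jw})|, where E is the final prediction-error energy. The convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, the function names and the example autocorrelation values are assumptions; a practical coder would also window the frame and quantize the coefficients.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations.

    Returns (a, err) with a[0] = 1, so that A(z) = sum(a[j] z^-j), and err
    the prediction-error energy after `order` stages."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_envelope(a, err, num_points):
    """Sample the all-pole envelope sqrt(err) / |A(e^{jw})| on [0, pi)."""
    env = []
    for m in range(num_points):
        w = math.pi * m / num_points
        a_w = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(a))
        env.append(math.sqrt(err) / abs(a_w))
    return env

# Autocorrelation of a first-order AR process with coefficient 0.9 (hypothetical).
a, err = levinson_durbin([1.0, 0.9, 0.81], order=2)
envelope = lpc_envelope(a, err, num_points=4)
```

For these values the recursion returns a close to [1, -0.9, 0] and err close to 0.19, and the envelope falls from low to high frequency. Dividing the transform coefficients by such an envelope removes the coarse spectral tilt but, having only p poles, it cannot follow dense harmonic peaks.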
Information for smoothing the orthogonal transform coefficients according to one or more of the above methods is transmitted as auxiliary information together with a quantization index.
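To make "auxiliary information together with a quantization index" concrete, the sketch below packs quantized side information and code-vector indices into one byte string, MSB first, and unpacks them again. The frame layout (four 5-bit scale-factor indices followed by four 10-bit code-vector indices) is invented for illustration; the document does not specify a bitstream format.

```python
def pack_fields(fields):
    """Pack (value, width) pairs MSB-first into bytes, zero-padding the last byte."""
    acc = 0
    nbits = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        acc = (acc << width) | value
        nbits += width
    pad = -nbits % 8
    return (acc << pad).to_bytes((nbits + pad) // 8, "big")

def unpack_fields(data, widths):
    """Read fields of the given widths back, MSB-first."""
    acc = int.from_bytes(data, "big")
    pos = len(data) * 8
    values = []
    for width in widths:
        pos -= width
        values.append((acc >> pos) & ((1 << width) - 1))
    return values

# Hypothetical frame: side information first, then the quantization indices.
scale_indices = [17, 3, 30, 8]   # e.g. quantized band scale factors
vq_indices = [1023, 0, 512, 77]  # code-vector indices from a 10-bit code book
layout = [(v, 5) for v in scale_indices] + [(v, 10) for v in vq_indices]
payload = pack_fields(layout)
decoded = unpack_fields(payload, [w for _, w in layout])
```

The 60 bits of this layout occupy 8 bytes after padding; the side information costs 20 of them, which is the overhead that smoothing methods (i) to (iv) add on top of the indices.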
Most speech signals have stationary harmonic structures; consequently, the envelope of the train of transform coefficients obtained by orthogonally transforming a speech signal into the frequency domain exhibits fine, spiky irregularities. These irregularities cannot be fully expressed even by LPC and pitch prediction used in combination. The prior-art smoothing techniques mentioned above therefore still fail to smooth the frequency spectrum of a speech signal satisfactorily.
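The effect can be illustrated with the spectral flatness measure: the geometric mean over the arithmetic mean of the power spectrum, close to 1 for a flat spectrum and close to 0 for a spiky one. In this sketch an idealized voiced frame, built from 15 harmonics placed exactly on DCT bins, is flattened only by a coarse per-band RMS envelope standing in for a smooth envelope model. The fine harmonic peaks survive, so its flatness stays far below that of white noise. The signal construction, band width and power floor are assumptions chosen for a clean demonstration.

```python
import math
import random

def dct_ii(frame):
    """Orthonormal DCT-II, evaluated directly for clarity."""
    n = len(frame)
    return [
        (math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
        * sum(x * math.cos(math.pi / n * (i + 0.5) * k) for i, x in enumerate(frame))
        for k in range(n)
    ]

def band_normalize(coeffs, band_size):
    """Coarse envelope removal: scale each band to unit RMS."""
    out = []
    for start in range(0, len(coeffs), band_size):
        band = coeffs[start:start + band_size]
        rms = math.sqrt(sum(c * c for c in band) / len(band)) or 1.0
        out.extend(c / rms for c in band)
    return out

def spectral_flatness(coeffs, floor=1e-12):
    """Geometric mean / arithmetic mean of the power spectrum (floored to avoid log(0))."""
    power = [c * c + floor for c in coeffs]
    geo = math.exp(sum(math.log(p) for p in power) / len(power))
    return geo / (sum(power) / len(power))

N = 256
# Idealized voiced frame: 15 harmonics that fall exactly on DCT bins 16, 32, ..., 240.
harmonic = [sum(math.cos(math.pi * 16 * h * (i + 0.5) / N) for h in range(1, 16)) for i in range(N)]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(N)]

sfm_harmonic = spectral_flatness(band_normalize(dct_ii(harmonic), 16))
sfm_noise = spectral_flatness(band_normalize(dct_ii(noise), 16))
```

After band normalization each band has unit mean power in both cases, yet the harmonic frame concentrates that power in one bin per band, which is exactly the unsmoothed residue that a vector quantizer expecting near-constant amplitudes handles poorly.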
Because vector quantization requires the orthogonal transform coefficients to have an almost fixed amplitude, a conspicuous quantization error appears in the portions of the spectrum that have not been smoothed. In particular, for a speech signal with a relatively strong pitch (fundamental tone), the quantization error occurs in the low-frequency region and causes an audible degradation in sound quality. If more code bits are used to improve the reproduction of the low-frequency components, however, the number of corresponding code vectors becomes very large, as stated above, and the bit rate increases.
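The trade-off can be put in numbers. With b bits per vector a code book holds 2^b code vectors, so each extra bit doubles both the code book and the cost of a full nearest-neighbour search, while the bit rate grows linearly with b. The frame parameters below (32 vectors of dimension 8 per frame, 50 frames per second) and the function name are hypothetical.

```python
def vq_cost(bits_per_vector, vectors_per_frame, frame_rate_hz, dimension):
    """Code-book size, bit rate (bit/s) and full-search multiplies per frame."""
    codebook_size = 2 ** bits_per_vector
    bit_rate = bits_per_vector * vectors_per_frame * frame_rate_hz
    # A full search compares every input vector with every code vector.
    search_mults = codebook_size * dimension * vectors_per_frame
    return codebook_size, bit_rate, search_mults

baseline = vq_cost(10, 32, 50, 8)  # (1024, 16000, 262144)
finer = vq_cost(12, 32, 50, 8)     # (4096, 19200, 1048576)
```

Two extra bits per vector raise the bit rate by 20 % here but quadruple both the code book and the search effort, which is why simply spending more bits on the poorly smoothed low-frequency region is unattractive.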