(1) Field of the Invention
This invention relates to methods for speech coding and decoding and apparatuses for speech coding and decoding for performing compression coding and decoding of a speech signal to a digital signal. Particularly, this invention relates to a method for speech coding, method for speech decoding, apparatus for speech coding, and apparatus for speech decoding for reproducing a high quality speech at low bit rates.
(2) Description of Related Art
In the related art, code-excited linear prediction (Code-Excited Linear Prediction: CELP) coding is well-known as an efficient speech coding method, and its technique is described in “Code-excited linear prediction (CELP): High-quality speech at very low bit rates,” ICASSP '85, pp. 937-940, by M. R. Shroeder and B. S. Atal in 1985
FIG. 6 illustrates an example of a whole configuration of a CELP speech coding and decoding method. In FIG. 6, an encoder 101, decoder 102, multiplexing means 103, and dividing means 104 are illustrated.
The encoder 101 includes a linear prediction parameter analyzing means 105, linear prediction parameter coding means 106, synthesis filter 107, adaptive codebook 108, excitation codebook 109, gain coding means 110, distance calculating means 111, and weighting-adding means 138. The decoder 102 includes a linear prediction parameter decoding means 112, synthesis filter 113, adaptive codebook 114, excitation codebook 115, gain decoding means 116, and weighting-adding means 139.
In CELP speech coding, a speech in a frame of about 5-50 ms is divided into spectrum information and excitation information, and coded.
Explanations are made on operations in the CELP speech coding method. In the encoder 101, the linear prediction parameter analyzing means 105 analyzes an input speech S101, and extracts a linear prediction parameter, which is spectrum information of the speech. The linear prediction parameter coding means 106 codes the linear prediction parameter, and sets a coded linear prediction parameter as a coefficient for the synthesis filter 107.
Explanations are made on coding of excitation information.
An old excitation signal is stored in the adaptive codebook 108. The adaptive codebook 108 outputs a time series vector, corresponding to an adaptive code inputted by the distance calculator 111, which is generated by repeating the old excitation signal periodically.
A plurality of time series vectors trained by reducing distortion between speech for training and its coded speech, for example, is stored in the excitation codebook 109. The excitation codebook 109 outputs a time series vector corresponding to an excitation code inputted by the distance calculator 111.
Each of the time series vectors outputted from the adaptive codebook 108 and excitation codebook 109 is weighted by using a respective gain provided by the gain coding means 110 and added by the weighting-adding means 138. Then, an addition result is provided to the synthesis filter 107 as excitation signals, and coded speech is produced. The distance calculating means 111 calculates a distance between the coded speech and the input speech S101, and searches an adaptive code, excitation code, and gains for minimizing the distance. When the above-stated coding is over, a linear prediction parameter code and the adaptive code, excitation code, and gain codes for minimizing a distortion between the input speech and the coded speech are outputted as a coding result.
Explanations are made on operations in the CELP speech decoding method.
In the decoder 102, the linear prediction parameter decoding means 112 decodes the linear prediction parameter code to the linear prediction parameter, and sets the linear prediction parameter as a coefficient for the synthesis filter 113. The adaptive codebook 114 outputs a time series vector corresponding to an adaptive code, which is generated by repeating an old excitation signal periodically. The excitation codebook 115 outputs a time series vector corresponding to an excitation code. The time series vectors are weighted by using respective gains, which are decoded from the gain codes by the gain decoding means 116, and added by the weighting-adding means 139. An addition result is provided to the synthesis filter 113 as an excitation signal, and an output speech S103 is produced.
Among the CELP speech coding and decoding method, an improved speech coding and decoding method for reproducing a high quality speech according to the related art is described in “Phonetically—based vector excitation coding of speech at 3.6 kbps,” ICASSP '89, pp. 49-52, by S. Wang and A. Gersho in 1989.
FIG. 7 shows an example of a whole configuration of the speech coding and decoding method according to the related art, and same signs are used for means corresponding to the means in FIG. 6.
In FIG. 7, the encoder 101 includes a speech state deciding means 117, excitation codebook switching means 118, first excitation codebook 119, and second excitation codebook 120. The decoder 102 includes an excitation codebook switching means 121, first excitation codebook 122, and second excitation codebook 123.
Explanations are made on operations in the coding and decoding method in this configuration. In the encoder 101, the speech state deciding means 117 analyzes the input speech S101, and decides a state of the speech is which one of two states, e.g., voiced or unvoiced. The excitation codebook switching means 118 switches the excitation codebooks to be used in coding based on a speech state deciding result. For example, if the speech is voiced, the first excitation codebook 119 is used, and if the speech is unvoiced, the second excitation codebook 120 is used. Then, the excitation codebook switching means 118 codes which excitation codebook is used in coding.
In the decoder 102, the excitation codebook switching means 121 switches the first excitation codebook 122 and the second excitation codebook 123 based on a code showing which excitation codebook was used in the encoder 101, so that the excitation codebook, which was used in the encoder 101, is used in the decoder 102. According to this configuration, excitation codebooks suitable for coding in various speech states are provided, and the excitation codebooks are switched based on a state of an input speech. Hence, a high quality speech can be reproduced.
A speech coding and decoding method of switching a plurality of excitation codebooks without increasing a transmission bit number according to the related art is disclosed in Japanese Unexamined Published Patent Application 8-185198. The plurality of excitation codebooks is switched based on a pitch frequency selected in an adaptive codebook, and an excitation codebook suitable for characteristics of an input speech can be used without increasing transmission data.
As stated, in the speech coding and decoding method illustrated in FIG. 6 according to the related art, a single excitation codebook is used to produce a synthetic speech. Non-noise time series vectors with many pulses should be stored in the excitation codebook to produce a high quality coded speech even at low bit rates. Therefore, when a noise speech, e.g., background noise, fricative consonant, etc., is coded and synthesized, there is a problem that a coded speech produces an unnatural sound, e.g., “Jiri-Jiri” and “Chini-Chin.” This problem can be solved, if the excitation codebook includes only noise time series vectors. However, in that case, a quality of the coded speech degrades as a whole.
In the improved speech coding and decoding method illustrated in FIG. 7 according to the related art, the plurality of excitation codebooks is switched based on the state of the input speech for producing a coded speech. Therefore, it is possible to use an excitation codebook including noise time series vectors in an unvoiced noise period of the input speech and an excitation codebook including non-noise time series vectors in a voiced period other than the unvoiced noise period, for example. Hence, even if a noise speech is coded and synthesized, an unnatural sound, e.g., “Jiri-Jiri,” is not produced. However, since the excitation codebook used in coding is also used in decoding, it becomes necessary to code and transmit data which excitation codebook was used. It becomes an obstacle for lowing bit rates.
According to the speech coding and decoding method of switching the plurality of excitation codebooks without increasing a transmission bit number according to the related art, the excitation codebooks are switched based on a pitch period selected in the adaptive codebook. However, the pitch period selected in the adaptive codebook differs from an actual pitch period of a speech, and it is impossible to decide if a state of an input speech is noise or non-noise only from a value of the pitch period. Therefore, the problem that the coded speech in the noise period of the speech is unnatural cannot be solved.
This invention was intended to solve the above-stated problems. Particularly, this invention aims at providing speech coding and decoding methods and apparatuses for reproducing a high quality speech even at low bit rates.