The present invention relates to a speech-encoding method. It can be applied especially to the making of vocoders working at very low bit rates, in the range of about 1,200 bits per second, and implemented for example in satellite communications, Internet telephony, static responders, voice pagers, etc.
The purpose of these vocoders is to rebuild a signal that is as close as possible, as perceived by the human ear, to the original speech signal, while using the lowest possible bit rate.
To achieve this goal, vocoders use a completely parameterized model of the speech signal. The parameters used pertain to the voicing, which describes the periodic character of voiced sounds or the randomness of unvoiced sounds, the fundamental frequency of the voiced sounds, also known as “pitch”, the temporal evolution of the energy, and the spectral envelope of the signal, and they serve to excite and parameterize the synthesis filters. The filtering is generally performed by a technique of linear predictive digital filtering.
These various parameters are estimated periodically on the speech signal, from one to several times per frame of 10 ms to 30 ms, depending on the parameters and the coders. They are prepared in an analysis device and are generally transmitted remotely to a synthesis device.
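As a rough illustration of how tight this constraint is, the number of bits available per frame is simply the bit rate multiplied by the frame duration. The short sketch below works this out for the frame lengths quoted above; the function name `bits_per_frame` is hypothetical and chosen here only for illustration.

```python
def bits_per_frame(bit_rate_bps, frame_ms):
    """Bits available to describe one frame at a given channel bit rate."""
    return bit_rate_bps * frame_ms / 1000


# Frame lengths of 10, 20 and 30 ms at 1,200 bits/s and 2,400 bits/s.
for rate in (1200, 2400):
    for frame_ms in (10, 20, 30):
        print(rate, "bits/s,", frame_ms, "ms frame:",
              bits_per_frame(rate, frame_ms), "bits")
```

At 1,200 bits/s a 20-ms frame therefore has only 24 bits to carry voicing, pitch, energy and spectral envelope together, which is half of what is available at 2,400 bits/s.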
The field of low-bit-rate speech encoding has long been dominated by a 2400 bits/s encoder known as the LPC 10. A description of this encoder, as well as of an alternative working at a lower bit rate, can be found in the following articles:
“Parameters and coding characteristics that must be common to assure interoperability of 2400 bps linear predictive encoded speech”, NATO Standard STANAG-4198-Ed 1, Feb. 13, 1984, and the article by B. Mouy, P. de la Noue and G. Goudezeune, “NATO STANAG 4479: A Standard for an 800 bps Vocoder and Channel Coding in HF-ECCM system”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Detroit, May 1995, pp. 480-483.
While the speech reproduced by this vocoder is perfectly intelligible, it is of rather poor quality, so that its use is limited to quite specific applications, mainly professional and military ones. In recent years the field of low-bit-rate speech encoding has seen many innovations through the introduction of new models known respectively under the abbreviations MBE, PWI and MELP.
A description of the MBE model can be found in the article by D. W. Griffin and J. S. Lim, “Multiband Excitation Vocoder”, in IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, No. 8, pp. 1223-1235, 1988.
A description of the PWI model can be found in the article by W. B. Kleijn and J. Haagen, “Waveform Interpolation for Coding and Synthesis”, in W. B. Kleijn and K. K. Paliwal, eds., Speech Coding and Synthesis, Elsevier, 1995.
Finally, a description of the MELP model can be found in the article by L. M. Supplee, R. P. Cohn, J. S. Collura, and A. V. McCree, “MELP: The New Federal Standard At 2400 bits/s”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, April 1997, pp. 1591-1594.
The quality of the speech restored by these 2400 bits/s models has become acceptable for a large number of civilian and commercial applications. However, for bit rates below 2400 bits/s (typically 1200 bits/s or less) the restored speech is of inadequate quality and, to mitigate this drawback, other techniques have been used. A first technique is that of the segmental vocoder, two variants of which are described in the article by B. Mouy, P. de la Noue and G. Goudezeune already referred to, and by Y. Shoham, “Very Low Complexity Interpolative Speech Coding At 1.2 To 2.4 kbps”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Munich, April 1997, pp. 1599-1602.
To date, however, no segmental vocoder has been deemed to be of a quality sufficient for civilian and commercial applications.
A second technique is that implemented in phonetic vocoders, which combine principles of recognition and synthesis. Work in this field is still largely at the fundamental research stage. The bit rates involved are generally far lower than 1,200 bits/s (typically 50 to 200 bits/s), but the quality obtained is rather poor and the speaker often cannot be recognized. A description of these types of vocoders can be found in the article by J. Cernocky, G. Baudoin and G. Chollet, “Segmental Vocoder - Going Beyond The Phonetic Approach”, in IEEE International Conference on Acoustics, Speech, and Signal Processing, Seattle, May 12-15, 1998, pp. 605-608.
The goal of the invention is to mitigate the above-mentioned drawbacks.
To this end, an object of the invention is a method of encoding and decoding speech for voice communications using a vocoder with a very low bit rate, comprising an analysis part for the encoding and transmission of the parameters of the speech signal and a synthesis part for the reception and decoding of the transmitted parameters and the rebuilding of the speech signal through the use of linear predictive synthesis filters, the method being of the type that consists in analyzing the parameters describing the pitch, the voicing transition frequency, the energy, and the spectral envelope of the speech signal by subdividing the speech signal into successive frames of given length, characterized in that it consists in: assembling the parameters over N consecutive frames to form a super-frame; performing a vector quantization of the voicing transition frequencies during each super-frame, transmitting without deterioration only the most frequent configurations and replacing the least frequent configurations by the configuration that is nearest, in terms of absolute error, among the most frequent configurations; encoding the pitch by carrying out a scalar quantization of only one value for each super-frame; encoding the energy by selecting only a reduced number of values and assembling these values in sub-packets quantized by vector quantization, the non-transmitted energy values being recovered in the synthesis part by interpolation or extrapolation from the transmitted values; and encoding, by vector quantization, the spectral envelope parameters for the encoding of the linear prediction synthesis filters by selecting only a specified number of filters, the untransmitted parameters being rebuilt by interpolation or extrapolation from the parameters of the transmitted filters.
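By way of illustration only, two of these steps can be sketched in a few lines of Python: the replacement of rare voicing configurations by the nearest frequent one, and the recovery of untransmitted energy values. This is a minimal sketch under stated assumptions, not the claimed implementation. The function names, the super-frame length of three frames, the example frequencies in Hz, the codebook size, the use of linear interpolation, and the choice to hold the last transmitted value as a simple form of extrapolation are all hypothetical.

```python
from collections import Counter


def build_codebook(training_configs, size):
    """Keep the `size` most frequent super-frame voicing configurations."""
    counts = Counter(training_configs)
    return [cfg for cfg, _ in counts.most_common(size)]


def quantize_config(config, codebook):
    """Return the index of the codebook entry nearest to `config`.

    Distance is the summed absolute error over the frames of the
    super-frame, so a configuration already in the codebook maps to
    itself with zero error (it is transmitted without deterioration).
    """
    def abs_error(entry):
        return sum(abs(a - b) for a, b in zip(config, entry))
    return min(range(len(codebook)), key=lambda i: abs_error(codebook[i]))


def interpolate_energy(sent, n_frames):
    """Rebuild one energy value per frame from transmitted (index, value) pairs.

    Assumption: linear interpolation between transmitted points, and the
    nearest transmitted value is held outside them.
    """
    sent = sorted(sent)
    out = []
    for t in range(n_frames):
        if t <= sent[0][0]:
            out.append(sent[0][1])
        elif t >= sent[-1][0]:
            out.append(sent[-1][1])
        else:
            for (t0, v0), (t1, v1) in zip(sent, sent[1:]):
                if t0 <= t <= t1:
                    out.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
                    break
    return out


# Hypothetical training data: voicing transition frequencies (Hz)
# for super-frames of N = 3 frames.
training = ([(0, 0, 0)] * 5 + [(3000, 3000, 3000)] * 4
            + [(0, 1000, 3000)] * 3 + [(1000, 0, 3000)])
codebook = build_codebook(training, 3)

frequent = quantize_config((0, 1000, 3000), codebook)  # in the codebook
rare = quantize_config((1000, 0, 3000), codebook)      # replaced by nearest
energies = interpolate_energy([(0, 10.0), (4, 30.0)], 6)
```

Here the rare configuration `(1000, 0, 3000)` is not in the three-entry codebook, so it is sent as the index of `(0, 1000, 3000)`, its nearest neighbour in summed absolute error. Only an index into the codebook needs to be transmitted for each super-frame, which is what keeps the voicing bit cost low.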