The present invention relates to a low rate speech coding/decoding method used for digital telephones, voice memories, and the like.
Recently, the CELP (Code Excited Linear Prediction) scheme (M. R. Schroeder and B. S. Atal, “Code Excited Linear Prediction (CELP): High Quality Speech at Very Low Bit Rates,” Proc. ICASSP, pp. 937-940, 1985 (reference 1)) has often been used as a coding technology for portable telephones, the Internet, and the like to compress speech information and audio information to small information amounts and transmit or store them.
The CELP scheme is a coding scheme based on linear predictive analysis, in which an input speech signal is separated into linear predictive coefficients representing phoneme information and a prediction residual signal representing characteristics such as pitch period of a speech by linear predictive analysis. A digital filter, called a synthesis filter, is formed on the basis of the linear predictive coefficients. The original input speech signal can be reconstructed by inputting the prediction residual signal as an excitation signal to the synthesis filter. For low-bit-rate speech coding, these linear predictive coefficients and the prediction residual signal must be coded with a small number of bits.
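The relationship between the prediction residual signal and the synthesis filter can be illustrated by a minimal sketch. It assumes an all-pole synthesis filter 1/A(z) with A(z) = 1 - sum_k a_k z^-k; the function names, the second-order coefficients, and the sample values are hypothetical and serve only as an illustration, with no quantization applied.

```python
def lpc_residual(speech, coeffs):
    """Prediction residual e[n] = s[n] - sum_k a_k * s[n-k]."""
    residual = []
    for n, s in enumerate(speech):
        pred = sum(a * speech[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(s - pred)
    return residual


def synthesis_filter(excitation, coeffs):
    """All-pole synthesis: s[n] = e[n] + sum_k a_k * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out


coeffs = [0.9, -0.2]                      # hypothetical predictor coefficients
speech = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0]
residual = lpc_residual(speech, coeffs)
reconstructed = synthesis_filter(residual, coeffs)
```

With an unquantized residual, the reconstructed signal matches the input up to floating-point rounding; in an actual coder both the coefficients and the residual are coded with a limited number of bits, so the reconstruction is only approximate.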
In the CELP scheme, a signal obtained by coding a prediction residual signal is generated as an excitation signal by adding the products of two types of vectors, i.e., a pitch vector and a stochastic vector, and gains.
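As a minimal sketch of this gain-weighted sum, the excitation signal can be formed as follows. The function name, the vector values, and the gain values are hypothetical and chosen only for illustration.

```python
def build_excitation(pitch_vector, stochastic_vector, pitch_gain, stochastic_gain):
    """Excitation = pitch_gain * pitch vector + stochastic_gain * stochastic vector."""
    if len(pitch_vector) != len(stochastic_vector):
        raise ValueError("pitch and stochastic vectors must have the same length")
    return [pitch_gain * p + stochastic_gain * c
            for p, c in zip(pitch_vector, stochastic_vector)]


# Hypothetical four-sample vectors and gains.
excitation = build_excitation([1.0, 0.0, -1.0, 0.0],
                              [0.0, 1.0, 0.0, -1.0],
                              pitch_gain=0.5, stochastic_gain=2.0)
```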
A stochastic vector is generally generated by searching for an optimal candidate in a codebook in which many candidates are stored. In this search, synthesized speech signals are generated by filtering every stochastic vector, together with the pitch vector, through the synthesis filter, and the stochastic vector that yields the synthesized speech signal with the minimum error relative to the input speech signal is selected. It is therefore important for the CELP scheme to store stochastic vectors efficiently in the codebook.
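The search described above can be sketched as an exhaustive loop. This is a simplified illustration, not a definitive implementation: it assumes fixed gains, an unweighted squared error, and a first-order all-pole synthesis filter, and all names and values are hypothetical. Practical coders typically optimize the gains and apply perceptual weighting.

```python
def synthesize(excitation, coeffs):
    """All-pole synthesis filter: y[n] = x[n] + sum_k a_k * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        feedback = sum(a * out[n - k]
                       for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + feedback)
    return out


def search_codebook(target, codebook, pitch_vector, coeffs, pitch_gain, code_gain):
    """Return (index, error) of the candidate minimizing the squared error."""
    best_index, best_error = -1, float("inf")
    for index, candidate in enumerate(codebook):
        excitation = [pitch_gain * p + code_gain * c
                      for p, c in zip(pitch_vector, candidate)]
        synthesized = synthesize(excitation, coeffs)
        error = sum((t - y) ** 2 for t, y in zip(target, synthesized))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


coeffs = [0.5]                            # hypothetical first-order filter
pitch_vector = [1.0, 0.0, 0.0, 0.0]
codebook = [[1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 0.0, -1.0, 0.0]]
# The target is built from candidate 2, so the search should recover index 2.
target = synthesize([0.8 * p + 1.0 * c
                     for p, c in zip(pitch_vector, codebook[2])], coeffs)
best_index, best_error = search_codebook(target, codebook, pitch_vector,
                                         coeffs, 0.8, 1.0)
```

Because every candidate is filtered and compared, the cost of the search grows with the codebook size, which is why compact codebook structures matter.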
As a scheme for satisfying such a requirement, pulse excitation, expressing a stochastic vector by a train of several pulses, is known. An example of this scheme is the multi-pulse scheme disclosed in reference 2 (K. Ozawa and T. Araseki, “Low Bit Rate Multi-pulse Speech Coder with Natural Speech Quality,” IEEE Proc. ICASSP '86, pp. 457-460, 1986).
An algebraic codebook (J-P. Adoul et al., “Fast CELP coding based on algebraic codes,” Proc. ICASSP '87, pp. 1957-1960 (reference 3)) is another example and has a simple structure in which a stochastic vector is expressed only by the presence/absence of a pulse and its polarity (+, −). Despite the limitation that the amplitude of a pulse is 1, unlike the multi-pulse scheme, this technique is widely used for low rate coding because speech quality does not deteriorate much and a fast search method has been proposed. As a scheme using an algebraic codebook, an improved scheme that allows a pulse to have an amplitude has been proposed, as disclosed in reference 4 (Chang Deyuan, “An 8 kb/s low complexity CELP speech codec,” 1996 3rd International Conference on Signal Processing, pp. 671-4, 1996).
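The structure of such a vector, defined only by pulse positions and polarities, can be sketched as follows. The function name, the vector length of 8, and the two pulse positions are hypothetical examples and are not taken from any particular codec.

```python
def algebraic_vector(length, positions, signs):
    """Build a stochastic vector from unit-amplitude pulses with polarity +1 or -1."""
    vector = [0] * length
    for pos, sign in zip(positions, signs):
        if sign not in (1, -1):
            raise ValueError("polarity must be +1 or -1")
        if not 0 <= pos < length:
            raise ValueError("pulse position must be an integer sampling point")
        vector[pos] += sign
    return vector


# Two pulses: a positive pulse at sample 1 and a negative pulse at sample 6.
vec = algebraic_vector(8, positions=[1, 6], signs=[+1, -1])
```

Only the positions and signs need to be transmitted, so no table of stored waveforms is required.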
In each type of pulse excitation described above, pulse position candidates at which pulses are set are limited to integer sampling positions, i.e., sampling points of a stochastic vector. For this reason, even if an attempt is made to improve the performance of a stochastic vector by increasing the number of bits assigned to pulse position candidates, bits cannot be assigned beyond the number of bits required to express the number of samples contained in a frame.
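This ceiling can be made concrete with a short sketch for a single pulse. The 40-sample length is an assumed example; the function name is hypothetical.

```python
def usable_positions(bits, frame_length):
    """Number of distinct integer pulse positions that `bits` bits can address."""
    return min(2 ** bits, frame_length)


frame_length = 40                         # assumed number of samples
counts = {bits: usable_positions(bits, frame_length) for bits in (5, 6, 8)}
```

Under this assumption, 5 bits address 32 of the 40 positions and 6 bits address all 40, while any further bit addresses no new integer position.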
Even when the adaptation of pulse position candidates provided by U.S. patent application Ser. No. 09/220,062 is performed, if the number of bits expressing position information is large, pulse position candidates are set at most samples even in a section where pulse position candidates should be dispersed. As a consequence, this section is difficult to distinguish from a section in which pulse position candidates are concentrated, resulting in a poor adaptation effect.
It is an object of the present invention to provide a speech coding/decoding method that can assign an arbitrary number of bits to pulse position information, regardless of the number of samples in a frame, which is a length of an excitation signal generated based on the pulse position, and can improve sound quality.
It is another object of the present invention to provide a speech coding/decoding method that can resolve the saturation phenomenon that occurs when pulse positions are restricted to integer positions in the method of adapting pulse position candidates provided by U.S. patent application Ser. No. 09/220,062, the contents of which are incorporated herein by reference. The method can improve speech quality by making effective use of the adaptation of pulse position candidates.
According to the invention, there is provided a speech coding method which comprises: analyzing an input speech signal to divide the input speech signal into a parameter representing a frequency characteristic of a speech and an excitation signal which is an input signal of a synthesis filter generated based on the parameter, to output a first index specifying the parameter representing the frequency characteristic as a coded result, the excitation signal being formed of a pulse train including a pulse selected from first pulses and second pulses, the first pulses being set at first positions located on sampling points of the excitation signal and the second pulses being set at second positions located between sampling points of the excitation signal; generating a synthesized speech signal based on the coded result and the excitation signal; generating a second index indicating a parameter with which an error between the input speech signal and the synthesized speech signal is minimized; selecting a pulse position candidate from a pulse position codebook in accordance with the second index; and outputting the first and second indexes.
According to the invention, there is provided a speech decoding method which comprises: extracting, from a coded stream, a first index indicating a frequency characteristic of a speech, a second index indicating a pitch vector, and a third index indicating a pulse train of an excitation signal; reconstructing a synthesis filter by decoding the first index; reconstructing the pitch vector on the basis of the second index; reconstructing on the basis of the third index the excitation signal formed by using a pulse train including a pulse selected from first pulses and second pulses, the first pulses being set on sampling points of the excitation signal, and the second pulses being set at positions located between sampling points of the excitation signal; and generating a decoded speech signal by exciting a synthesis filter by means of the reconstructed excitation signal and pitch vector.
In other words, the present invention provides a speech coding/decoding method in which an excitation signal is formed by using a pulse train, and the pulse train contains a pulse selected from first pulses set on sampling points of the excitation signal and second pulses set at positions located between sampling points of the excitation signal.
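The passage above does not state how a pulse located between sampling points is rendered onto the sampling grid. The following sketch assumes, purely for illustration, a band-limited approximation in which such a pulse is replaced by a Hann-windowed sinc centered at the fractional position; the function name, the window, and its half-width are hypothetical choices and not necessarily the method of the invention.

```python
import math


def place_pulse(length, position, sign=1.0, half_width=4):
    """Render a unit pulse at an integer or fractional position (assumed method)."""
    if not 0 <= position <= length - 1:
        raise ValueError("position must lie within the vector")
    vector = [0.0] * length
    if float(position).is_integer():
        # First pulses: located exactly on a sampling point.
        vector[int(position)] = sign
        return vector
    # Second pulses: located between sampling points, spread over neighbors.
    for n in range(length):
        d = n - position
        if abs(d) >= half_width:
            continue
        sinc = math.sin(math.pi * d) / (math.pi * d)
        window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
        vector[n] = sign * sinc * window
    return vector


on_grid = place_pulse(8, 3)       # pulse on sampling point 3
between = place_pulse(8, 3.5)     # pulse midway between sampling points 3 and 4
```

Under this assumption, a pulse at position 3.5 contributes equally to samples 3 and 4 and with smaller, alternating-sign values to the samples further away.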
According to the invention, there is provided a speech coding method which comprises: analyzing an input speech signal to divide the input speech signal into a parameter representing a frequency characteristic of a speech and an excitation signal formed based on the parameter and input to a digital filter, to output a first index specifying the parameter representing the frequency characteristic as a coded result, the excitation signal being generated by using a pitch vector and a stochastic vector for exciting a synthesis filter; generating the stochastic vector by using a pulse train including a pulse selected from first pulses and second pulses, the first pulses being set on sampling points of the stochastic vector and the second pulses being set at positions located between sampling points of the stochastic vector; generating a synthesized speech signal based on the coded result and the excitation signal; and generating a second index with which an error between the input speech signal and the synthesized speech signal is minimized.
According to the invention, there is provided a speech decoding method which comprises: extracting, from a coded stream, a first index indicating a frequency characteristic of a speech, a second index indicating a pitch vector, and a third index indicating a pulse train of an excitation signal; reconstructing a synthesis filter by decoding the first index; reconstructing the pitch vector on the basis of the second index; reconstructing on the basis of the third index the excitation signal formed by using a pulse train including a pulse selected from first pulses and second pulses, the first pulses being set on sampling points of the excitation signal, and the second pulses being set at a position between sampling points of the excitation signal; and generating a decoded speech signal by exciting a synthesis filter on the basis of the reconstructed excitation signal.
In other words, the present invention provides a speech coding/decoding method in which an excitation signal is constituted by a pitch vector and stochastic vector, and the stochastic vector is formed by using a pulse train containing a pulse selected from first pulses set on sampling points of the stochastic vector and second pulses set at positions located between sampling points of the stochastic vector.
According to the invention, there is provided a speech coding method which comprises: analyzing an input speech signal to divide the input speech signal into a parameter representing a frequency characteristic of a speech and an excitation signal formed based on the parameter and input to a digital filter, to output a first index specifying the parameter representing the frequency characteristic as a coded result, the excitation signal being generated by using a pitch vector and a stochastic vector for exciting a synthesis filter; selecting a predetermined number of pulse positions from pulse position candidates to be adapted on the basis of a shape of the pitch vector, the pulse position candidates including first pulse position candidates set on sampling points of the stochastic vector and second pulse position candidates set at positions located between sampling points of the stochastic vector; arranging pulses at the predetermined number of pulse positions to generate a pulse train to be used for generating the stochastic vector; generating a synthesized speech signal on the basis of the coded result and the excitation signal; generating a second index indicating a parameter with which an error between the input speech signal and the synthesized speech signal is minimized; selecting the pulse position candidates from a pulse position codebook in accordance with the second index; and outputting the first and second indexes.
According to the invention, there is provided a speech decoding method which comprises: extracting, from a coded stream, a first index indicating a frequency characteristic of a speech and a second index indicating an excitation signal; reconstructing a synthesis filter by decoding the first index; reconstructing the excitation signal on the basis of the second index, the excitation signal being constituted by a stochastic vector and a pitch vector, the stochastic vector being formed by a pulse train generated by arranging pulses at a predetermined number of pulse positions selected from pulse position candidates to be adapted on the basis of a shape of the pitch vector, and the pulse position candidates including first pulse position candidates and second pulse position candidates, the first pulse position candidates being set on sampling points of the stochastic vector and the second pulse position candidates being set at positions located between sampling points of the stochastic vector; and decoding a speech signal by exciting a synthesis filter by means of the excitation signal.
In other words, the present invention provides a speech coding/decoding method in which an excitation signal is constituted by a pitch vector and stochastic vector, and the stochastic vector is formed by using a pulse train generated by arranging pulses at a predetermined number of pulse positions selected from pulse position candidates subjected to adapting on the basis of the pitch vector. In this method, the pulse position candidates are formed by using a pulse train containing a pulse selected from the first pulses set on sampling points of the stochastic vector and the second pulses set at positions located between sampling points of the stochastic vector.
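The adaptation rule itself is defined in the referenced application and is not reproduced in this section. The following sketch therefore rests on a stated assumption: candidates are concentrated where the magnitude of the pitch vector is large, and the candidate grid includes half-sample positions. The function name, the linear interpolation of the magnitude, and the example values are hypothetical.

```python
def adapt_candidates(pitch_vector, num_candidates, steps_per_sample=2):
    """Pick candidate positions where the pitch vector magnitude is largest.

    Assumed rule for illustration only; the grid contains sampling points
    and positions between them (half-sample positions by default).
    """
    n = len(pitch_vector)
    grid = [i / steps_per_sample for i in range((n - 1) * steps_per_sample + 1)]

    def envelope(pos):
        # Linear interpolation of |pitch_vector| at a fractional position.
        left = int(pos)
        right = min(left + 1, n - 1)
        frac = pos - left
        return (1 - frac) * abs(pitch_vector[left]) + frac * abs(pitch_vector[right])

    ranked = sorted(grid, key=envelope, reverse=True)
    return sorted(ranked[:num_candidates])


pitch_vector = [0.0, 0.1, 1.0, 0.2, 0.0, 0.0]   # hypothetical pitch vector
candidates = adapt_candidates(pitch_vector, num_candidates=4)
```

In this example the four candidates cluster around the peak at sample 2, and two of them (1.5 and 2.5) lie between sampling points, which an integer-only grid could not provide.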
According to the CELP scheme using an algebraic codebook, the number of pulse position candidates is limited to the number of sampling points of an excitation signal/stochastic vector or less. In contrast, according to the present invention, an infinite number of pulse position candidates can theoretically be set by adding positions between sampling points to the above sampling points. As a consequence, many coded bits can be assigned to pulse position candidates regardless of the number of samples. This makes it possible to improve the sound quality of a decoded speech signal and coding efficiency.
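The growth in the number of candidates can be sketched for a uniform fractional grid. The 40-sample length and the resolutions of 1, 2, and 4 positions per sample interval are assumed examples, and the function name is hypothetical.

```python
def position_bits(frame_length, steps_per_sample):
    """Return (candidate count, bits needed) for a single pulse on a uniform grid.

    steps_per_sample = 1 corresponds to sampling points only; larger values
    add equally spaced positions between adjacent sampling points.
    """
    candidates = (frame_length - 1) * steps_per_sample + 1
    bits = (candidates - 1).bit_length()
    return candidates, bits


table = {steps: position_bits(40, steps) for steps in (1, 2, 4)}
```

Under these assumptions, 40 sampling points give 40 candidates (6 bits), half-sample resolution gives 79 candidates (7 bits), and quarter-sample resolution gives 157 candidates (8 bits), so each finer resolution allows a further bit to be spent on position information.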
According to the invention, there is provided a speech coding apparatus comprising: a speech analyzer section configured to analyze an input speech signal to divide the input speech signal into a parameter representing a frequency characteristic of a speech and an excitation signal which is an input signal of a synthesis filter generated based on the parameter, to output a first index specifying the parameter as a coded result; a pulse excitation section configured to generate a pulse train, as the excitation signal, which includes a pulse selected from first pulses and second pulses, the first pulses being set at first positions located on sampling points of the excitation signal and the second pulses being set at second positions located between sampling points of the excitation signal; a speech synthesizer section configured to generate a synthesized speech signal based on the coded result and the excitation signal; an index output section configured to generate a second index indicating a parameter with which an error between the input speech signal and the synthesized speech signal is minimized; a pulse position codebook which stores pulse position candidates; a selector section which selects a pulse position candidate from the pulse position codebook in accordance with the second index; and an output section which outputs the first and second indexes.
According to the invention, there is provided a speech decoding apparatus comprising: a demultiplexer section that extracts, from a coded stream, a first index indicating a quantized value, a second index indicating a pitch vector, and a third index indicating a pulse train of an excitation signal; a dequantizer section which reconstructs the quantized value by decoding the first index; a pitch vector reconstructing section which reconstructs the pitch vector based on the second index; an excitation signal reconstructing section which reconstructs, on the basis of the third index, the excitation signal formed by using a pulse train including a pulse selected from first pulses and second pulses, the first pulses being set on sampling points of the excitation signal, and the second pulses being set at positions located between sampling points of the excitation signal; and a decoding section which generates a decoded speech signal by exciting a synthesis filter by means of the reconstructed excitation signal and pitch vector.
Additional objects and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and advantages of the invention may be realized and obtained by means of the instrumentalities and combinations particularly pointed out hereinafter.