As a method for coding a speech signal efficiently at medium and low bit rates, a widely used approach separates the speech signal into a linear prediction filter and the excitation signal (sound source signal) that drives it.
CELP (Code Excited Linear Prediction) is one representative method. In CELP, the input speech is subjected to linear prediction analysis to calculate a linear prediction coefficient, and a linear prediction filter set with that coefficient is driven by a sound source signal to generate a synthesized speech signal (reproduction signal). The sound source signal is represented as the sum of a signal representing the pitch period of the speech and a noise-like signal.
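The synthesis step can be illustrated with a minimal Python sketch. It assumes the common sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, so the synthesis filter 1/A(z) computes y[n] = e[n] - a1 y[n-1] - ... - ap y[n-p] from a zero initial state. As a simplification, the pitch component is modelled as a unit pulse train at the pitch period rather than an adaptive codebook. The function names `celp_excitation` and `lp_synthesis` and all parameter values are hypothetical.

```python
import random


def celp_excitation(frame_len, pitch_period, pitch_gain, noise_gain, seed=0):
    """Sum of a pitch-period pulse train and a Gaussian noise-like signal.

    Simplified illustration: practical CELP coders take the pitch part from
    an adaptive codebook and the noise part from a fixed codebook.
    """
    rng = random.Random(seed)
    pulses = [1.0 if n % pitch_period == 0 else 0.0 for n in range(frame_len)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    return [pitch_gain * p + noise_gain * w for p, w in zip(pulses, noise)]


def lp_synthesis(excitation, a):
    """All-pole filter 1/A(z) with A(z) = 1 + sum_i a[i-1] z^-i, zero state."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc -= coeff * y[n - i]
        y.append(acc)
    return y


# Hypothetical stable 2nd-order predictor and a 40-sample frame.
exc = celp_excitation(frame_len=40, pitch_period=10, pitch_gain=1.0, noise_gain=0.1)
speech = lp_synthesis(exc, a=[-1.2, 0.5])
```

Driving `lp_synthesis([1.0, 0.0, 0.0], [-0.9])` returns the decaying impulse response 1, 0.9, 0.81, which shows the recursive (all-pole) character of the filter.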
CELP is described in M. R. Schroeder and Bishnu S. Atal, “Code excited linear prediction (CELP): High quality speech at very low bit rates” (Proceedings of ICASSP, pp. 937-940, 1985) (Reference 1). Further, the coding performance for music signals can be improved by giving CELP a band division constitution. In this constitution, a reproduction signal is generated by driving a linear prediction synthesis filter with an excitation signal obtained by adding the sound source signals corresponding to the respective bands.
CELP having the band division constitution is described in A. Ubale and Allen Gersho, “Multi-band CELP Coding of Speech and Music” (Proceedings of IEEE Workshop on Speech Coding for Telecommunications, pp. 101-102, 1997) (Reference 2).
FIG. 1 is a block diagram showing an example of a conventional speech and music signal coder. For simplicity, the number of bands is set to 2 here. An input signal (input vector), generated by sampling a speech or music signal and grouping a plurality of the samples into one vector as one frame, is inputted from an input terminal 10.
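Forming input vectors from the sampled signal can be sketched as below. The frame length and the zero-padding of a trailing partial frame are assumptions for illustration; the text only states that a plurality of samples is grouped into one vector per frame. `frame_signal` is a hypothetical helper.

```python
def frame_signal(samples, frame_len):
    """Split a sample sequence into non-overlapping frames (input vectors).

    A trailing partial frame is zero-padded (assumption for illustration).
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = [float(s) for s in samples[start:start + frame_len]]
        frame.extend([0.0] * (frame_len - len(frame)))
        frames.append(frame)
    return frames


# e.g. 8 kHz sampling with hypothetical 20 ms frames -> 160 samples per vector
vectors = frame_signal([0.1 * n for n in range(400)], frame_len=160)
```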
The input vector from the input terminal 10 is inputted to a linear prediction coefficient calculating circuit 170. The linear prediction coefficient calculating circuit 170 carries out a linear prediction analysis of the input vector to calculate a linear prediction coefficient, and further quantizes the linear prediction coefficient to calculate a quantized linear prediction coefficient. The linear prediction coefficient is outputted to a weighting filter 140 and a weighting filter 141. An index corresponding to the quantized linear prediction coefficient is outputted to a linear prediction synthesis filter 130, a linear prediction synthesis filter 131 and a code outputting circuit 190.
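One standard way to carry out the linear prediction analysis is the autocorrelation method solved by the Levinson-Durbin recursion; the source does not say which method circuit 170 uses, so this is an assumption. Quantization is shown, again as a simplification, as a nearest-neighbour search over a hypothetical table of coefficient vectors whose index is what gets transmitted; practical coders usually quantize a transformed representation such as line spectral pairs instead. All function names and table contents are hypothetical.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] x[n-k] for k = 0..max_lag."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, err) with A(z) = 1 + sum_i a[i-1] z^-i minimizing the
    prediction error energy for the autocorrelation sequence r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:          # silent or degenerate frame: stop early
            break
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err          # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a[1:], err


def quantize_lpc(a, table):
    """Index of the table entry nearest to a (squared Euclidean distance)."""
    return min(range(len(table)),
               key=lambda idx: sum((u - v) ** 2 for u, v in zip(a, table[idx])))


LPC_TABLE = [[-0.9, 0.0], [-0.5, 0.0], [-1.2, 0.5]]  # hypothetical codebook
a, _ = levinson_durbin([1.0, 0.5, 0.25], order=2)
lpc_index = quantize_lpc(a, LPC_TABLE)
```

The autocorrelation [1.0, 0.5, 0.25] is that of a first-order process, so the recursion yields a = [-0.5, 0.0] and the nearest table entry is index 1.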
A first sound source generating circuit 110 receives an index outputted from a first minimizing circuit 150, reads the first sound source vector corresponding to the index from a table storing a plurality of sound source vectors, and outputs the first sound source vector to a first gain circuit 160.
A second sound source generating circuit 111 receives an index outputted from a second minimizing circuit 151, reads the second sound source vector corresponding to the index from a table storing a plurality of sound source vectors, and outputs the second sound source vector to a second gain circuit 161.
The first gain circuit 160 receives the index outputted from the first minimizing circuit 150 and the first sound source vector outputted from the first sound source generating circuit 110. The first gain circuit 160 reads the first gain corresponding to the index from a table storing a plurality of gain values, multiplies the first sound source vector by the first gain to generate a third sound source vector, and outputs the third sound source vector to a first band pass filter 120.
The second gain circuit 161 receives the index outputted from the second minimizing circuit 151 and the second sound source vector outputted from the second sound source generating circuit 111. The second gain circuit 161 reads the second gain corresponding to the index from a table storing a plurality of gain values, multiplies the second sound source vector by the second gain to generate a fourth sound source vector, and outputs the fourth sound source vector to a second band pass filter 121.
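The table look-ups and gain multiplication performed by the sound source generating circuits and gain circuits reduce to a few lines. The table contents below are hypothetical placeholders with 4-sample vectors; real tables would be far larger. `generate_source` and `apply_gain` are hypothetical names.

```python
# Hypothetical sound source (codevector) table and gain table.
SOURCE_TABLE = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
    [0.5, 0.5, -0.5, -0.5],
]
GAIN_TABLE = [0.25, 0.5, 1.0, 2.0]


def generate_source(index, table=SOURCE_TABLE):
    """Sound source generating circuit: read the vector stored at `index`."""
    return list(table[index])


def apply_gain(vector, gain_index, gain_table=GAIN_TABLE):
    """Gain circuit: read the gain at `gain_index` and scale the vector."""
    g = gain_table[gain_index]
    return [g * v for v in vector]


# First band: source index 1, gain index 3 -> third sound source vector.
third = apply_gain(generate_source(1), 3)
```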
The first band pass filter 120 receives the third sound source vector outputted from the first gain circuit 160 and restricts its band to a first band, thereby generating a first excitation vector. The first band pass filter 120 outputs the first excitation vector to the linear prediction synthesis filter 130.
The second band pass filter 121 receives the fourth sound source vector outputted from the second gain circuit 161 and restricts its band to a second band, thereby generating a second excitation vector. The second band pass filter 121 outputs the second excitation vector to the linear prediction synthesis filter 131.
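The band restriction can be sketched with a linear-phase FIR band pass filter designed by the windowed-sinc method with a Hamming window. The filter type, tap count and band edges (normalized to the sampling rate, so 0.5 is the Nyquist frequency) are assumptions; the source does not specify how the band pass filters 120 and 121 are designed. A lower edge of 0 gives the low-pass case used for the lowest band. All function names are hypothetical.

```python
import cmath
import math


def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def design_band_pass(f_low, f_high, num_taps=63):
    """Windowed-sinc FIR band pass; f_low, f_high in cycles/sample (0..0.5)."""
    m = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        t = n - m
        # ideal band pass = low-pass(f_high) minus low-pass(f_low)
        ideal = 2 * f_high * sinc(2 * f_high * t) - 2 * f_low * sinc(2 * f_low * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def fir_filter(x, h):
    """y[n] = sum_k h[k] x[n-k], zero initial state, output length len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def magnitude_response(h, f):
    """|H(e^{j 2 pi f})| at normalized frequency f."""
    return abs(sum(c * cmath.exp(-2j * math.pi * f * n) for n, c in enumerate(h)))


# Hypothetical split: band 1 = 0..0.25, band 2 = 0.25..0.5 (cycles/sample).
h_low = design_band_pass(0.0, 0.25)
h_high = design_band_pass(0.25, 0.5)
```

Symmetric taps give linear phase, so both bands share the same delay of (num_taps - 1) / 2 samples; this keeps their outputs aligned when the reproduced vectors are later combined.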
The linear prediction synthesis filter 130 receives the first excitation vector outputted from the first band pass filter 120 and the index corresponding to the quantized linear prediction coefficient outputted from the linear prediction coefficient calculating circuit 170. The linear prediction synthesis filter 130 reads the quantized linear prediction coefficient corresponding to the index from a table storing a plurality of quantized linear prediction coefficients, and drives the filter set with that coefficient by the first excitation vector to generate a first reproduction signal (first reproduced vector). The first reproduced vector is outputted to a first differencer 180.
The linear prediction synthesis filter 131 receives the second excitation vector outputted from the second band pass filter 121 and the index corresponding to the quantized linear prediction coefficient outputted from the linear prediction coefficient calculating circuit 170. The linear prediction synthesis filter 131 reads the quantized linear prediction coefficient corresponding to the index from a table storing a plurality of quantized linear prediction coefficients, and drives the filter set with that coefficient by the second excitation vector to generate a second reproduced vector. The second reproduced vector is outputted to a second differencer 181.
The first differencer 180 receives the input vector via the input terminal 10 and the first reproduced vector outputted from the linear prediction synthesis filter 130, and calculates the difference between the input vector and the first reproduced vector. The difference is outputted as a first difference vector to the weighting filter 140 and to the second differencer 181.
The second differencer 181 receives the first difference vector from the first differencer 180 and the second reproduced vector outputted from the linear prediction synthesis filter 131, and calculates the difference between the first difference vector and the second reproduced vector. The difference is outputted as a second difference vector to the weighting filter 141.
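The two differencers form a cascade in which the second band is matched against whatever the first band's reproduced vector left unexplained. A minimal sketch, generalized (as an assumption) to any number of bands, with `cascaded_residuals` a hypothetical helper:

```python
def cascaded_residuals(input_vector, reproduced_vectors):
    """Return the difference vector after each band's reproduced vector is
    subtracted in turn (first differencer, second differencer, ...)."""
    residuals = []
    current = list(input_vector)
    for y in reproduced_vectors:
        current = [u - v for u, v in zip(current, y)]
        residuals.append(current)
    return residuals


first_diff, second_diff = cascaded_residuals([5, 5, 5], [[1, 2, 3], [1, 1, 1]])
# first_diff == [4, 3, 2], second_diff == [3, 2, 1]
```

The last residual equals the input vector minus the sum of all reproduced vectors, which is the error the whole coder leaves in the frame.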
The weighting filter 140 receives the first difference vector outputted from the first differencer 180 and the linear prediction coefficient outputted from the linear prediction coefficient calculating circuit 170. Using the linear prediction coefficient, the weighting filter 140 constructs a weighting filter corresponding to human auditory characteristics and drives it by the first difference vector, thereby generating a first weighted difference vector. The first weighted difference vector is outputted to the first minimizing circuit 150.
The weighting filter 141 receives the second difference vector outputted from the second differencer 181 and the linear prediction coefficient outputted from the linear prediction coefficient calculating circuit 170. Using the linear prediction coefficient, the weighting filter 141 constructs a weighting filter corresponding to human auditory characteristics and drives it by the second difference vector, thereby generating a second weighted difference vector. The second weighted difference vector is outputted to the second minimizing circuit 151.
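The source does not give the weighting filter's transfer function. A form widely used in CELP coders is W(z) = A(z/γ1) / A(z/γ2) with 0 < γ2 < γ1 ≤ 1, which attenuates the error near formant peaks, where the strong signal masks it. The sketch below assumes that form, hypothetical values γ1 = 0.9 and γ2 = 0.6, and A(z) = 1 + Σ a_i z^-i; `weighting_filter` is a hypothetical name.

```python
def weighting_filter(x, a, gamma1=0.9, gamma2=0.6):
    """Apply W(z) = A(z/gamma1) / A(z/gamma2) with zero initial state.

    Assumed form. A(z) = 1 + sum_i a[i-1] z^-i, so A(z/g) has
    coefficients a[i-1] * g**i.
    """
    num = [c * gamma1 ** i for i, c in enumerate(a, start=1)]
    den = [c * gamma2 ** i for i, c in enumerate(a, start=1)]
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for i in range(1, len(a) + 1):
            if n - i >= 0:
                # numerator (FIR) term minus denominator (recursive) term
                acc += num[i - 1] * x[n - i] - den[i - 1] * y[n - i]
        y.append(acc)
    return y


weighted = weighting_filter([1.0, 0.0, 0.0, 0.0], a=[-1.2, 0.5])
```

With γ1 equal to γ2 the numerator and denominator cancel and the filter passes the difference vector unchanged, which is a convenient sanity check.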
The first minimizing circuit 150 successively outputs, to the first sound source generating circuit 110, the indexes corresponding to all of the first sound source vectors stored in the first sound source generating circuit 110, and successively outputs, to the first gain circuit 160, the indexes corresponding to all of the first gains stored in the first gain circuit 160. The first minimizing circuit 150 successively receives the resulting first weighted difference vectors outputted from the weighting filter 140 and calculates their norms. The first minimizing circuit 150 then selects the first sound source vector and the first gain that minimize the norm, and outputs the indexes corresponding to them to the code outputting circuit 190.
The second minimizing circuit 151 successively outputs, to the second sound source generating circuit 111, the indexes corresponding to all of the second sound source vectors stored in the second sound source generating circuit 111, and successively outputs, to the second gain circuit 161, the indexes corresponding to all of the second gains stored in the second gain circuit 161. The second minimizing circuit 151 successively receives the resulting second weighted difference vectors outputted from the weighting filter 141 and calculates their norms. The second minimizing circuit 151 then selects the second sound source vector and the second gain that minimize the norm, and outputs the indexes corresponding to them to the code outputting circuit 190.
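The exhaustive search of a minimizing circuit can be sketched as follows. Because band pass filtering, linear prediction synthesis and weighting are all linear, with zero initial filter states (a simplifying assumption; practical coders also account for filter memory) the weighted difference W(x - g·Hc) equals Wx - g·(WHc). The sketch therefore filters each sound source vector once and scales the result by every candidate gain. `search_codebooks` and the pass-through `synthesize` in the example are hypothetical; in the coder described, `synthesize` would chain the band pass filter, the linear prediction synthesis filter and the weighting filter, and `target` would be the weighted input vector (first band) or the weighted first difference vector (second band).

```python
def search_codebooks(target, source_table, gain_table, synthesize):
    """Return (source_index, gain_index, error) minimizing the squared norm
    ||target - g * synthesize(c)||^2 over all table entries."""
    best = None
    for si, c in enumerate(source_table):
        filtered = synthesize(c)          # filter once per source vector
        for gi, g in enumerate(gain_table):
            err = sum((t - g * v) ** 2 for t, v in zip(target, filtered))
            if best is None or err < best[2]:
                best = (si, gi, err)
    return best


# Hypothetical tables; identity "synthesis" keeps the example checkable by hand.
SOURCE_TABLE = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, -1.0], [0.5, 0.5, -0.5, -0.5]]
GAIN_TABLE = [0.25, 0.5, 1.0, 2.0]
si, gi, err = search_codebooks([0.0, 2.0, 0.0, -2.0], SOURCE_TABLE, GAIN_TABLE,
                               synthesize=lambda c: list(c))
# -> si == 1, gi == 3, err == 0.0
```

Minimizing the squared norm selects the same entries as minimizing the norm itself and avoids a square root per candidate.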
The code outputting circuit 190 receives the index corresponding to the quantized linear prediction coefficient outputted from the linear prediction coefficient calculating circuit 170, the indexes corresponding to the first sound source vector and the first gain outputted from the first minimizing circuit 150, and the indexes corresponding to the second sound source vector and the second gain outputted from the second minimizing circuit 151. The code outputting circuit 190 converts each index into a code in a bit series and outputs the converted codes via an output terminal 20.
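Converting the indexes into a bit series amounts to writing each index with a fixed number of bits. The field order and bit widths in `FRAME_LAYOUT` are hypothetical, since the source does not state them, and `pack_indexes` is a hypothetical helper.

```python
# Hypothetical field order and bit widths for one frame.
FRAME_LAYOUT = [("lpc", 10), ("source1", 7), ("gain1", 5), ("source2", 7), ("gain2", 5)]


def pack_indexes(indexes, layout=FRAME_LAYOUT):
    """Write each index MSB-first using its field width; return a list of bits."""
    bits = []
    for name, width in layout:
        value = indexes[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        bits.extend((value >> (width - 1 - b)) & 1 for b in range(width))
    return bits


frame_bits = pack_indexes({"lpc": 1, "source1": 1, "gain1": 3, "source2": 0, "gain2": 2})
# 34 bits per frame with this layout
```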
FIG. 2 is a block diagram showing an example of a conventional speech and music signal decoding apparatus. A code inputting circuit 310 receives a code in a bit series from an input terminal 30.
The code inputting circuit 310 converts the code in the bit series inputted from the input terminal 30 into indexes. The index corresponding to a first sound source vector is outputted to a first sound source generating circuit 110, the index corresponding to a second sound source vector to a second sound source generating circuit 111, the index corresponding to a first gain to a first gain circuit 160, the index corresponding to a second gain to a second gain circuit 161, and the index corresponding to a quantized linear prediction coefficient to a linear prediction synthesis filter 130 and a linear prediction synthesis filter 131.
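The code inputting circuit performs the inverse operation, reading fixed-width fields back into indexes. This sketch assumes a hypothetical layout (field order and bit widths) that coder and decoder must agree on; `unpack_indexes` is a hypothetical helper.

```python
# Hypothetical field order and bit widths; must match the coder's layout.
FRAME_LAYOUT = [("lpc", 10), ("source1", 7), ("gain1", 5), ("source2", 7), ("gain2", 5)]


def unpack_indexes(bits, layout=FRAME_LAYOUT):
    """Read MSB-first fixed-width fields from a bit list into a dict of indexes."""
    total = sum(width for _, width in layout)
    if len(bits) < total:
        raise ValueError(f"need {total} bits, got {len(bits)}")
    indexes, pos = {}, 0
    for name, width in layout:
        value = 0
        for bit in bits[pos:pos + width]:
            value = (value << 1) | bit
        indexes[name] = value
        pos += width
    return indexes


decoded = unpack_indexes([1, 0, 1, 0, 1], layout=[("a", 3), ("b", 2)])
# decoded == {"a": 5, "b": 1}
```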
The first sound source generating circuit 110 receives the index outputted from the code inputting circuit 310, reads the first sound source vector corresponding to the index from a table storing a plurality of sound source vectors, and outputs the first sound source vector to the first gain circuit 160.
The second sound source generating circuit 111 receives the index outputted from the code inputting circuit 310, reads the second sound source vector corresponding to the index from a table storing a plurality of sound source vectors, and outputs the second sound source vector to the second gain circuit 161.
The first gain circuit 160 receives the index outputted from the code inputting circuit 310 and the first sound source vector outputted from the first sound source generating circuit 110. The first gain circuit 160 reads the first gain corresponding to the index from a table storing a plurality of gain values and multiplies the first sound source vector by the first gain to generate a third sound source vector. The third sound source vector is outputted to a first band pass filter 120.
The second gain circuit 161 receives the index outputted from the code inputting circuit 310 and the second sound source vector outputted from the second sound source generating circuit 111. The second gain circuit 161 reads the second gain corresponding to the index from a table storing a plurality of gain values and multiplies the second sound source vector by the second gain to generate a fourth sound source vector. The fourth sound source vector is outputted to a second band pass filter 121.
The first band pass filter 120 receives the third sound source vector outputted from the first gain circuit 160 and restricts its band to a first band, thereby generating a first excitation vector. The first band pass filter 120 outputs the first excitation vector to the linear prediction synthesis filter 130.
The second band pass filter 121 receives the fourth sound source vector outputted from the second gain circuit 161 and restricts its band to a second band, thereby generating a second excitation vector. The second band pass filter 121 outputs the second excitation vector to the linear prediction synthesis filter 131.
The linear prediction synthesis filter 130 receives the first excitation vector outputted from the first band pass filter 120 and the index corresponding to the quantized linear prediction coefficient outputted from the code inputting circuit 310. The linear prediction synthesis filter 130 reads the quantized linear prediction coefficient corresponding to the index from a table storing a plurality of quantized linear prediction coefficients, and generates a first reproduced vector by driving the filter set with that coefficient by the first excitation vector. The first reproduced vector is outputted to an adder 182.
The linear prediction synthesis filter 131 receives the second excitation vector outputted from the second band pass filter 121 and the index corresponding to the quantized linear prediction coefficient outputted from the code inputting circuit 310. The linear prediction synthesis filter 131 reads the quantized linear prediction coefficient corresponding to the index from a table storing a plurality of quantized linear prediction coefficients, and generates a second reproduced vector by driving the filter set with that coefficient by the second excitation vector. The second reproduced vector is outputted to the adder 182.
The adder 182 receives the first reproduced vector outputted from the linear prediction synthesis filter 130 and the second reproduced vector outputted from the linear prediction synthesis filter 131, calculates their sum, and outputs the sum as a third reproduced vector via an output terminal 40.
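Because both linear prediction synthesis filters are set with the same quantized coefficient and filtering is linear, adding the two reproduced vectors gives the same result (assuming zero initial filter state, as this sketch does) as driving a single synthesis filter with the sum of the two excitation vectors. This matches the description of the band division constitution as one synthesis filter driven by summed band excitations. A minimal check, with `lp_synthesis` and all numeric values hypothetical and A(z) = 1 + Σ a_i z^-i assumed:

```python
def lp_synthesis(excitation, a):
    """All-pole filter 1/A(z), A(z) = 1 + sum_i a[i-1] z^-i, zero initial state."""
    y = []
    for n, e in enumerate(excitation):
        acc = e
        for i, coeff in enumerate(a, start=1):
            if n - i >= 0:
                acc -= coeff * y[n - i]
        y.append(acc)
    return y


a = [-1.2, 0.5]                          # hypothetical quantized coefficients
first_exc = [1.0, 0.0, -0.5, 0.0, 0.25]  # hypothetical band-limited excitations
second_exc = [0.0, 0.3, 0.0, -0.3, 0.0]

# Adder 182: sum of the two reproduced vectors -> third reproduced vector.
third = [u + v for u, v in zip(lp_synthesis(first_exc, a), lp_synthesis(second_exc, a))]

# Equivalent: one synthesis filter driven by the summed excitation.
combined = lp_synthesis([u + v for u, v in zip(first_exc, second_exc)], a)
assert all(abs(u - v) < 1e-12 for u, v in zip(third, combined))
```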
In the conventional speech and music signal coder described above, the reproduction signal is generated by driving the linear prediction synthesis filters, calculated from the input signal, with an excitation signal obtained by adding an excitation signal whose band characteristic corresponds to the low frequency region of the input signal and an excitation signal whose band characteristic corresponds to the high frequency region of the input signal. Consequently, CELP-based coding is also carried out in the band belonging to the high frequency region, coding performance deteriorates in that band, and the coding quality of the speech and music signal over all bands is therefore deteriorated.
The reason is that a signal in the band belonging to the high frequency region has properties significantly different from those of speech, so CELP, which models the process by which speech is generated, cannot generate that signal with high accuracy.
It is an object of the invention to provide a speech and music signal coder capable of resolving the above-described problem and coding a speech and music signal satisfactorily over all bands.