1. Field of the Invention
The present invention relates to a speech coding apparatus used for digital wire communication or radio communication of a speech signal to encode the speech signal according to a prescribed algorithm, and particularly to a speech coding apparatus capable of transmitting non-speech signals in a voice frequency band such as DTMF (Dual Tone Multi-Frequency) signals and PB (Push Button) signals.
2. Description of Related Art
Reduction in communication cost is required in intra-corporate communications. To implement low bit rate transmission of speech signals that occupy a considerable portion of communication traffic, an increasing number of systems employ speech coding/decoding schemes typified by speech coding at 8-kbit/s CS-ACELP (Conjugate-Structure Algebraic-Code-Excited Linear Prediction) based on ITU-T recommendation G.729 described in “ITU-T Recommendation G.729 Coding of Speech at 8-kbit/s using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP)” (Published by International Telecommunication Union).
Speech coding methods with a transmission rate of about 8 kbit/s, such as the 8-kbit/s CS-ACELP, reduce the amount of information after coding by assuming that the input signal is a speech signal and by exploiting the characteristics of the speech signal, thereby obtaining high quality speech with a small amount of information.
FIG. 27 is a block diagram showing a configuration of a first conventional speech coding apparatus employing the 8-kbit/s CS-ACELP; and FIG. 28 is a block diagram showing a configuration of the LSP quantizer and LSP quantization codebook of FIG. 27.
In FIG. 27, the reference numeral 201 designates a pre-processing section for carrying out pre-processing such as scaling and high-pass filtering of an input signal; 202 designates a linear prediction analyzer for calculating linear prediction (LP) coefficients from the input signal according to the linear prediction, and for converting the LP coefficients to line spectral pair (LSP) coefficients; 203 designates an LSP quantizer for selecting quantized samples corresponding to the LSP coefficients by referring to an LSP quantization codebook 204; and 204 designates the LSP quantization codebook including the quantized samples (LSP samples) of the LSP coefficients to which codebook indices are assigned.
The reference numeral 205 designates an LSP inverse-quantizer for computing the LSP coefficients corresponding to the codebook indices by referring to the LSP quantization codebook 204; 206 designates an LSP-to-LPC converter for converting the LSP coefficients to the LP coefficients; 207 designates a synthesis filter for synthesizing a speech signal by filtering using the LP coefficients generated by the LSP-to-LPC converter 206; 208 designates a subtracter; 209 designates a perceptual weighting filter for reducing noise offensive to the ear by handling noise components due to quantization errors in response to the frequency distribution of the speech signal; and 210 designates a distortion minimizing section for minimizing the mean-squared error of the speech signal passing through the weighting by the perceptual weighting filter 209, by comparing the synthesized speech signal from the synthesis filter 207 with the input speech signal.
The reference numeral 211 designates an adaptive codebook for storing a past excitation signal sequence for computing considerably long term components (from about 18 to 140 samples) of the speech signal; 212 designates a noise codebook for storing a plurality of random pulse trains; 213 designates a gain codebook for storing a plurality of gain parameters; 214, 215 and 216 each designate a multiplier; 217 designates a gain predictor for supplying the multiplier 215 with coefficients for regulating the amplitude of the noise; 218 designates an adder; and 219 designates a multiplexer for multiplexing the codebook indices of the selected LSP samples and the codebook indices of the coding parameters selected by the distortion minimizing section 210.
In FIG. 28, the reference numeral 301 designates a first stage LSP codebook for storing a plurality of prescribed quantization LSP coefficients extracted from a lot of speech data by learning; 302 designates a second stage LSP codebook for storing a plurality of prescribed quantization LSP coefficients used for fine adjustment; and 303 designates an MA prediction coefficient codebook for storing a predetermined number of sets of MA (Moving Average) prediction coefficients.
The reference numeral 311 designates an adder; 312 designates a multiplier; 313 designates an MA prediction component calculating section for computing MA prediction components by multiplying a predetermined number of past outputs of the adder 311 by one of the sets of the MA prediction coefficients; 314 designates an adder; 315 designates a subtracter for computing the quantization errors of the LSP coefficients by subtracting the LSP coefficients that are computed from the coefficients of the LSP quantization codebook 204 from the LSP coefficients fed from the linear prediction analyzer 202; 316 designates a quantization error weighting coefficient calculating section for computing, using the LSP coefficients of respective orders, the weighting coefficients to be multiplied by the quantization error signal of the LSP coefficients output from the subtracter 315; and 317 designates a distortion minimizing section for searching the codebooks 301, 302 and 303 for combinations of such quantized samples as minimizing the power of the quantization error signal passing through the weighting using the coefficients computed by the quantization error weighting coefficient calculating section 316, and for outputting the codebook indices corresponding to the samples selected.
Next, the operation of the first conventional speech coding apparatus will be described.
The input speech signal is subjected to the pre-processing such as scaling by the pre-processing section 201, and then supplied to the linear prediction analyzer 202 and subtracter 208.
The linear prediction analyzer 202 computes the LP coefficients from the input signal according to the linear prediction, followed by converting the LP coefficients to the LSP coefficients to be supplied to the LSP quantizer 203.
Referring to the LSP quantization codebook 204, the LSP quantizer 203 selects the LSP samples corresponding to the LSP coefficients, and outputs their codebook indices. In this case, as shown in FIG. 28, the adder 311 of the LSP quantizer 203 adds the coefficients from the first stage LSP codebook 301 to those from the second stage LSP codebook 302 in the LSP quantization codebook 204, and supplies the sums to the multiplier 312 and MA prediction component calculating section 313. Besides, the MA prediction coefficient codebook 303 of the LSP quantization codebook 204 supplies the MA prediction coefficients to the multiplier 312 and MA prediction component calculating section 313. The multiplier 312 multiplies the output of the adder 311 by the MA prediction coefficients, and supplies the products to the adder 314. The MA prediction component calculating section 313 stores a predetermined number of past outputs of the adder 311 and the MA prediction coefficients, calculates the sums of the products of the outputs of the adder 311 and the MA prediction coefficients at the respective time points, and supplies them to the adder 314. The adder 314 calculates the sums of the input values, and supplies them to the subtracter 315. The subtracter 315 subtracts the output of the adder 314 (that is, the LSP coefficients obtained from the LSP quantization codebook 204) from the LSP coefficients fed from the linear prediction analyzer 202, and supplies the quantization error signal of the LSP coefficients to the distortion minimizing section 317. The distortion minimizing section 317 multiplies the quantization error signal of the LSP coefficients by the weighting coefficients fed from the quantization error weighting coefficient calculating section 316, and computes their square sum. Then, it searches the codebooks 301, 302 and 303 for the LSP coefficients that will minimize the square sum, and outputs the codebook indices corresponding to the selected LSP coefficients. 
The operation is described in detail in “Quantization Method of LSP Coefficients and Gain of CS-ACELP”, by Kataoka, et al., pp.331–336, NTT R&D Vol.45, No.4, 1996. Thus, the spectrum envelope of the speech signal is quantized efficiently.
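The codebook search performed by the LSP quantizer 203 can be illustrated by the following minimal sketch. It is not the G.729 procedure itself: the exhaustive search, the function name `quantize_lsp`, the layout of each MA coefficient set (first entry for the current frame, remaining entries for past frames) and all argument names are assumptions made for illustration only.

```python
from itertools import product


def quantize_lsp(target, stage1, stage2, ma_sets, history, weights):
    """Exhaustively search the three codebooks (hypothetical toy version).

    target  : LSP coefficients from the linear prediction analyzer 202
    stage1  : first stage LSP codebook 301 (list of vectors)
    stage2  : second stage LSP codebook 302 (list of vectors)
    ma_sets : MA prediction coefficient codebook 303; each set is assumed
              to hold one coefficient for the current frame followed by
              one coefficient per past frame
    history : past outputs of the adder 311, most recent first
    weights : weighting coefficients from the calculating section 316
    Returns ((i1, i2, i_ma), weighted_error, adder_311_output).
    """
    best = None
    order = len(target)
    for (i1, c1), (i2, c2), (im, ma) in product(
        enumerate(stage1), enumerate(stage2), enumerate(ma_sets)
    ):
        # Adder 311: first stage plus second stage codebook entry.
        summed = [a + b for a, b in zip(c1, c2)]
        # Multiplier 312: scale the current-frame contribution.
        current = [ma[0] * s for s in summed]
        # Section 313: MA prediction component from past adder outputs.
        predicted = [
            sum(ma[k + 1] * past[n] for k, past in enumerate(history))
            for n in range(order)
        ]
        # Adder 314: candidate quantized LSP coefficients.
        candidate = [c + p for c, p in zip(current, predicted)]
        # Subtracter 315 and section 317: weighted error, then square sum.
        error = sum((w * (t - q)) ** 2
                    for w, t, q in zip(weights, target, candidate))
        if best is None or error < best[1]:
            best = ((i1, i2, im), error, summed)
    return best
```

The returned indices correspond to the codebook indices supplied to the multiplexer 219, and the returned adder output is what section 313 would store for use in later frames.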
The LSP codebook indices selected by the LSP quantizer 203 are supplied to the multiplexer 219 and the LSP inverse-quantizer 205.
In response to the codebook indices supplied, and referring to the LSP quantization codebook 204, the LSP inverse-quantizer 205 generates the LSP coefficients, and supplies them to the LSP-to-LPC converter 206. The LSP-to-LPC converter 206 converts the LSP coefficients to the LP coefficients, and supplies them to the synthesis filter 207.
On the other hand, the adaptive codebook 211 stores long term components of a plurality of excitation vectors (pitch period excitation vectors), and the noise codebook 212 stores noise components of the plurality of excitation vectors. The codebooks each output one vector, and the adder 218 adds the two vectors (long term component and noise component), and supplies the resultant excitation vector to the synthesis filter 207.
The synthesis filter 207 generates a speech signal by filtering the excitation vector with a filtering characteristic based on the LP coefficients fed from the LSP-to-LPC converter 206, and supplies the speech signal to the subtracter 208.
The subtracter 208 subtracts the synthesized speech signal from the input speech signal after the pre-processing, and supplies the errors between them to the perceptual weighting filter 209. The perceptual weighting filter 209 regulates the filter coefficients adaptively in response to the spectrum envelope of the input speech signal, carries out the filtering of the speech signal error, and supplies the errors after the filtering to the distortion minimizing section 210.
The distortion minimizing section 210 repeatedly selects the long term components of the excitation vectors output from the adaptive codebook 211, the noise components of the excitation vectors output from the noise codebook 212 and gain parameters output from the gain codebook 213, calculates the errors between the synthesized speech signal and the input speech signal, and supplies the multiplexer 219 with the codebook indices of the adaptive codebook, noise codebook and gain codebook that will minimize the mean-squared error.
The multiplexer 219 multiplexes the codebook indices of the LSP samples with the codebook indices of the adaptive codebook, noise codebook and gain codebook, and transmits them through the transmission line.
In this way, according to the CELP, the first conventional speech coding apparatus generates time sequential signals as the voice source corresponding to human vocal cords in response to the coding parameters stored in the codebooks 211, 212 and 213, and drives with these signals the synthesis filter 207 (a linear filter corresponding to the voice spectrum envelope) that models human vocal tract information, thereby reproducing the speech signal to select optimum coding parameters. The detail is described in “Basic Algorithm of CS-ACELP”, by Kataoka, et al., pp. 325–330, NTT R&D Vol.45, No.4, 1996.
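The analysis-by-synthesis search carried out by the distortion minimizing section 210 can be sketched as follows. This is a simplified illustration under stated assumptions: the perceptual weighting filter 209 and the gain predictor 217 are omitted, the search is exhaustive, and the names `synthesize` and `search_excitation` as well as the pairing of the two gains in each gain codebook entry are hypothetical.

```python
from itertools import product


def synthesize(excitation, lp):
    """All-pole synthesis filter 1/A(z), A(z) = 1 + sum(lp[k] * z^-(k+1))."""
    out = []
    for n, x in enumerate(excitation):
        feedback = sum(a * out[n - k - 1]
                       for k, a in enumerate(lp) if n - k - 1 >= 0)
        out.append(x - feedback)
    return out


def search_excitation(target, lp, adaptive_cb, noise_cb, gain_cb):
    """Pick the codebook indices minimizing the mean-squared error.

    gain_cb entries are assumed to be (adaptive_gain, noise_gain) pairs.
    Returns ((i_adaptive, i_noise, i_gain), mean_squared_error).
    """
    best = None
    for (ia, va), (ic, vc), (ig, (ga, gc)) in product(
        enumerate(adaptive_cb), enumerate(noise_cb), enumerate(gain_cb)
    ):
        # Multipliers and adder 218: scaled long term plus noise component.
        excitation = [ga * a + gc * c for a, c in zip(va, vc)]
        # Synthesis filter 207 driven by the candidate excitation.
        synthesized = synthesize(excitation, lp)
        # Subtracter 208 followed by the mean-squared error.
        mse = sum((t - s) ** 2
                  for t, s in zip(target, synthesized)) / len(target)
        if best is None or mse < best[1]:
            best = ((ia, ic, ig), mse)
    return best
```

The three returned indices play the role of the adaptive codebook, noise codebook and gain codebook indices that are supplied to the multiplexer 219.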
As described above, the LSPs (line spectral pairs) are widely used for the method of expressing the spectrum envelope of the speech signal in the conventional speech coding apparatus that compresses and codes the speech signal into a low bit rate speech signal efficiently. The CS-ACELP system also utilizes the LSP coefficients as the frequency parameters for transmitting the speech spectrum envelope, the detail of which is described in “Speech Information Compression By Line Spectral Pair (LSP) Speech Analysis and Synthesis”, by Sugamura and Itakura, pp.599–606, the Journal of the Institute of Electronics and Communication Engineers of Japan, 81/08 Vol. J64-A, No.8.
Thus, the foregoing conventional speech coding apparatus, which calculates the moving average prediction of the LSP codebook coefficients using the MA prediction coefficients, can quantize the LSP coefficients of a signal with little variation in frequency characteristics, that is, a signal having large correlation between frames. In addition, it can express the contour of the spectrum envelope of the speech signal by using the first stage LSP codebook based on learning in combination with the second stage LSP codebook based on random numbers, although it lacks mathematical precision. Furthermore, using the second stage codebook based on random numbers makes it possible to flexibly follow slight variations in the spectrum envelope. Accordingly, the foregoing conventional speech coding apparatus can encode the characteristics of the spectrum envelope of the speech signal efficiently.
However, using the coding algorithm specialized for speech, the speech coding apparatus will degrade the transmission characteristics of signals other than the speech signal in the voice frequency band, such as DTMF (dual tone multi-frequency) signals output from a push-button telephone, No.5 signaling and modem signals.
The non-speech signals, particularly the DTMF signals, have the following characteristics: (1) Their spectrum envelopes differ markedly from those of the speech signal; (2) The spectrum characteristics and gain vary little during the signal burst, but the spectrum characteristics change sharply between the signal burst and pause; (3) Since the quantization distortion of the LSP coefficients directly affects the frequency distortion of the DTMF signals, the LSP quantization distortion should be reduced as much as possible.
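To make these characteristics concrete, the following sketch generates a DTMF burst as the sum of two stationary sinusoids and identifies the key by measuring the power at each candidate frequency with the Goertzel algorithm. The frequency table is the standard DTMF keypad assignment, which the text above does not state; the 8-kHz sampling rate, the 400-sample burst length, the equal tone amplitudes and the function names are assumptions for illustration.

```python
import math

# Standard DTMF keypad assignment (low group = rows, high group = columns).
ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)
KEYS = ("123A", "456B", "789C", "*0#D")


def dtmf_tone(key, fs=8000, n=400):
    """Return n samples of the two-tone burst for the given key."""
    row = next(r for r, keys in enumerate(KEYS) if key in keys)
    col = KEYS[row].index(key)
    f_low, f_high = ROWS[row], COLS[col]
    return [
        0.5 * math.sin(2 * math.pi * f_low * t / fs)
        + 0.5 * math.sin(2 * math.pi * f_high * t / fs)
        for t in range(n)
    ]


def goertzel_power(samples, freq, fs):
    """Signal power at a single frequency (Goertzel recursion)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def detect_key(samples, fs=8000):
    """Pick the strongest row and column frequency and map them to a key."""
    row = max(range(4), key=lambda r: goertzel_power(samples, ROWS[r], fs))
    col = max(range(4), key=lambda c: goertzel_power(samples, COLS[c], fs))
    return KEYS[row][col]
```

Because the key is identified purely from the positions of two narrow spectral peaks, even a small frequency shift introduced by LSP quantization distortion can move a tone toward a neighboring frequency, which is why characteristic (3) matters.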
Thus, it is difficult for the conventional speech coding apparatus to code the non-speech signals like the DTMF signals with such characteristics. In particular, in a low bit rate transmission, the redundancy is small, and hence it is inappropriate for the non-speech signals to make use of the same scheme as the speech signal.
Incidentally, the intracorporate communications usually do not have a signal line dedicated to signaling for a call connection in the telephone communication, but make use of in-channel signaling transmission of the DTMF signals. In this case, when the assigned transmission line utilizes the above-described low bit rate speech coding, the transmission characteristics of the DTMF signals will be degraded, thereby bringing about erroneous call connections with a rather high probability.
To solve such a problem, a second conventional speech coding apparatus is proposed by Japanese patent application laid-open No.9-81199/1997, for example. FIG. 29 is a block diagram showing a configuration of the second conventional speech coding apparatus. In FIG. 29, the reference numeral 501 designates a conventional speech coding apparatus, and 502 designates a speech decoding apparatus for decoding the code generated by the speech coding apparatus 501.
In the speech coding apparatus 501, the reference numeral 511 designates a coder for encoding the speech signal; 512 designates a DTMF detector for detecting the DTMF signals from the input voice band signal; 513 designates a DTMF coding pattern memory for prestoring coding patterns corresponding to the DTMF signals; and 514 designates a selector switch.
In the speech decoding apparatus 502, the reference numeral 521 designates a decoder for decoding the code corresponding to the speech signal in the signal received via the transmission line, and for outputting the speech signal; 522 designates a DTMF coding pattern detector for detecting the coding pattern of the DTMF signals from the code received via the transmission line by referring to the DTMF coding pattern memory 523; 523 designates a DTMF coding pattern memory for prestoring the coding patterns corresponding to the DTMF signals; 524 designates a DTMF generator for generating the DTMF signals corresponding to the detected coding patterns; and 525 designates a selector switch.
Next, the operation of the second conventional speech coding apparatus will be described.
In the speech coding apparatus 501, the coder 511 encodes the input signal as a speech signal, and supplies it to the selector switch 514. The DTMF detector 512, detecting the DTMF signals from the input signal, supplies the DTMF coding pattern memory 513 with the types of the detected DTMF signals, and the selector switch 514 with the control signal for causing the selector switch 514 to select the output from the DTMF coding pattern memory 513.
Receiving the information about the types of the detected DTMF signals from the DTMF detector 512, the DTMF coding pattern memory 513 supplies the selector switch 514 with the code corresponding to the DTMF signals of the types.
When the DTMF signals are detected, the selector switch 514 selects the code from the DTMF coding pattern memory 513 in response to the control signal fed from the DTMF detector 512, and transmits the code via the transmission line. Otherwise, it selects the code fed from the coder 511, and transmits it through the transmission line.
In the speech decoding apparatus 502, on the other hand, the code received is supplied to the decoder 521 and the DTMF coding pattern detector 522. The decoder 521 decodes the code into the speech signal, and supplies it to the selector switch 525. On the other hand, the DTMF coding pattern detector 522 makes a decision as to whether the received code is the code of the DTMF signals or not by comparing it with the code corresponding to the DTMF signals stored in the DTMF coding pattern memory 523. When the received code is the code of the DTMF signals, the DTMF coding pattern detector 522 supplies the DTMF generator 524 with the types of the DTMF signals, and the selector switch 525 with the control signal for causing the selector switch 525 to select the signal from the DTMF generator 524.
When the code of the DTMF signals is detected, the selector switch 525 selects the DTMF signals fed from the DTMF generator 524 in response to the control signal from the DTMF coding pattern detector 522 and outputs them. Otherwise, it selects the speech signal fed from the decoder 521 and outputs it.
In this way, the second conventional speech coding apparatus detects the DTMF signals from the input voice band signal, and when the DTMF signals are detected, it outputs the prestored code corresponding to the DTMF signals, and when the DTMF signals are not detected, the coder 511 outputs the code it encodes.
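The switching behavior of the second conventional apparatus can be sketched as follows. The pattern values, the 10-byte frame size and all function names are hypothetical; the coder 511, DTMF detector 512, decoder 521 and DTMF generator 524 are passed in as callables because their internals are not part of this illustration.

```python
FRAME_BYTES = 10  # assumed size of one coded frame, for illustration only


def make_pattern_memory(keys="0123456789*#ABCD"):
    """Build a toy DTMF coding pattern memory (513 and 523 hold the same data)."""
    return {key: bytes([0xA0 + i]) * FRAME_BYTES for i, key in enumerate(keys)}


def encode_frame(frame, speech_coder, dtmf_detector, patterns):
    """Selector switch 514: prestored pattern if DTMF is detected, else coder output."""
    key = dtmf_detector(frame)  # returns a key such as "5", or None
    if key is not None:
        return patterns[key]
    return speech_coder(frame)


def decode_frame(code, speech_decoder, dtmf_generator, patterns):
    """Selector switch 525: regenerate DTMF if the code matches a stored pattern."""
    for key, pattern in patterns.items():
        if code == pattern:
            return dtmf_generator(key)
    return speech_decoder(code)
```

Note that in this sketch the decoding side can recognize a DTMF signal only if it holds the same pattern memory as the coding side.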
As another technique to solve the foregoing problem, the assignee of the present invention proposed the speech coding apparatus disclosed in Japanese patent application laid-open No.11-259099/1999 (referred to below as the third conventional speech coding apparatus). FIG. 30 is a block diagram showing a configuration of the speech coding apparatus proposed therein; and FIG. 31 shows a speech decoding apparatus for decoding the code generated by the speech coding apparatus as shown in FIG. 30.
In FIG. 30, the reference numeral 601 designates a coder comprising a coding function block 611 for coding the speech signal, and a coding function block 612 for coding the non-speech signal; 602 designates a speech/non-speech signal discriminator for deciding whether the input signal is a speech signal or a non-speech signal, and for outputting the decision result; 603 and 604 each designate a selector switch; and 605 designates a multiplexer for multiplexing the decision result from the speech/non-speech signal discriminator 602 and codewords from the coder 601, to be transmitted through the transmission line.
In FIG. 31, the reference numeral 651 designates a demultiplexer for demultiplexing the signals multiplexed by the multiplexer 605, that is, the decision result of the speech/non-speech signal discriminator 602 and the codewords output from the coder 601; 652 designates a decoder comprising a decoding function block 661 for decoding the codewords of the speech signal, and a decoding function block 662 for decoding the codewords of the non-speech signal; and 653 and 654 each designate a selector switch.
Next, the operation of the third conventional speech coding apparatus will be described.
In the speech coding apparatus as shown in FIG. 30, the speech/non-speech signal discriminator 602 always monitors the input signal to make a decision as to whether it is a speech signal or a non-speech signal, and from the decision result, it decides the operation mode of the coder 601. When the speech/non-speech signal discriminator 602 makes a decision that the input signal is the speech signal, it controls the selector switches 603 and 604 so that the coding function block 611 for the speech signal codes the input signal, whereas when it makes a decision that the input signal is the non-speech signal, it controls the selector switches 603 and 604 so that the coding function block 612 for the non-speech signal codes the input signal.
The multiplexer 605 multiplexes the codewords generated by the speech signal coding function block 611 or the non-speech signal coding function block 612 in the coder 601 with the decision result of the speech/non-speech signal discriminator 602, to be transmitted through the transmission line.
In the speech decoding apparatus as shown in FIG. 31, the demultiplexer 651 demultiplexes the signal received via the transmission line into the codewords generated by the coder 601 and the decision result by the speech/non-speech signal discriminator 602, and supplies the decision result to the selector switches 653 and 654, and the codewords to the decoder 652.
When the decision result indicates the speech signal, the selector switches 653 and 654 select the speech signal decoding function block 661 to decode the received codewords. In contrast, when the decision result indicates the non-speech signal, the selector switches 653 and 654 select the non-speech signal decoding function block 662 to decode the received codewords. The decoded speech signal or non-speech signal is output from the decoder 652.
In this way, the system can transmit the speech signal and non-speech signal via the same transmission line without changing the transmission rate while maintaining the speech quality as much as possible.
However, in an intracorporate communication system, which installs the speech coding apparatus on the transmission side and the speech decoding apparatus on the receiving side, it is sometimes difficult to replace the apparatuses on both the transmission side and receiving side simultaneously with new apparatuses, for reasons such as cost or management in the company.
With the foregoing arrangements, a conventional system, such as an intracorporate communication system (a communication system for multiplexing multimedia, for example) that installs a speech codec according to the CS-ACELP based on the ITU-T recommendation G.729, has the following problem. To achieve the in-channel transmission of the DTMF signals, the speech coding apparatus on the transmission side must be replaced by a speech coding apparatus that can transmit the non-speech signal well. However, the speech decoding apparatus on the receiving side, which remains conventional, cannot receive the non-speech signal satisfactorily.
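The mode switching of FIGS. 30 and 31 can be sketched as follows. How the decision result is actually fitted into the transmitted bit stream is not described above; this illustration simply prepends a one-byte flag, which is an assumption, as are the constants, function names and the use of callables for the function blocks 611, 612, 661 and 662.

```python
SPEECH, NON_SPEECH = 0, 1  # hypothetical encoding of the decision result


def encode(frame, discriminator, coders):
    """Discriminator 602 selects a coding block; multiplexer 605 adds the flag."""
    mode = discriminator(frame)          # SPEECH or NON_SPEECH
    codeword = coders[mode](frame)       # function block 611 or 612
    return bytes([mode]) + codeword


def decode(packet, decoders):
    """Demultiplexer 651 splits flag and codeword; the flag selects 661 or 662."""
    mode, codeword = packet[0], packet[1:]
    return decoders[mode](codeword)
```

A decoding apparatus that does not understand the flag and the non-speech codewords cannot make use of them, so in this sketch both ends must be upgraded together.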