1. Field of the Invention
This invention relates to a speech coding and decoding system which includes a speech coding apparatus and a speech decoding apparatus and a speech signal transmission method for use with the system. More specifically, the present invention relates to a speech coding and decoding system which has a VOX (Voice Operated Transmitter) function by which data are transmitted only while a speaker is uttering.
2. Description of the Prior Art
In a speech coding and decoding system wherein speech of a speaker is coded by a speech coding apparatus and the coded data are transmitted to a speech decoding apparatus and then the data are decoded by and outputted from the speech decoding apparatus, a VOX function is provided frequently for the object of reduction of the power dissipation or effective utilization of the circuit band. This VOX function allows transmission of data from the coding apparatus side to the decoding apparatus side only within a speech burst period, that is, within a period within which the speaker is uttering. Within a pause period, that is, within a period within which no sound is inputted to the coding apparatus, the coding apparatus stops its transmission. Instead, on the decoding apparatus side, a kind of background noise is produced and outputted to eliminate unnatural speech communication which arises from the use of the VOX function.
As a speech coding and decoding system having such a VOX function as just described, a system is known and disclosed, for example, in Japanese Patent Laid-Open Application No. Heisei 5-122165 (JP, A, 5-122165) (Document 1) wherein, when a speech burst period is detected, a preamble signal is transmitted first and then coded data of speech are transmitted, but when a pause period is detected, a postamble signal is transmitted, whereas, on the decoding apparatus side, outputting of background noise is switchably started upon reception of the postamble signal.
In the following, a conventional speech coding and decoding system by digital radio transmission is described. FIG. 1 shows a construction of a speech coding apparatus, that is, a transmission side apparatus of a conventional speech coding and decoding system. In a digital radio transmission system, a speech signal inputted to the coding apparatus is cut out and processed for each data sequence called frame. The length in time of the frame is, for example, 40 ms.
A microphone 1 serving as an input terminal of a speech signal is connected to this speech coding apparatus 91. A transmission circuit 15 is connected to an output terminal of the speech coding apparatus 91, and a transmission antenna 11 is connected to the transmission circuit 15. The transmission circuit 15 is provided to convert an output signal of the speech coding apparatus 91 into a radio signal of a suitable frequency and transmit the radio signal from the transmission antenna 11 to the reception side.
In the speech coding apparatus 91, a speech signal inputted from the microphone 1 is inputted to a spectrum envelope analysis portion 2 for analyzing a spectrum envelope of the speech signal, a speech burst period detection portion 3 for discriminating whether or not the current frame is a speech burst period or a pause period, and a high efficiency coding portion 14 which executes high efficiency coding of the speech signal. An output of the spectrum envelope analysis portion 2 is connected to an input of the high efficiency coding portion 14 and also to an input of a spectrum coefficient quantization portion 6, and also an output of the spectrum coefficient quantization portion 6 is inputted to the high efficiency coding portion 14. A data switching portion 10 connected to the transmission circuit 15 is provided on an output of the high efficiency coding portion 14. Also a preamble production portion 8 and a postamble production portion 9 which constitute a unique word production portion 35 are connected to the data switching portion 10. The data switching portion 10 switches a signal to be transmitted from the transmission antenna 11 via the transmission circuit 15 or stops its transmission in response to a result of detection by the speech burst period detection portion 3 as hereinafter described. An output of the data switching portion 10 is supplied as an output of the speech coding apparatus 91 to the transmission circuit 15.
From a speech signal for one frame inputted to the speech coding apparatus 91, a spectrum envelope of the speech signal itself is analyzed and a spectrum coefficient is calculated by the spectrum envelope analysis portion 2. Here, the spectrum coefficient is a characteristic amount which characterizes the spectrum of a speech signal. For the spectrum coefficient, for example, a linear prediction coefficient (LPC) disclosed in Sadaoki FURUI, "Digital Speech Processing", the Publishing Society of Tokai University, Version 1, Sep. 25, 1985 (hereinafter referred to as "Document 2"), pp.60-62, a PARCOR (Partial Auto-correlation) coefficient disclosed similarly in Document 2, pp.73-78 or a LSP (Line Spectrum Pair) disclosed similarly in Document 2, pp.89-92 may be used.
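The text above names the linear prediction coefficient as one possible spectrum coefficient but does not state how the spectrum envelope analysis portion 2 computes it. As a minimal sketch only, assuming the widely used autocorrelation method with the Levinson-Durbin recursion, the calculation for one frame might look as follows. The function names `autocorrelation` and `levinson_durbin` are hypothetical, and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p is an assumption of this sketch.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame of samples."""
    return [
        sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
        for k in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, reflection, err), where a = [1, a1, ..., ap] are the
    coefficients of A(z), reflection holds the per-stage reflection
    coefficients (related to PARCOR coefficients up to a sign convention),
    and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate (e.g. all-zero) frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        reflection.append(k)
        updated = a[:]
        for j in range(1, i):
            updated[j] = a[j] + k * a[i - j]
        updated[i] = k
        a = updated
        err *= 1.0 - k * k
    return a, reflection, err


# Autocorrelation of an ideal first-order process with pole at 0.9
# (illustrative values, not taken from the source).
lpc, parcor_like, residual_energy = levinson_durbin([1.0, 0.9, 0.81], order=2)
```

For this illustrative input the recursion recovers a first-order model, so `lpc[1]` is approximately -0.9 and `lpc[2]` is approximately 0.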
The spectrum coefficient calculated by the spectrum envelope analysis portion 2 is inputted to and quantized by the spectrum coefficient quantization portion 6 to calculate a quantized spectrum coefficient. More particularly, the spectrum coefficient quantization portion 6 holds data produced in advance as a codebook and selects, from within the codebook, data which is discriminated to be nearest to the spectrum coefficient. The spectrum coefficient represented by the selected data is called quantized spectrum coefficient. In the following description, in order to assure clear distinction from a quantized spectrum coefficient, a spectrum coefficient not in a quantized situation outputted from the spectrum envelope analysis portion 2 is hereinafter referred to as "non-quantized spectrum coefficient". Further, a code word of the codebook which provides a quantized spectrum coefficient is referred to as "quantized spectrum code word".
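The codebook selection described above can be illustrated by a nearest-neighbour search. This is a sketch under stated assumptions: the source does not specify the distance measure, so squared Euclidean distance is assumed here, and the name `quantize` and the toy `CODEBOOK` contents are hypothetical.

```python
def quantize(coefficient, codebook):
    """Return (code word index, quantized coefficient) for one spectrum coefficient vector.

    The index plays the role of the "quantized spectrum code word" and the
    selected codebook entry plays the role of the "quantized spectrum coefficient".
    """
    def squared_distance(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))

    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(coefficient, codebook[i]),
    )
    return index, codebook[index]


# Toy two-dimensional codebook, for illustration only.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]

code_word, quantized = quantize((0.9, 0.2), CODEBOOK)
```

Only the index needs to be transmitted or stored, since the decoding side can look up the same entry in an identical codebook.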
The non-quantized spectrum coefficient and the quantized spectrum coefficient calculated in this manner are inputted together with the speech signal to the high efficiency coding portion 14, by which they are high efficiency coded, whereafter they are inputted to the data switching portion 10.
As described above, the speech signal for one frame inputted from the microphone 1 is inputted also to the speech burst period detection portion 3, by which it is discriminated whether the current frame is a speech burst period within which sound is issued or a pause period within which no sound is issued. A result of the discrimination by the speech burst period detection portion 3 is inputted to the data switching portion 10. If the discrimination is that the current frame is a speech burst period, then the data switching portion 10 selects the high efficiency code outputted from the high efficiency coding portion 14. As a result, the high efficiency code is transmitted via the transmission circuit 15 and the transmission antenna 11 toward the reception side, that is, toward the decoding apparatus side. A situation wherein the current frame is a speech burst period and high efficiency codes continue to be transmitted from the transmission antenna 11 is referred to as "speech burst processing state", and a high efficiency code or codes produced in a speech burst period are referred to as "speech burst code signal".
On the other hand, if the preceding frame is a speech burst period and it is discriminated by the speech burst period detection portion 3 that the current frame is a pause period, the following processing is effected. First, in the current frame, the postamble production portion 9 produces a frame called postamble signal and transmits the postamble signal from the transmission antenna 11 via the data switching portion 10. In the next frame, a speech signal of silence inputted from the microphone 1 is high efficiency coded by the high efficiency coding portion 14 in a similar manner as upon high efficiency coding in a speech burst period, and the code is transmitted from the transmission antenna 11. The signal transmitted in this instance is referred to as "background noise updating signal". After the background noise updating signal is transmitted, the coding apparatus side stops its transmission for a period of T frames. After the T frames, a postamble signal and a background noise updating signal are transmitted again, and then transmission is stopped for T frames. Such a sequence of operations is repeated. Here, T is a natural number determined in advance.
A situation wherein a sequence of operations that a postamble signal and a background noise updating signal are transmitted and then transmission is stopped for a period of T frames is repeated in this manner is referred to as "pause processing state". However, even in a pause processing state in which transmission is stopped, the speech burst period detection portion 3 always performs detection of a speech burst period, and if a speech burst is detected, then a frame called preamble signal is produced by the preamble production portion 8. Then, the preamble signal is transmitted from the transmission antenna 11 via the data switching portion 10, and in the following frames to the preamble signal, high efficiency codes produced by the high efficiency coding portion 14 are successively transmitted.
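The alternation between the speech burst processing state and the pause processing state can be sketched as a small per-frame scheduler. This is an illustrative model, not the apparatus itself. The function name `schedule_transmission` and the string labels are hypothetical. The sketch assumes that the system starts in the speech burst processing state and that a detected speech burst takes priority at any point of the pause cycle.

```python
def schedule_transmission(speech_flags, T):
    """Return what the coding apparatus transmits in each frame.

    speech_flags: one bool per frame, True when the speech burst period
    detection portion judges the frame to be a speech burst period.
    T: number of frames for which transmission is stopped in each pause cycle.
    """
    sent = []
    in_pause = False  # assumption: start in the speech burst processing state
    phase = 0         # 0: postamble, 1: noise update, 2..T+1: transmission stopped
    for is_speech in speech_flags:
        if is_speech:
            if in_pause:
                sent.append("preamble")     # transition from pause to speech burst
                in_pause = False
            else:
                sent.append("speech_code")  # ordinary high efficiency code
        else:
            if not in_pause:
                in_pause = True
                phase = 0
            if phase == 0:
                sent.append("postamble")
            elif phase == 1:
                sent.append("noise_update")
            else:
                sent.append(None)           # transmission stopped
            phase = (phase + 1) % (T + 2)
    return sent


flags = [True, True, False, False, False, False, False, False, True, True]
timeline = schedule_transmission(flags, T=2)
```

With T = 2, each pause cycle in this model spans four frames: a postamble, a background noise updating signal, and two frames with no transmission.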
The postamble signal and the preamble signal are signals which are not normally produced by the high efficiency coding portion 14, and those postamble signal and preamble signal are collectively called "unique words".
FIG. 2 is a block diagram showing a construction of a speech decoding apparatus, that is, an apparatus on the reception side. The speech decoding apparatus 92 shown is used in pair with the speech coding apparatus 91 shown in FIG. 1.
A reception antenna 20 is connected to the speech decoding apparatus 92 via a reception circuit 33. The reception antenna 20 is provided to receive a signal transmitted from the speech coding apparatus 91 (FIG. 1). Further, in order to output decoded speech, a loudspeaker 30 is connected to the speech decoding apparatus 92.
In the speech decoding apparatus 92, a reception signal inputted from the reception antenna 20 via the reception circuit 33 is supplied to a high efficiency speech decoding portion 22 which effects high efficiency speech decoding, a unique word detection portion 23 which detects a unique word, and a background noise parameter storage portion 24 which holds parameters necessary for production of background noise. The speech decoding apparatus 92 further includes a background noise synthesis portion 29 for synthesizing background noise, and a switch 21 for selectively outputting background noise outputted from the background noise synthesis portion 29 or decoded speech from the high efficiency speech decoding portion 22 to the loudspeaker 30. The speech decoding apparatus 92 further includes a quantized spectrum coefficient calculation portion 25 and a random residual signal generation portion 28.
The unique word detection portion 23 analyzes a reception signal and discriminates whether or not each of the current frame and the next frame is a speech burst period or a pause period. If the current frame is a pause period, then the unique word detection portion 23 detects a postamble signal, a preamble signal or a background noise updating signal. The detection method of a speech burst period/pause period by the unique word detection portion 23 is such as described below:
(1) If the preceding frame is a speech burst period and a signal other than the postamble signal is received in the current frame, then the current frame is a speech burst period;
(2) If the preceding frame is a speech burst period and the postamble signal is received in the current frame, then the current frame is a pause period;
(3) If the preceding frame is a pause period and a signal other than the preamble signal is received in the current frame, the current frame is a pause period; and
(4) In spite of the three criteria (1) to (3) described above, if the preceding frame is a pause period and the preamble signal is received in the current frame, then the current frame is a pause period and the next frame becomes a speech burst period without fail.
Meanwhile, the criteria by which the unique word detection portion 23 detects a signal from within a reception signal are as follows:
(a) If a signal which can be regarded as a postamble signal is received, then a postamble signal is detected regardless of whether the current frame is a speech burst period or a pause period;
(b) If a signal which can be regarded as a preamble signal is received within a pause period, then a preamble signal is detected;
(c) However, if a signal which can be regarded as a preamble signal is received within a speech burst period, then a speech burst code signal is detected; and
(d) If, within a pause period, a postamble signal is detected in the preceding frame and a signal which can be regarded as a preamble signal is not received in the current frame, then a background noise updating signal is detected in the current frame.
A detection output of the unique word detection portion 23 is supplied to the background noise parameter storage portion 24 and is supplied also to the switch 21 for switching of the switch 21. If it is discriminated by the unique word detection portion 23 that the current frame is a speech burst period, then the speech burst code signal is decoded by the high efficiency speech decoding portion 22. Then, the switch 21 is switched so that the decoded speech from the high efficiency speech decoding portion 22 may be outputted from the loudspeaker 30.
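The period discrimination criteria (1) to (4) and the signal detection criteria (a) to (d) given for the unique word detection portion 23 can be combined into one per-frame loop, sketched below. This is an interpretation, not a definitive implementation. The function name `detect_frames` and the labels are hypothetical. Each frame is assumed to have already been labelled by what it "can be regarded as" (`"preamble"`, `"postamble"` or `"other"`), and the phrase "without fail" in criterion (4) is read as forcing the next frame to be a speech burst period.

```python
SPEECH, PAUSE = "speech", "pause"


def detect_frames(looks_like_seq, initial_period=PAUSE):
    """Return a list of (period, detected_signal) pairs, one per received frame."""
    results = []
    prev_period = initial_period
    prev_detected = None
    force_speech = False
    for looks_like in looks_like_seq:
        # Period discrimination, criteria (1) to (4).
        if force_speech:                      # second half of criterion (4)
            period, force_speech = SPEECH, False
        elif prev_period == SPEECH:           # criteria (1) and (2)
            period = PAUSE if looks_like == "postamble" else SPEECH
        else:                                 # criteria (3) and (4)
            period = PAUSE
            force_speech = looks_like == "preamble"

        # Signal detection, criteria (a) to (d).
        if looks_like == "postamble":
            detected = "postamble"            # (a)
        elif period == PAUSE and looks_like == "preamble":
            detected = "preamble"             # (b)
        elif period == SPEECH:
            detected = "speech_code"          # (c) and ordinary speech frames
        elif prev_detected == "postamble":
            detected = "noise_update"         # (d)
        else:
            detected = None                   # pause frame carrying nothing usable

        results.append((period, detected))
        prev_period, prev_detected = period, detected
    return results


received = ["other", "postamble", "other", "other", "preamble", "other", "other"]
decisions = detect_frames(received, initial_period=SPEECH)
```

In this example the frame following the postamble is taken as the background noise updating signal, and the frame following the preamble is treated as a speech burst period.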
Next, operation when it is discriminated by the unique word detection portion 23 that the current frame is a pause period is described.
After it is discriminated that the current frame is a pause period, parameters are read out from the background noise parameter storage portion 24 first. From among the parameters read out, a quantized spectrum code word is inputted to the quantized spectrum coefficient calculation portion 25, by which it is converted into a quantized spectrum coefficient, whereafter the quantized spectrum coefficient is inputted to the background noise synthesis portion 29. The remaining parameters are inputted, except that which corresponds to a residual signal, directly from the background noise parameter storage portion 24 to the background noise synthesis portion 29. The parameter corresponding to the residual signal is not inputted from the background noise parameter storage portion 24 to the background noise synthesis portion 29, but instead, a random residual signal generated by the random residual signal generation portion 28 is inputted to the background noise synthesis portion 29. From the inputs from the background noise parameter storage portion 24, quantized spectrum coefficient calculation portion 25 and random residual signal generation portion 28, the background noise synthesis portion 29 produces a background noise signal. Then, when it is discriminated by the unique word detection portion 23 that the current frame is a pause period, the switch 21 is switched so that the background noise signal produced by the background noise synthesis portion 29 is outputted from the loudspeaker 30.
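The synthesis of background noise from a spectrum coefficient and a random residual signal can be sketched as an all-pole filter driven by random excitation. This is a minimal illustration under several assumptions not stated in the source: the spectrum coefficient is taken to be a set of linear prediction coefficients [1, a1, ..., ap], the "remaining parameters" are reduced to a single `gain`, the residual is Gaussian white noise, and a 40 ms frame is taken as 320 samples (8 kHz sampling). The function name `synthesize_background_noise` is hypothetical.

```python
import random


def synthesize_background_noise(lpc, gain, frame_len, state=None, rng=None):
    """Generate one frame of noise shaped by the filter 1/A(z).

    lpc: [1, a1, ..., ap], coefficients of A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    state: the last p output samples of the previous frame, most recent first.
    Returns (samples, new_state) so that consecutive frames join smoothly.
    """
    if rng is None:
        rng = random.Random()
    order = len(lpc) - 1
    history = list(state) if state is not None else [0.0] * order
    samples = []
    for _ in range(frame_len):
        excitation = gain * rng.gauss(0.0, 1.0)  # random residual signal
        y = excitation - sum(lpc[j] * history[j - 1] for j in range(1, order + 1))
        samples.append(y)
        if order:
            history = [y] + history[:-1]
    return samples, history


# One 40 ms frame at an assumed 8 kHz sampling rate (320 samples).
noise, filter_state = synthesize_background_noise(
    [1.0, -0.9], gain=0.1, frame_len=320, rng=random.Random(0)
)
```

Passing `filter_state` back in as `state` for the next frame keeps the filter memory continuous across frame boundaries.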
The background noise parameter storage portion 24 is a memory for holding parameters necessary for synthesis of background noise. If it is discriminated by the unique word detection portion 23 that the reception signal of the current frame is a background noise updating signal, then the background noise updating signal is inputted to the background noise parameter storage portion 24. Consequently, contents of the background noise parameter storage portion 24 are updated to background noise parameters determined based on the background noise updating signal.
In the following, operation of the conventional speech coding and decoding system is described with reference to flow charts. FIG. 3 illustrates processing of the speech coding apparatus 91 on the transmission side.
Assuming that a speech signal is inputted one after another frame, a spectrum envelope of the speech signal itself is analyzed by the spectrum envelope analysis portion 2 and a spectrum coefficient is calculated first in step 201. This spectrum coefficient (non-quantized spectrum coefficient) is then quantized, in step 202, by the spectrum coefficient quantization portion 6 so that a quantized spectrum coefficient is obtained.
The speech signal for one frame is inputted also to the speech burst period detection portion 3, and in step 203, it is discriminated by the speech burst period detection portion 3 whether or not the current frame is a speech burst period or a pause period. Then, based on the non-quantized spectrum coefficient, the quantized spectrum coefficient and the input speech signal, high efficiency coding is performed by the high efficiency coding portion 14 in step 204.
If it is discriminated in step 203 that the current frame is a speech burst period, then the control sequence advances to step 206, in which the data switching portion 10 selects the high efficiency code outputted from the high efficiency coding portion 14 and this high efficiency code is transmitted toward the decoding apparatus side by the transmission antenna 11.
On the other hand, if it is discriminated in step 203 that the current frame is a pause period, then processing by the unique word production portion 35, that is, the preamble production portion 8 and the postamble production portion 9, is performed in step 205. In particular, in the current frame, a postamble signal is produced by the postamble production portion 9, and in step 206, the postamble signal is transmitted from the transmission antenna 11 via the data switching portion 10. In the next frame, the speech signal of silence inputted from the microphone 1 is high efficiency coded by the high efficiency coding portion 14 in a similar manner as upon high efficiency coding for a speech burst period in step 204, and the resulting code is transmitted from the transmission antenna 11 in step 206.
After a background noise updating signal is transmitted, the speech coding apparatus 91 stops its transmission for a period of T frames which is a predetermined time interval. After the period of T frames passes, the speech coding apparatus 91 transmits a postamble signal and a background noise updating signal again and then stops its transmission for another period of T frames, and such a sequence of operations is repeated.
It is to be noted that, also while transmission is stopped, detection of a speech burst period in step 203 is successively performed, and if transition to a speech burst period from a pause period is detected, then a preamble signal is produced by the preamble production portion 8 included in the unique word production portion 35 in step 205. Then, in the current frame, the preamble signal is transmitted from the transmission antenna 11 via the data switching portion 10 in step 206. Then, in the following frames, high efficiency codes produced by the high efficiency coding portion 14 are successively transmitted in steps 204 and 206.
Next, processing by the speech decoding apparatus 92 on the reception side is described with reference to FIG. 4.
A reception signal transmitted from the coding apparatus and received by the reception antenna 20 is supplied to the high efficiency speech decoding portion 22 and the unique word detection portion 23 via the reception circuit 33. First in step 251, the reception signal is analyzed by the unique word detection portion 23 to discriminate whether or not each of the current frame and the next frame is a speech burst period or a pause period. If it is discriminated that both of the current frame and the next frame are pause periods, then it is discriminated in step 253 whether or not the reception signal is a unique word (that is, a postamble signal or a preamble signal). If the reception signal is not a unique word here, then it is discriminated in step 254 whether or not the reception signal is a background noise updating signal (data for updating of background noise). If it is discriminated in step 254 that the reception signal is a background noise updating signal, then contents of the background noise parameter storage portion 24 are updated in step 255.
If it is discriminated in step 251 that the current frame is a speech burst period, the high efficiency speech decoding portion 22 decodes the reception signal (in this instance, a high efficiency code) to produce a decoded speech signal in step 252, and the switch 21 is switched in step 259 so that the decoded speech may be outputted from the loudspeaker 30. Then, the decoded speech signal is outputted.
Next, operation when it is discriminated in step 251 by the unique word detection portion 23 that the current frame is a pause period is described.
First, the processing in steps 253, 254 and 255 described above is executed. Then, in step 256, a quantized spectrum code word is read out from the background noise parameter storage portion 24 and inputted to the quantized spectrum coefficient calculation portion 25, by which it is converted into a quantized spectrum coefficient. Then, in step 257, the random residual signal generation portion 28 generates a random residual signal, and in step 258, the background noise synthesis portion 29 produces a background noise signal from the inputs from the background noise parameter storage portion 24, quantized spectrum coefficient calculation portion 25 and random residual signal generation portion 28. Since the current frame is a pause period, the switch 21 is switched to the background noise synthesis portion 29 side so that the background noise signal produced by the background noise synthesis portion 29 is outputted from the loudspeaker 30.
In the conventional speech coding and decoding system described above, however, the codebook provided in the spectrum coefficient quantization portion of the speech coding apparatus is generally optimized for quantization of a spectrum envelope in a speech burst period, and cannot be considered suitable for quantization of a spectrum envelope in a pause period. Since the conventional system quantizes a spectrum envelope in a pause period using such a codebook optimized for a speech burst period, background noise in a pause period gives the listener an unfamiliar feeling. In short, the conventional speech coding and decoding system has a problem in that background noise outputted from the speech decoding apparatus in a pause period sounds unnatural.