The present invention relates to a voice decoding apparatus and, more particularly, to a voice decoding equipment for decoding received signals encoded by an adaptive differential PCM (ADPCM) scheme.
It is said that the voice activity factor in voice communications is about 35%.
In recent years, personal communications intended primarily for private use have become widespread. The mainstream of such personal communications is voice communications using terminal equipment that is easy to carry. A first requirement of such portable terminal equipment is that it be cordless. A second requirement is to cut the circuit power consumption so as to lengthen the service life of the battery that is used to make the terminal equipment portable.
Taking the voice activity factor into account, it is possible to reduce the circuit power consumption more than in the past by activating the transmitter circuit only for the voice-active duration and holding it in the sleep mode for the rest of the transmitting operation. This could be implemented by providing a voice activity detecting facility at the sending side and adding a discontinuous transmitter in association therewith.
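The saving can be illustrated with a short calculation. The following is a minimal sketch only: the function name and the power figures (100 mW active, 5 mW sleep) are hypothetical values chosen for illustration, and only the 35% activity factor comes from the text.

```python
def average_tx_power(active_mw, sleep_mw, activity_factor=0.35):
    """Average transmitter power when the circuit is active only while voice is present.

    active_mw and sleep_mw are hypothetical power figures, not values from the source.
    """
    return activity_factor * active_mw + (1.0 - activity_factor) * sleep_mw


# Hypothetical figures: 100 mW while transmitting, 5 mW in the sleep mode.
continuous = average_tx_power(100.0, 5.0, activity_factor=1.0)
discontinuous = average_tx_power(100.0, 5.0)
print(continuous, discontinuous)
```

Under these assumed figures the discontinuous transmitter draws well under half the average power of a continuously active one.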
This, however, creates a problem at the receiving side: the reconstructed voice is discontinuous and hence very annoying. As is well known in the art, a reason for this is that the background noise is transmitted as a modulated signal only for the duration of a sound signal; that is, when the voice is sent, the background noise is superimposed on the voice, but when no voice is sent, the background noise is not sent either.
A known solution to this problem is to generate, at the receiving side, a pseudo-noise similar to the background noise at the sending side while no voice signal is transmitted.
FIG. 3 illustrates in block form a conventional voice decoding apparatus employing an example of the scheme just mentioned. In FIG. 3, reference numeral 1 denotes an antenna, and 2 a receiving demodulator, which receives and demodulates a modulated wave in which ADPCM coding data are multiplexed with a voice activity detection flag indicating the voice activity or nonactivity detected by a voice activity detector of an encoder at the sending side.
Reference numeral 3 denotes a demultiplexer for demultiplexing the received and demodulated signal a into the ADPCM coding data c and the voice activity detection flag b. Reference numeral 5 denotes a controller, which receives the voice activity detection flag b from the demultiplexer 3 and outputs a control signal (a reset pulse) d and a control signal e indicating the voice activity or nonactivity.
Reference numeral 4 denotes an ADPCM decoder, in which a prediction coefficient is reset to "0" by the control signal d from the controller 5. The decoder decodes the ADPCM coding data c for the voice-active duration but, for the voice-nonactive duration, decodes random ADPCM coding data, generated therein for pseudo-background noise, using a prediction coefficient that is input from a prediction coefficient holder 7.
Reference numeral 7 denotes a prediction coefficient holder, which extracts prediction coefficients which are internal variables of the ADPCM decoder 4, then calculates their average value for each frame, updates the prestored value and, for the voice-nonactive duration, responds to the control signal e from the controller 5 to provide the prediction coefficient of the frame immediately preceding it to the ADPCM decoder 4 while holding the prediction coefficient. Reference numeral 8 denotes a speaker.
FIG. 4 is a timing chart showing a signal waveform of the original sound at the sending side and signal waveforms occurring at respective parts of the voice decoding apparatus.
The uppermost row shows the waveform of the original sound at the sending side; the voice and the background noise are superimposed on each other. The second row shows the voice activity detection flag b sent from the sending side and demultiplexed by the demultiplexer 3; the flag is "1" for the voice-active duration and "0" for the voice-nonactive duration. In this example, the flag is shown to erroneously indicate voice activity for a short time in the voice-nonactive duration. The third row shows the reset signal d which is provided from the controller 5.
Reference character x at the fourth row denotes a waveform showing temporal variations of a second-order one al(t) of prediction coefficients for the original sound (voice+background noise). Reference character j at the next row denotes a waveform showing temporal variations of a prediction coefficient al(t) of the ADPCM decoder 4; the waveform during the voice-active state is the same as the waveform x, but the waveform during the voice-nonactive state shows temporal changes of the prediction coefficient which is input into the decoder from the prediction coefficient holder 7 to decode the pseudo-background noise.
The operation of the conventional equipment will be described with reference to FIGS. 3 and 4.
The modulated wave with the voice activity detection flag and the ADPCM coding data multiplexed therein is received by the antenna 1 and then fed to the demodulator 2, which demodulates it and applies the demodulated signal a to the demultiplexer 3. The voice activity detection flag herein mentioned is the output signal from the voice activity detector, which detects a period of the input from the encoder of the sending side over which a voice is present (voice-active) or absent (voice non-active).
The demultiplexer 3 demultiplexes the demodulated signal a into the voice activity detection flag b and the ADPCM coding data c. In this instance, the ADPCM coding data c has a frame 5 milliseconds long. Every 5 milliseconds the voice activity detection flag b is fed to the controller 5.
The controller 5 receives the voice activity detection flag b and provides the control signal (a reset pulse) d to the ADPCM decoder 4 at the time of transition from the voice-active duration to the voice-nonactive duration and vice versa. The control signal d is to initialize a predetermined variable in the ADPCM decoder 4, such as a prediction coefficient; in this example, the control signal is applied to the decoder to reset it for each transition to put the receiving side decoder into the same internal state as that of the sending side encoder when the transmission is interrupted and resumed in response to the voice activity detection. Without this resetting of the decoder, its internal state would differ from that of the encoder after the interruption of the transmission on the basis of the voice activity detection. This results in the degradation of the tone quality of the reconstructed voice decoded by the decoder. The ADPCM decoder 4 is reset by the control signal d. Since no modulated signal is received during the voice-nonactive period, the ADPCM decoder 4 generates therein random data over a range permissible for the ADPCM coding data c so as to generate pseudo-background noise and decodes the data by the prediction coefficient which is fed from the prediction coefficient holder 7.
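The transition-triggered reset described above can be sketched as follows. This is an illustrative model only: the function name `reset_pulses` and the representation of the flag b as a list of per-frame 0/1 values are assumptions, not part of the described apparatus.

```python
def reset_pulses(flags):
    """Return one boolean per frame: True where a reset pulse d would be issued.

    A pulse is issued whenever the voice activity detection flag b changes
    between consecutive frames (active -> nonactive or nonactive -> active).
    The first frame has no predecessor, so no pulse is issued for it
    (an assumption of this sketch).
    """
    pulses = []
    previous = None
    for flag in flags:
        pulses.append(previous is not None and flag != previous)
        previous = flag
    return pulses


# Hypothetical per-frame flag values: 1 = voice-active, 0 = voice-nonactive.
print(reset_pulses([1, 1, 0, 0, 1, 0]))
```

Note that a flag that flips erroneously for a few frames produces two resets in quick succession, one on entering and one on leaving the erroneous period.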
To add spectrum information of the actual noise to the pseudo-background noise generated in the ADPCM decoder 4 during the interruption of the transmission, the prediction coefficient holder 7 extracts the prediction coefficients in the ADPCM decoder 4, calculates their average value, and replaces the old held value with the new average each time. Upon detection of the voice-nonactive duration by the control signal e, it holds the average value of the prediction coefficients of the last frame of the voice-active duration and applies it to the pseudo-background noise. In this way, it is possible to generate a pseudo-background noise which, even though decoded from random data, has a timbre similar to that of the background noise actually superimposed on the encoded voice or sound.
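The averaging and holding behaviour of the prediction coefficient holder 7 can be sketched in a few lines. This is a simplified model under stated assumptions: the class and method names are hypothetical, a single scalar coefficient is tracked rather than the full coefficient set, and the initial held value of 0.0 is chosen arbitrarily.

```python
class PredictionCoefficientHolder:
    """Toy model of holder 7: average per frame while active, hold while nonactive."""

    def __init__(self):
        self.held = 0.0  # assumed initial value; not specified in the source

    def process_frame(self, frame_coefficients, voice_active):
        """Update the held average during voice activity; otherwise keep it.

        frame_coefficients: coefficient values sampled from the decoder
        during one frame (ignored while voice is nonactive).
        Returns the value that would be supplied to the decoder.
        """
        if voice_active and frame_coefficients:
            self.held = sum(frame_coefficients) / len(frame_coefficients)
        return self.held


holder = PredictionCoefficientHolder()
holder.process_frame([0.25, 0.75], voice_active=True)   # last voice-active frame
print(holder.process_frame([], voice_active=False))     # held during silence
```

The value held during the voice-nonactive duration is therefore only as good as the last voice-active frame from which it was computed.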
As referred to previously, "x" in FIG. 4 shows, for example, temporal variations of the second-order one al(t) of the prediction coefficients for the original sound (voice+background noise).
When a period B in the voice-nonactive duration is processed as a voice-active duration because of an error in the voice activity detection flag, caused by a decision error of the voice activity detector, fading in the transmission medium, or the like, as shown by the voice activity detection flag b in FIG. 4, the decoder 4 is reset at the end of the period A of "j" in FIG. 4. The prediction coefficient set to "0" by the reset rises toward the value for the background noise over the period B. If the period B gives way to the voice-nonactive duration again after an extremely short time (five frames, for instance), the decoder 4 generates the pseudo-background noise in the next period C (the voice-nonactive duration) while holding the old value of the prediction coefficient prior to its rising; hence, in the period C the pseudo-background noise has a timbre different from that of the actual background noise, giving the listener a sense of incongruity. That is, the prior art has the shortcoming that the pseudo-background noises in the periods A and C differ in tone quality, creating a feeling of discomfort.
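The mismatch between periods A and C can be illustrated with a toy simulation. Everything numeric here is an assumption made for illustration: the first-order adaptation law, the rate of 0.1 per frame, and the background-noise coefficient value of 0.8 are hypothetical and are not taken from the ADPCM algorithm or from the source; only the five-frame length of period B follows the example in the text. The sketch also assumes that the value held for period C is whatever the coefficient has reached by the end of period B.

```python
def coefficient_after_reset(target, rate, frames):
    """Coefficient reached a given number of frames after a reset to 0.

    Assumes a simple first-order adaptation toward `target` at `rate` per
    frame; this stands in for the decoder's real adaptation, which is
    not modelled here.
    """
    coefficient = 0.0  # value forced by the reset pulse d
    for _ in range(frames):
        coefficient += rate * (target - coefficient)
    return coefficient


noise_value = 0.8  # hypothetical coefficient for the true background noise
held_in_c = coefficient_after_reset(noise_value, rate=0.1, frames=5)
print(f"held in period C: {held_in_c:.3f}, background noise value: {noise_value}")
```

With these assumed numbers the coefficient held for period C is less than half the value that characterizes the actual background noise, which is the source of the audible difference in timbre.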