(a) Field of the Invention
The present invention relates to a voice encoder for generating natural background noise and, more particularly, to a voice encoder for use in a digital mobile communication system that performs VOX (Voice Operated Transmission) control. The present invention also relates to a voice encoding method.
(b) Description of the Related Art
In a digital mobile communication system, VOX control is generally used to stop transmission of encoded signals, and thereby reduce power dissipation, when a frame of the input audio signal contains no voice. More specifically, when an unvoiced frame occurs, the transmitting section of the communication system transmits a code series indicating the unvoiced frame instead of the encoded audio signal, and after receiving this code series the receiving section generates a background noise code series for a certain interval. Such a communication system is described in, for example, JP-A-5(1993)-122165.
FIG. 1 shows a voice encoder of a conventional mobile communication system such as the one mentioned above. The voice encoder comprises a voiced-unvoiced detector 11, a pitch analyzer 12, a high-efficiency encoder 13, a VOX unique word generator 14 and a data selector 15. The operation of the conventional voice encoder of FIG. 1 will be described with additional reference to FIG. 2, which is a flowchart of that operation.
In a digital communication system using the high-efficiency voice encoding/decoding scheme, an input audio signal is divided into frames, each having a duration of about 40 milliseconds (msec). The framed input audio signal is supplied to the voiced-unvoiced detector 11, which judges for each frame whether or not the input audio signal includes voice (step B1).
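The framing and per-frame decision of step B1 can be sketched as follows. This is a minimal illustration only: the 8 kHz sampling rate, the helper names `split_frames` and `is_voiced`, and the mean-energy threshold test are assumptions, since the text does not state how the voiced-unvoiced detector 11 reaches its decision; practical detectors usually combine several features with an adaptive noise floor.

```python
def split_frames(samples, sample_rate=8000, frame_ms=40):
    """Split a sample sequence into consecutive, non-overlapping frames.

    At the assumed 8 kHz rate a 40 ms frame holds 320 samples; a trailing
    partial frame is dropped.
    """
    frame_len = sample_rate * frame_ms // 1000
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def is_voiced(frame, energy_threshold):
    """Hypothetical voiced-unvoiced test: mean energy above a fixed threshold."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > energy_threshold


if __name__ == "__main__":
    signal = [0.0] * 320 + [0.5] * 320  # one silent frame, one loud frame
    frames = split_frames(signal)
    print([is_voiced(f, energy_threshold=1e-3) for f in frames])  # [False, True]
```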
If it is judged that the input audio signal includes voice in the present frame, the pitch analyzer 12 extracts from the input audio signal the pitch parameters, which together with a spectrum parameter characterize the voice of each frame (step B2). Pitch parameters, or pitch information, are described, for instance, in "Digital Sound Processing" by Furui, pp. 57-59, Tokai University Publication Association, Sep. 25, 1985. The pitch information from the pitch analyzer 12 is supplied together with the input audio signal to the high-efficiency encoder 13, which performs high-efficiency voice encoding (step B3) and also extracts other parameters such as the spectrum parameter. Based on the voiced state of the frame, the data selector 15 selects the high-efficiency encoded signal as the data for transmission, which is transmitted to the receiving section, or decoder (not shown in the figure), of the communication system.
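The method used by the pitch analyzer 12 is not specified here. One common way to obtain a pitch parameter, shown only as an assumed stand-in, is to choose the lag that maximizes the normalized autocorrelation of the frame within a plausible pitch range. The function name, the lag range (20 to 147 samples, roughly 54 Hz to 400 Hz at an assumed 8 kHz rate) and the return convention are illustrative.

```python
import math


def estimate_pitch_lag(frame, min_lag=20, max_lag=147):
    """Return (lag, score) maximizing the normalized cross-correlation between
    the frame and a copy of itself delayed by `lag` samples, or None if no lag
    in range gives a usable score (e.g. an all-zero frame)."""
    best = None
    for lag in range(min_lag, min(max_lag, len(frame) - 1) + 1):
        cur = frame[lag:]    # x[n]
        past = frame[:-lag]  # x[n - lag]
        num = sum(a * b for a, b in zip(cur, past))
        den = math.sqrt(sum(a * a for a in cur) * sum(b * b for b in past))
        if den == 0.0:
            continue
        score = num / den
        if best is None or score > best[1]:
            best = (lag, score)
    return best


if __name__ == "__main__":
    # A 100 Hz tone at 8 kHz repeats every 80 samples.
    frame = [math.sin(2 * math.pi * n / 80) for n in range(320)]
    lag, score = estimate_pitch_lag(frame)
    print(lag, round(score, 3))  # 80 1.0
```

Normalizing by the energies of both overlapping segments avoids the bias toward short lags that a plain energy-normalized sum would have.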
On the other hand, if the voiced-unvoiced detector 11 judges that the input audio signal does not include voice in the present frame although it included voice in the preceding frame, the VOX unique word generator 14 generates a post-amble signal for the present frame (step B4), which the data selector 15 selects as the data for transmission and which is transmitted to the voice decoder (step B5). The input audio signal of the subsequent frame is encoded by the high-efficiency encoder 13 (step B3), as in the case of a voiced frame, and selected as the data for transmission (step B5). The encoded signal for this subsequent frame is used by the decoder to update the background noise and is referred to as a background updating code series.
The voice encoder then stops transmission of data for N frames, where N is a constant. If the unvoiced state continues for more than N frames, another post-amble signal and another background updating code series are transmitted after the N frames have elapsed, and transmission is again stopped for a further N frames.
The voiced-unvoiced detector 11 continues to detect the voiced-unvoiced state of the input audio signal in each frame while transmission is stopped. If the voiced-unvoiced detector 11 detects a voiced frame during this period, the VOX unique word generator 14 generates a pre-amble signal for that frame, which is transmitted to the decoder through the data selector 15. From the next frame onward, the high-efficiency encoder 13 encodes the input audio signal to generate high-efficiency code series, which are successively transmitted to the decoder.
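The transmit-side behaviour of steps B4 and B5 and the N-frame stop interval can be summarized as a per-frame schedule of what the data selector 15 sends. The sketch below is one reading of the text: it assumes the stream starts in the voiced state, that each post-amble and background-update pair is followed by exactly N silent frames before the pair repeats, and that any voiced frame arriving in the unvoiced state triggers a pre-amble (a simplification of "during the stopping of the transmission"). The enum and function names are hypothetical.

```python
from enum import Enum


class Tx(Enum):
    """What the data selector sends in one frame slot (hypothetical labels)."""
    SPEECH = "speech"    # high-efficiency code series for a voiced frame
    POSTAMBLE = "post"   # VOX unique word marking the end of voice
    BG_UPDATE = "bg"     # background updating code series
    PREAMBLE = "pre"     # VOX unique word marking the return of voice
    NONE = "none"        # transmission stopped


def vox_schedule(voiced_flags, n_stop):
    """Map per-frame voiced/unvoiced decisions to transmitted items.

    Assumes the stream starts in the voiced state. Each unvoiced run repeats
    the cycle: post-amble, background update, then n_stop silent frames.
    """
    out = []
    in_vox = False  # True while the encoder is in the unvoiced (VOX) state
    phase = 0       # frames elapsed since the unvoiced run began
    for voiced in voiced_flags:
        if voiced:
            out.append(Tx.PREAMBLE if in_vox else Tx.SPEECH)
            in_vox = False
            continue
        if not in_vox:
            in_vox, phase = True, 0
        step = phase % (n_stop + 2)
        if step == 0:
            out.append(Tx.POSTAMBLE)
        elif step == 1:
            out.append(Tx.BG_UPDATE)
        else:
            out.append(Tx.NONE)
        phase += 1
    return out


if __name__ == "__main__":
    flags = [True, False, False, False, False, False, True, True]
    print([t.value for t in vox_schedule(flags, n_stop=2)])
    # ['speech', 'post', 'bg', 'none', 'none', 'post', 'pre', 'speech']
```

In the described encoder the `BG_UPDATE` slot carries an ordinary high-efficiency encoding of that frame; the schedule only records which kind of item occupies each slot.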
In the receiving section, the voice decoder decodes the received code series to regenerate the parameters, including the pitch parameters mentioned above, and on that basis judges whether or not the input audio signal of the present frame includes voice. If the present frame includes voice, the voice decoder decodes the parameters to generate a decoded audio signal. If, on the other hand, a post-amble signal is received because the input audio signal was unvoiced, the voice decoder repeatedly generates background noise for N frames based on the parameters included in the background updating code series; the background noise is updated every N frames based on a new post-amble signal and a new background updating code series.
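The repeated generation of background noise from one background updating code series can be sketched by driving an all-pole synthesis filter with white noise. The parameter set (a gain and linear-prediction coefficients), the Gaussian excitation and the name `synth_background` are assumptions standing in for whatever spectrum parameters the code series actually carries; the same parameters are reused for all N frames until the next update arrives.

```python
import random
from collections import deque


def synth_background(gain, lpc, n_frames, frame_len=320, seed=0):
    """Generate n_frames of background noise from one parameter set.

    Implements y[n] = gain * e[n] + sum_k lpc[k] * y[n - 1 - k], where e[n] is
    unit-variance Gaussian noise. Filter memory carries across frames so the
    repeated frames join without discontinuities.
    """
    rng = random.Random(seed)
    past = deque([0.0] * len(lpc), maxlen=len(lpc))  # past[0] is y[n-1]
    frames = []
    for _ in range(n_frames):
        frame = []
        for _ in range(frame_len):
            y = gain * rng.gauss(0.0, 1.0) + sum(a * p for a, p in zip(lpc, past))
            past.appendleft(y)  # oldest sample drops off the right end
            frame.append(y)
        frames.append(frame)
    return frames


if __name__ == "__main__":
    noise = synth_background(gain=0.01, lpc=[0.9], n_frames=4)
    print(len(noise), len(noise[0]))  # 4 320
```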
JP-A-2(1990)-181800 also describes a related technique in a voice encoding/decoding system, wherein both the amplitudes and the positions of the multi-pulses are calculated by a pitch-predicting multi-pulse method in a voiced frame, whereas in an unvoiced frame the positions are fixed and only the amplitudes are calculated. The technique is stated to achieve an excellent tone of the background noise even at a low transmission bit rate.
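When the pulse positions are fixed, as in the unvoiced frames described above, finding the multi-pulse amplitudes reduces to a linear least-squares problem. The sketch assumes the synthesis filter is represented by a truncated impulse response `h` and that the amplitudes minimize the squared error against a target frame; it does not reproduce the pitch-predicting search for voiced frames, and all names are hypothetical.

```python
import numpy as np


def pulse_matrix(n, h, positions):
    """Column j is the impulse response h shifted to start at positions[j]."""
    H = np.zeros((n, len(positions)))
    for j, p in enumerate(positions):
        seg = h[: n - p]
        H[p:p + len(seg), j] = seg
    return H


def fixed_position_amplitudes(target, h, positions):
    """Least-squares pulse amplitudes for fixed pulse positions."""
    H = pulse_matrix(len(target), np.asarray(h, dtype=float), positions)
    amps, *_ = np.linalg.lstsq(H, np.asarray(target, dtype=float), rcond=None)
    return amps


if __name__ == "__main__":
    h = 0.8 ** np.arange(40)   # assumed decaying impulse response
    positions = [0, 10, 25]    # fixed positions (unvoiced frame)
    true_amps = np.array([1.0, -0.5, 2.0])
    target = pulse_matrix(80, h, positions) @ true_amps
    # Recovers true_amps up to floating-point rounding.
    print(np.round(fixed_position_amplitudes(target, h, positions), 6))
```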
JP-A-8(1996)-139688 also describes a related technique in a voice encoder for use in a mobile station, wherein the output of the encoder is selected for generating background noise when the voiced-unvoiced detector detects an unvoiced frame. This technique is also capable of reducing the sense of incongruity in the voice output from the decoder that is caused by periodic tone variation of the background noise in unvoiced frames during VOX (or VAD) processing by the mobile station.
JP-A-7(1995)-334197 also describes a related technique in a voice encoding/decoding system, wherein the decoder generates background noise by interpolating encoded data received intermittently, thereby preventing a sense of incongruity in the decoded output even when the receiving section decodes background noise continuously.
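Interpolation of intermittently received background parameters can be illustrated with simple linear interpolation across the frames between two updates. The linear scheme and the function name are assumptions; in practice spectral parameters are usually interpolated in a domain that keeps the synthesis filter stable, such as line spectral pairs, rather than as raw prediction coefficients.

```python
def interpolate_params(prev, new, n_frames):
    """Return n_frames parameter vectors moving linearly from prev to new.

    Frame k (1-based) uses weight t = k / n_frames, so the last frame equals
    `new` and the generated noise does not jump abruptly between updates.
    """
    if len(prev) != len(new):
        raise ValueError("parameter vectors must have the same length")
    out = []
    for k in range(1, n_frames + 1):
        t = k / n_frames
        out.append([(1.0 - t) * a + t * b for a, b in zip(prev, new)])
    return out


if __name__ == "__main__":
    for vec in interpolate_params([0.0, 10.0], [1.0, 20.0], 4):
        print(vec)
    # [0.25, 12.5] / [0.5, 15.0] / [0.75, 17.5] / [1.0, 20.0]
```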
In the conventional voice encoders as mentioned above, the following problems exist in the output background noise in successive unvoiced frames.
The parameters of the input audio signal include pitch parameters, or pitch components, which characterize a particular voice by representing the periodic vibration of the vocal cords, one of the human vocal mechanisms. Pitch appears clearly in voiced sound and does not appear in unvoiced sound. Accordingly, if background noise is generated using the pitch parameters included among the parameters of an unvoiced frame, the resulting background noise has an unnatural tone.
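The effect described above can be illustrated numerically: an excitation carrying a pitch pulse train stays strongly correlated with itself one pitch period later, which is heard as a tone, whereas purely random excitation does not. The sketch assumes a pitch lag of 80 samples and a pulse amplitude of 10 relative to unit-variance noise; these values and the function names are illustrative only.

```python
import random


def excitation(n, pitch_lag=None, pulse_amp=10.0, seed=0):
    """Unit-variance Gaussian noise, optionally with a pulse every pitch_lag samples."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    if pitch_lag:
        for i in range(0, n, pitch_lag):
            x[i] += pulse_amp
    return x


def autocorr_at(x, lag):
    """Autocorrelation at `lag`, normalized by the total signal energy."""
    num = sum(x[i] * x[i - lag] for i in range(lag, len(x)))
    return num / sum(v * v for v in x)


if __name__ == "__main__":
    n, lag = 3200, 80
    print("with pitch:   ", round(autocorr_at(excitation(n, pitch_lag=lag), lag), 2))
    print("without pitch:", round(autocorr_at(excitation(n), lag), 2))
```

The first value is large and the second close to zero, which is why noise synthesized from pitch-bearing parameters sounds periodic rather than natural.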