1. Field of the Invention
Embodiments herein relate to the encoding of background noise information in voice signal encoding methods.
2. Description of the Related Art
Since the beginnings of telecommunication, analog voice transmission for telephone calls has been restricted to a limited bandwidth. Voice is transmitted within a limited range of frequencies, from 300 Hz to 3400 Hz.
Such a limited range of frequencies is also specified in many voice signal encoding methods for present-day digital telecommunications. To this end, the bandwidth of the analog signal is limited prior to any encoding procedure. A codec is then used for coding and decoding; because its bandwidth is limited to the range between 300 Hz and 3400 Hz as described, it is referred to in what follows as a narrow band speech codec. The term codec is understood to mean both the coding specification for the digital coding of audio signals and the decoding specification for decoding the data with the goal of reconstructing the audio signal.
A well-known narrow band speech codec is, for example, the one defined in ITU-T Recommendation G.729. The coding specification described therein provides for the transmission of a narrow band speech signal at a data rate of 8 kbit/s.
Moreover, so-called wide band speech codecs, which provide for encoding in an expanded frequency range for the purpose of improving the auditory impression, are known. Such an expanded frequency range lies, for example, between a frequency of 50 Hz and 7000 Hz. A well-known wide band speech codec is, for example, the ITU-T recommendation G.729.EV.
Customarily, encoding methods for wide band speech codecs are configured to be scalable. Scalability here is taken to mean that the transmitted encoded data contain various delimited blocks, which contain the narrow band portion, the wide band portion, and/or the full bandwidth of the encoded speech signal. Such a scalable configuration permits, on the one hand, backward compatibility on the recipient's side. On the other hand, when data transmission capacity in the transmission channel is limited, it offers a simple way for the transmitter and the recipient to adjust the data rate and the size of the transmitted data frames.
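The scalable layout described above can be illustrated with a minimal sketch. It assumes a frame made of ordered layers, a narrow band core followed by enhancement layers, and drops trailing layers that do not fit a byte budget. The names `Layer` and `truncate_frame`, the layer labels, and the layer sizes are hypothetical and are not taken from any particular codec.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    name: str       # hypothetical label, e.g. "narrow band core"
    payload: bytes  # encoded bits belonging to this layer


def truncate_frame(layers: List[Layer], budget_bytes: int) -> List[Layer]:
    """Keep the leading layers that fit the budget.

    The first (core) layer is always kept so that a narrow band
    recipient can still decode the frame.
    """
    kept: List[Layer] = []
    used = 0
    for index, layer in enumerate(layers):
        size = len(layer.payload)
        if index > 0 and used + size > budget_bytes:
            break  # this layer and all later ones are dropped
        kept.append(layer)
        used += size
    return kept


# Illustrative frame; the sizes are assumptions chosen for the example.
frame = [
    Layer("narrow band core", bytes(20)),
    Layer("wide band enhancement", bytes(10)),
    Layer("quality refinement", bytes(10)),
]
```

Because the layers are ordered, reducing the data rate amounts to cutting the frame at a layer boundary; no re-encoding is needed in this sketch.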
To reduce the data transmission rate by means of a codec, provision is customarily made for a compression of the data to be transmitted. A compression is achieved, for example, by encoding methods in which parameters for an excitation signal and filter parameters are determined for encoding the speech data. The filter parameters as well as the parameters that specify the excitation signal are then transmitted to the recipient. There, with the aid of the codec, a synthetic speech signal is generated that resembles the original speech signal as closely as possible in terms of the subjective auditory impression. With this method, which is also referred to as the "analysis by synthesis" method, the digitized samples themselves are not transmitted. Instead, the parameters that were determined are transmitted, which make a synthesis of the speech signal possible on the recipient's side.
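The principle of analysis by synthesis can be sketched as follows, under strong simplifying assumptions: an all-pole synthesis filter with given coefficients, a small fixed set of candidate excitations, and a plain squared-error criterion. Practical codecs additionally use gains, adaptive codebooks, and perceptual weighting, none of which is shown. The function names `synthesize` and `search_codebook` are hypothetical.

```python
from typing import List, Sequence, Tuple


def synthesize(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole synthesis filter: y[n] = e[n] - sum_k a[k] * y[n-1-k]."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        y = e
        for k, coeff in enumerate(a):
            if n - 1 - k >= 0:
                y -= coeff * out[n - 1 - k]
        out.append(y)
    return out


def search_codebook(
    target: Sequence[float],
    a: Sequence[float],
    codebook: Sequence[Sequence[float]],
) -> Tuple[int, float]:
    """Synthesize every candidate excitation and return the index and
    squared error of the one whose output is closest to the target."""
    best_index, best_error = -1, float("inf")
    for index, candidate in enumerate(codebook):
        synth = synthesize(candidate, a)
        error = sum((t - s) ** 2 for t, s in zip(target, synth))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

In this sketch only the winning index and the filter coefficients would need to be transmitted; the recipient runs `synthesize` on the same codebook entry to reconstruct the signal.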
A method for discontinuous transmission, which is also known in the field as DTX, affords an additional measure for the reduction of the data transmission rate. The fundamental goal of DTX is a reduction of the data transmission rate when there is a pause in speaking.
To this end, the sender employs speech pause recognition (Voice Activity Detection, VAD), which recognizes a speech pause if the signal level falls below a certain threshold.
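A level-based speech pause decision of this kind can be sketched as follows. The sketch assumes samples normalized to the range -1.0 to 1.0 and a threshold of -40 dB relative to full scale; both the threshold value and the function names are assumptions made for illustration. Practical detectors typically combine several signal features and add hangover logic, which is omitted here.

```python
import math
from typing import Sequence


def frame_level_db(samples: Sequence[float]) -> float:
    """Mean-square level of one frame in dB relative to full scale (1.0)."""
    if not samples:
        return float("-inf")
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power) if power > 0.0 else float("-inf")


def is_speech_pause(samples: Sequence[float], threshold_db: float = -40.0) -> bool:
    """Report a speech pause when the frame level is below the threshold."""
    return frame_level_db(samples) < threshold_db
```

For example, a frame of constant amplitude 0.001 has a level of about -60 dB and is classified as a pause, whereas a frame of amplitude 0.5 (about -6 dB) is not.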
Customarily, the recipient does not expect complete silence during a speech pause. On the contrary, complete silence would lead to annoyance on the recipient's part or even to the suspicion that the connection had been disrupted. For this reason, methods are employed to produce a so-called comfort noise.
A comfort noise is a noise synthesized to fill phases of silence on the recipient's side. The comfort noise serves to foster a subjective impression of a connection that continues to exist without utilizing the data transmission rate that is provided for the purpose of transmitting speech signals. In other words, less energy is expended for the sender to encode the noise than to encode the speech data. To synthesize the comfort noise in a manner still perceived by the recipient as realistic, data are transmitted at a far lower data rate. The data transmitted in the process are also referred to within the field as SID (Silence Insertion Descriptor).
Present scalable encoding methods for wide band speech codecs do not provide any methods for discontinuous transmission.
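How a recipient might synthesize comfort noise from a received description can be sketched as follows. The sketch assumes, for illustration only, that the description carries a single noise level in dB relative to full scale; real SID frames generally also describe the spectral shape of the noise, which is not modeled here. The function name `comfort_noise` and its parameters are hypothetical.

```python
import math
import random
from typing import List


def comfort_noise(level_db: float, num_samples: int, seed: int = 0) -> List[float]:
    """Generate white Gaussian noise scaled to the given mean-square level.

    level_db is the target level in dB relative to full scale (1.0).
    """
    if num_samples <= 0:
        return []
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]
    power = sum(x * x for x in noise) / num_samples
    if power == 0.0:
        return noise  # degenerate case: nothing to scale
    target_power = 10.0 ** (level_db / 10.0)
    gain = math.sqrt(target_power / power)
    return [gain * x for x in noise]
```

Only the level needs to be transmitted in this sketch; the noise samples themselves are generated locally at the recipient, which is why the required data rate is far lower than for speech.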
In the state of the art, there are problems with any application of a discontinuous transmission (DTX) in conjunction with a comfort noise generator (CNG) on the recipient's side.
Currently known methods of discontinuous transmission provide for the transmission of a SID frame with updated parameters to characterize the background noise only if significant changes in the energy of the background noise are detected by the encoder during an inactive speech period (speech pause). This pertains to both narrow band (50 Hz to 4 kHz) and wide band speech codecs that support methods for discontinuous transmission. Customarily, in the decision to transmit a SID frame with updated parameters, an energy threshold that is specified in the decoder is used. As a result, no SID frames are sent as long as the defined energy threshold is not exceeded. The transmission network between recipient and sender, however, regards such a suspension of the sending of SID frames as the state at rest, or "Idle Channel." To ensure that a connection is maintained ("Connection Alive"), an additional exchange of data may be necessary to indicate that the connection is to be maintained.
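The threshold-based decision described above can be sketched as follows. The sketch assumes that the background noise energy is tracked as a level in dB and that a new SID frame is sent only when the level differs from the last transmitted level by more than a fixed threshold. The class name `SidScheduler` and the default threshold of 2 dB are assumptions made for illustration.

```python
from typing import Optional


class SidScheduler:
    """Decide, frame by frame, whether an updated SID frame is sent."""

    def __init__(self, threshold_db: float = 2.0) -> None:
        self.threshold_db = threshold_db
        self.last_sent_db: Optional[float] = None

    def should_send(self, noise_level_db: float) -> bool:
        """Return True if a SID frame with updated parameters is due."""
        first_frame = self.last_sent_db is None
        if first_frame or abs(noise_level_db - self.last_sent_db) > self.threshold_db:
            self.last_sent_db = noise_level_db  # remember what was sent
            return True
        return False
```

With stationary background noise, `should_send` returns False indefinitely after the first frame, which is exactly the situation the network interprets as an idle channel.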
In a known form of such an additional data exchange, administrative points in the network management of the transmission network call upon the sending node, i.e., the sending encoder, to send the most recently sent SID frame once more if the idle period that has elapsed since that SID frame was sent is deemed to be too long for the connection in question. The parameters of the SID frame are not updated for such a renewed transmission. The encoder thus does not perform any additional actions.
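This known keep-alive behavior can be sketched as follows, assuming a monitor that stores the most recently sent SID frame and is polled with the current time. The class name `KeepAliveMonitor`, the method names, and the maximum idle period of 5 seconds are hypothetical; the source does not specify how the network management is implemented.

```python
from typing import Optional


class KeepAliveMonitor:
    """Re-send the last SID frame, unchanged, after too long an idle period."""

    def __init__(self, max_idle_s: float = 5.0) -> None:
        self.max_idle_s = max_idle_s
        self.last_frame: Optional[bytes] = None
        self.last_sent_s: Optional[float] = None

    def record_sid(self, frame: bytes, now_s: float) -> None:
        """Note that the encoder sent a SID frame at time now_s."""
        self.last_frame = frame
        self.last_sent_s = now_s

    def poll(self, now_s: float) -> Optional[bytes]:
        """Return the stored frame if the idle period is exceeded, else None."""
        if self.last_frame is None or self.last_sent_s is None:
            return None
        if now_s - self.last_sent_s > self.max_idle_s:
            self.last_sent_s = now_s
            return self.last_frame  # parameters are not updated
        return None
```

The sketch makes the limitation visible: the frame returned by `poll` is byte-for-byte the one recorded earlier, so the recipient learns nothing new about the background noise.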