1. Technical Field
The present invention relates generally to speech coding and, more particularly, to discontinued transmission and comfort noise generation within pulse code modulation (PCM) speech coders.
2. Related Art
Conventional methods of performing discontinued transmission (DTX) mode speech coding typically rely only on detecting the energy level of the background noise. That is, the encoder circuitry of a speech codec measures a single quantity, the energy level, and transmits an energy level flag across a communication link to the decoder circuitry of the speech codec. After receiving this flag at the onset of a DTX mode of operation, the decoder circuitry generates a substitute signal, referred to as comfort noise, in place of the untransmitted background noise. Known approaches to this comfort noise generation (CNG) include using a randomly selected or randomly generated sequence in a PCM coder (such as the μ-law/A-law PCM of G.711), and using a randomly selected or randomly generated codevector within code-excited linear prediction (CELP) speech reproduction circuitry (such as G.729 Annex B), to generate comfort noise at the decoder circuitry during DTX modes of operation.
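The energy-only approach can be sketched as follows: because the decoder receives nothing but an energy level, the best it can do is scale a random (white) sequence to that level, so every background noise is reproduced with the same flat spectrum. This is a minimal illustration, not the G.711 or G.729 Annex B procedure; the helper name `energy_only_comfort_noise`, the dB convention assumed for the energy flag, and the 160-sample frame are all hypothetical.

```python
import math
import random


def energy_only_comfort_noise(energy_db: float, n: int, seed: int = 0) -> list[float]:
    """Generate n samples of white comfort noise at a signalled energy level.

    Hypothetical helper: energy_db is assumed to be the mean power per sample,
    10*log10(mean(x**2)), carried by the energy level flag. No spectral shape
    is available, so the output is spectrally flat (white).
    """
    rng = random.Random(seed)
    target_power = 10.0 ** (energy_db / 10.0)
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Rescale so this frame's mean power equals the signalled power exactly.
    power = sum(x * x for x in noise) / n
    gain = math.sqrt(target_power / power)
    return [gain * x for x in noise]


# Example: one 20 ms frame at 8 kHz (160 samples) at 20 dB mean power.
frame = energy_only_comfort_noise(20.0, 160)
```

Whatever the color of the original background noise (car hum, babble, fan noise), this output has the same flat spectrum; only its loudness tracks the original.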
However, encoding the background noise along this single dimension (its energy level) fails to provide high perceptual quality for the background noise reproduced at the decoder circuitry of the speech codec. Because the energy level carries no information about the spectral shape of the noise, it alone cannot deliver the perceptual quality of background noise that users of speech coding systems expect.
One proposed method of achieving high perceptual quality when coding background noise in speech coding systems is to measure both the frequency spectrum and the energy level of the speech signal at the encoder circuitry and to transmit that information to the decoder circuitry of the speech codec. A difficulty with such methods is that they inherently require modifying existing transmission protocols and standards, with which they are otherwise incompatible. An entirely new silence insertion description (SID) standard would need to be designed to interface with these proposed speech coding methods.
For example, methods that measure and transmit both the frequency spectrum and the energy level of the speech signal would require an entirely new SID standard before they could perform conventional speech coding operations such as discontinued transmission (DTX). Providing comfort noise generation (CNG), high-quality coding of music, and other perceptual improvements through these methods would further require additional transformation to comply with existing speech coding standards, and the resulting increase in overall system complexity would significantly increase size and cost. Thus, although those skilled in the art of speech coding find these proposed methods desirable in that they improve the perceptual quality of speech signal elements such as background noise, the methods are not operable with conventional transmission protocols, particularly those employing pulse code modulation (PCM).
Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art through comparison of such systems with the present invention as set forth in the remainder of the present application with reference to the drawings.
Various aspects of the present invention can be found in a speech codec that performs discontinued transmission on a speech signal having a background noise. The speech codec contains, among other things, an encoder circuitry and a decoder circuitry communicatively coupled via a communication link. The encoder circuitry is operable to receive the speech signal having the background noise. The encoder circuitry itself contains, among other things, a background noise detection circuitry that detects a frequency spectrum and an energy level corresponding to the speech signal and a transmission resuming circuitry that operates cooperatively with the background noise detection circuitry to determine when to resume transmission of the speech signal. The decoder circuitry generates a reproduced speech signal that is substantially comparable to the speech signal. The decoder circuitry itself contains, among other things, a background noise reproduction circuitry that employs a predetermined number of relatively recently received speech samples to assist in the generation of a reproduced background noise that is itself contained within the reproduced speech signal. The reproduced background noise is substantially comparable to the background noise within the speech signal. The communication link is operable using a number of transmission protocols including conventional transmission protocols.
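One way a decoder might re-synthesize background noise from a predetermined number of recently received samples is sketched below. The technique is swapped in for illustration and is not stated in the text: it assumes the spectral shape is captured by a 10th-order linear prediction (LPC) fit, computed with the autocorrelation method and Levinson-Durbin recursion, and the energy by the mean power of the same samples. White noise is shaped by the resulting all-pole filter and scaled to that power. The function names and parameter values are hypothetical.

```python
import math
import random


def autocorrelation(x: list[float], order: int) -> list[float]:
    n = len(x)
    return [sum(x[i] * x[i - k] for i in range(k, n)) for k in range(order + 1)]


def levinson_durbin(r: list[float], order: int) -> list[float]:
    """Predictor a[1..order] with x[n] ~= sum_k a[k] * x[n-k] (autocorrelation method)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # perfectly predictable input: keep the lower-order model
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def resynthesize_background(recent: list[float], n_out: int,
                            order: int = 10, seed: int = 0) -> list[float]:
    """Re-synthesize background noise from the most recently received samples.

    Spectral shape: all-pole (LPC) fit to `recent`. Energy: mean power of `recent`.
    """
    r = autocorrelation(recent, order)
    if r[0] <= 0.0:
        return [0.0] * n_out  # silent reference frame: nothing to imitate
    a = levinson_durbin(r, order)
    rng = random.Random(seed)
    history = [0.0] * order  # history[k] holds y[n-1-k]
    out = []
    for _ in range(n_out):
        # White excitation shaped by the all-pole synthesis filter 1/A(z).
        y = rng.gauss(0.0, 1.0) + sum(a[k] * history[k] for k in range(order))
        out.append(y)
        history = [y] + history[:-1]
    target = r[0] / len(recent)
    power = sum(y * y for y in out) / n_out
    gain = math.sqrt(target / power)
    return [gain * y for y in out]


# Usage: a low-pass colored reference frame (AR(1) noise with coefficient 0.9).
rng = random.Random(1)
recent, prev = [], 0.0
for _ in range(160):
    prev = 0.9 * prev + rng.gauss(0.0, 1.0)
    recent.append(prev)
noise = resynthesize_background(recent, 160)
```

Unlike the energy-only case, the output inherits the low-pass character of the reference frame as well as its power. The autocorrelation method is chosen because it always yields a stable synthesis filter.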
In certain embodiments of the invention, the background noise reproduction circuitry further contains a frequency spectrum derivation circuitry that re-synthesizes a frequency spectrum for the reproduced speech signal and an energy level change derivation circuitry that re-synthesizes an energy level for the reproduced speech signal. The background noise detection circuitry further contains a frequency spectrum change detection circuitry that detects a change in the frequency spectrum corresponding to the speech signal and an energy level change detection circuitry that detects a change in the energy level corresponding to the speech signal. Furthermore, the encoder circuitry further contains an intelligent discontinued transmission circuitry that operates cooperatively with the background noise detection circuitry to detect these changes in the frequency spectrum and the energy level, and uses them to determine when to resume transmission of the speech coding on the speech signal.
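A minimal sketch of the two change detectors might compare the current frame against a reference frame (for example, the last frame transmitted before discontinuing). The measures are assumptions chosen for illustration: energy change is the absolute difference in mean power (dB), and spectral change is the Euclidean distance between the first few normalized autocorrelation lags, which summarize spectral shape independently of gain. The thresholds (0.2 and 3 dB) and all names are hypothetical.

```python
import math


def energy_db(frame: list[float]) -> float:
    """Mean power per sample in dB (small floor avoids log of zero)."""
    return 10.0 * math.log10(sum(x * x for x in frame) / len(frame) + 1e-12)


def normalized_autocorr(frame: list[float], order: int) -> list[float]:
    """Lags 1..order divided by lag 0: a gain-independent summary of spectral shape."""
    r0 = sum(x * x for x in frame) + 1e-12
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n)) / r0
            for k in range(1, order + 1)]


def detect_changes(reference: list[float], current: list[float],
                   spectral_threshold: float = 0.2,
                   energy_threshold_db: float = 3.0,
                   order: int = 4) -> tuple[bool, bool]:
    """Return (spectrum_changed, energy_changed) relative to the reference frame."""
    ref_shape = normalized_autocorr(reference, order)
    cur_shape = normalized_autocorr(current, order)
    spectral_distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(ref_shape, cur_shape)))
    energy_delta = abs(energy_db(current) - energy_db(reference))
    return spectral_distance > spectral_threshold, energy_delta > energy_threshold_db


# Example frames (160 samples each).
reference = [1.0 if i % 2 == 0 else -1.0 for i in range(160)]  # high-pass shape
louder = [10.0 * x for x in reference]                          # same shape, +20 dB
dc = [1.0] * 160                                                # same power, low-pass (DC) shape
```

Here `louder` trips only the energy detector and `dc` trips only the spectral detector, which is why a detector that watched energy alone would miss the second kind of change.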
In other embodiments of the invention, the encoder circuitry further contains a systematic discontinued transmission circuitry that resumes transmission of the speech coding on the speech signal at predetermined time intervals. The predetermined number of relatively recently received speech samples is a frame of the speech signal, and the information it provides includes both a frequency spectrum and an energy level corresponding to those samples.
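The systematic variant needs no change detection at all; the encoder simply resumes transmission after a fixed number of silent frames. A minimal sketch, assuming the interval is counted in frames; the class name and the `interval` parameter are hypothetical.

```python
class SystematicDTX:
    """Resume transmission every `interval` silent frames (hypothetical parameter).

    Between updates the decoder would keep re-synthesizing background noise from
    the most recently received frame.
    """

    def __init__(self, interval: int) -> None:
        if interval <= 0:
            raise ValueError("interval must be a positive number of frames")
        self.interval = interval
        self.frames_since_update = 0

    def on_silent_frame(self) -> bool:
        """Return True if this silent frame should be transmitted."""
        self.frames_since_update += 1
        if self.frames_since_update >= self.interval:
            self.frames_since_update = 0
            return True
        return False


dtx = SystematicDTX(interval=4)
decisions = [dtx.on_silent_frame() for _ in range(10)]
```

With an interval of 4, the 4th and 8th silent frames are sent, which keeps the decoder's reference frame no older than four frames.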
Other aspects of the present invention can be found in a speech codec that performs an intelligent discontinued transmission speech coding on a speech signal. The speech codec contains, among other things, a speech signal analysis circuitry that calculates a predetermined number of parameters from the speech signal and a background noise detection circuitry that detects a change of at least one of the predetermined number of parameters that is calculated from the speech signal using the speech signal analysis circuitry. The speech codec resumes transmission of a speech coding on the speech signal upon the detection of the change of the at least one of the predetermined number of parameters.
In certain embodiments of the invention, the predetermined number of parameters calculated from the speech signal comprises a frequency spectrum and an energy level of the speech signal. The background noise detection circuitry detects the change of the at least one of the predetermined number of parameters by comparing that change against a predetermined threshold.
If desired, the speech codec further contains an encoder circuitry, a decoder circuitry, and a communication link that communicatively couples the encoder circuitry and the decoder circuitry. The transmission of the speech coding on the speech signal, performed upon the detection of the change of the at least one of the predetermined number of parameters, is resumed across the communication link. The encoder circuitry further contains an intelligent discontinued transmission circuitry that operates cooperatively with the background noise detection circuitry to detect the change of the at least one of the predetermined number of parameters that is calculated from the speech signal using the speech signal analysis circuitry.
In other embodiments of the invention, the encoder circuitry further contains a systematic discontinued transmission circuitry that resumes transmission of the speech coding on the speech signal at predetermined time intervals. The speech signal comprises a background noise, and the speech codec produces a reproduced speech signal that contains a reproduced background noise substantially indistinguishable from the background noise contained within the speech signal. The speech codec re-synthesizes the background noise using a predetermined number of relatively recently sampled speech samples of the speech signal.
Other aspects of the present invention can be found in a method that performs discontinued transmission on a speech signal. The method includes discontinuing transmission of a speech signal, detecting a change in a frequency spectrum of the speech signal, detecting a change in an energy level of the speech signal, and resuming transmission of the speech signal upon detection of at least one of the change in the frequency spectrum of the speech signal and the change in the energy level of the speech signal.
In certain embodiments of the invention, the method further includes resuming transmission of the speech signal upon detection of both the change in the frequency spectrum of the speech signal and the change in the energy level of the speech signal. The method further includes re-synthesizing a number of speech samples using relatively recently sampled speech samples extracted from the speech signal. The method further includes resuming transmission of the speech signal at predetermined time intervals. If desired, the change in the frequency spectrum of the speech signal and the change in the energy level of the speech signal are each determined by comparison against a predetermined threshold.
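The resume decision of the method can be sketched as a small policy over per-frame change flags, covering the "at least one change" and "both changes" embodiments and, optionally, resumption at predetermined intervals. This is an illustrative sketch under stated assumptions: the flags are taken as already computed by threshold comparisons, and the function name and `max_gap` parameter are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def resume_decisions(changes: Iterable[Tuple[bool, bool]],
                     require_both: bool = False,
                     max_gap: Optional[int] = None) -> List[bool]:
    """Decide, per silent frame, whether to resume transmission.

    changes: (spectrum_changed, energy_changed) for each frame while transmission
    is discontinued. require_both selects the "both changes" embodiment instead of
    "at least one". max_gap (hypothetical) forces a resume after that many frames
    without one, modelling resumption at predetermined intervals.
    """
    decisions = []
    gap = 0
    for spectrum_changed, energy_changed in changes:
        if require_both:
            resume = spectrum_changed and energy_changed
        else:
            resume = spectrum_changed or energy_changed
        gap += 1
        if max_gap is not None and gap >= max_gap:
            resume = True  # periodic refresh even without a detected change
        if resume:
            gap = 0
        decisions.append(resume)
    return decisions


frames = [(False, False), (True, False), (True, True), (False, True)]
```

The "at least one" policy resumes on frames 2 to 4, the "both" policy only on frame 3, and adding `max_gap=2` to the "both" policy also refreshes frame 2 because two frames have passed without an update.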
Other aspects, advantages and novel features of the present invention will become apparent from the following detailed description of the invention when considered in conjunction with the accompanying drawings.