1. Technical Field
The present invention relates generally to speech coding using a speech codec; and, more particularly, it relates to silence description coding for multi-rate speech codecs.
2. Description of Prior Art
Conventional speech codec systems that employ silence description coding typically use a voice activity detection algorithm to determine whether a substantially speech-like signal is present within a speech signal. When no voice activity is detected in the speech signal, the conventional speech codec operates at a reduced data transmission rate. Moreover, in conventional speech codecs that employ discontinued transmission, the codec operates at a full data transmission rate only when the substantially speech-like signal is present within the speech signal.
A common approach to performing data transmission at the reduced rate, particularly in conventional speech codec systems that operate at multiple data transmission rates, is to employ a fixed reduced rate for each of the multiple data transmission rates. For example, a first reduced data transmission rate accompanies the highest of the multiple data transmission rates, and a second reduced data transmission rate accompanies the lowest. This conventional solution of dedicating a separate reduced data transmission rate to each of the multiple data transmission rates results in gross over-allocation of encoder processing resources in the conventional speech codec, because more processing circuitry is required to accommodate each of the reduced data transmission rates. It also adds the computational complexity of supporting a dedicated reduced data transmission rate for each of the multiple data transmission rates.
Another limitation of the conventional solution of having a separate reduced data transmission rate for each of the multiple data transmission rates stems from the intrinsically limited bandwidth available within any communication system. Inefficient allocation and management of the available bandwidth in the communication system undesirably limits the number of communication devices that may be employed at any given time. Additionally, inefficient use of the available bandwidth precludes using the remaining bandwidth for other functions not associated exclusively with data transmission. In many conventional speech codec systems, the entire bandwidth spectrum is consumed, leaving no remaining bandwidth in which to perform those other functions.
The traditional solution of detecting the existence of the substantially speech-like signal within a speech signal, and adjusting the data transmission rate as a function of that signal, typically encodes and transmits all speech segments, including those that do not contain the substantially speech-like signal. This allocates the speech codec's processing resources very inefficiently, because every speech segment is encoded even in the absence of the substantially speech-like signal. Operation at the reduced data transmission rate typically involves transmitting a subset of the parameters that the speech codec uses to encode the speech signal, and this subset is typically transmitted only when there is a perceptual change in the speech signal while it remains substantially non-speech-like.
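The gating behavior of such a conventional codec can be illustrated with a minimal Python sketch. It assumes a simple energy-based voice activity detector and uses frame energy as the only transmitted silence parameter; the function names, the frame labels, the -40 dB activity threshold, and the 3 dB change threshold are hypothetical illustration values, not taken from any particular codec. Every frame is still analyzed, mirroring the inefficiency noted above, but a silence description (SID) update is emitted only when the background energy changes noticeably.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical thresholds chosen for illustration; not taken from any standard.
VAD_THRESHOLD_DB = -40.0  # frames louder than this are treated as speech-like
SID_CHANGE_DB = 3.0       # minimum energy change that triggers a new SID update


def frame_energy_db(frame: List[float]) -> float:
    """Mean-square energy of a frame in dB, relative to a full scale of 1.0."""
    mean_sq = sum(x * x for x in frame) / len(frame)
    return 10.0 * math.log10(mean_sq + 1e-12)  # small offset avoids log(0)


def encode_stream(frames: List[List[float]]) -> List[Tuple[str, Optional[float]]]:
    """Label each frame FULL (speech), SID (silence parameter update) or NONE.

    Every frame is analyzed; only the label decides what would be transmitted.
    """
    out: List[Tuple[str, Optional[float]]] = []
    last_sid_db: Optional[float] = None
    for frame in frames:
        e_db = frame_energy_db(frame)
        if e_db > VAD_THRESHOLD_DB:
            # Speech-like frame: full-rate transmission.
            out.append(("FULL", e_db))
            last_sid_db = None  # force a fresh SID update after the speech burst
        elif last_sid_db is None or abs(e_db - last_sid_db) >= SID_CHANGE_DB:
            # Non-speech frame with a noticeable change: send the parameter subset.
            out.append(("SID", e_db))
            last_sid_db = e_db
        else:
            # Non-speech frame with no noticeable change: nothing is sent.
            out.append(("NONE", None))
    return out
```

For example, a loud frame followed by several quiet frames yields FULL, then a single SID, then NONE until the noise level shifts by at least 3 dB.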
Other conventional speech codec systems discontinue data transmission altogether in the absence of the substantially speech-like signal. These systems implement a voice activity detection algorithm that determines whether the substantially speech-like signal is present and simply discontinue data transmission when it is absent. Such systems suffer from the undesirable perceptual effect of an apparently disconnected communication link: the silence that results when no data is transmitted gives the listener the impression that no one is on the other end. This impression of disconnection, caused by the interrupted data transmission, greatly reduces the perceptual performance of such conventional speech codec systems. The conventional way to convey that another individual is still on the other end is comfort noise generation. Comfort noise generation is a specific mode of discontinued transmission in which only a small number of speech parameters are transmitted from an encoder to a decoder in a speech codec, and intermediate values between those transmitted parameters are generated by interpolation. The complete set of speech parameters (including the interpolated values) is used to produce a reproduced non-speech signal that is perceptually indistinguishable from background noise. Comfort noise generation thus provides the perceptual effect of background noise.
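The comfort noise mechanism can be sketched in Python under simplifying assumptions: the only transmitted parameter is a per-frame noise gain, the sparse gains are keyed by frame index and linearly interpolated, and the decoder scales white Gaussian noise by the interpolated gain. Practical comfort noise generators typically also interpolate spectral-envelope parameters and shape the noise with them; that step is omitted here. The function names and the 160-sample frame length (20 ms at 8 kHz) are assumptions for illustration.

```python
import random
from typing import Dict, List


def interpolate_params(sid: Dict[int, float], num_frames: int) -> List[float]:
    """Linearly interpolate sparse per-frame parameters (keyed by frame index)
    to every frame; frames outside the known range hold the nearest value.
    Assumes `sid` contains at least one entry."""
    keys = sorted(sid)
    out: List[float] = []
    for n in range(num_frames):
        if n <= keys[0]:
            out.append(sid[keys[0]])
        elif n >= keys[-1]:
            out.append(sid[keys[-1]])
        else:
            # Find the two transmitted frames that bracket frame n.
            for a, b in zip(keys, keys[1:]):
                if a <= n <= b:
                    t = (n - a) / (b - a)
                    out.append((1.0 - t) * sid[a] + t * sid[b])
                    break
    return out


def comfort_noise(sid_gains: Dict[int, float], num_frames: int,
                  frame_len: int = 160, seed: int = 0) -> List[List[float]]:
    """Generate white-noise frames whose RMS follows the interpolated gains."""
    rng = random.Random(seed)
    gains = interpolate_params(sid_gains, num_frames)
    return [[g * rng.gauss(0.0, 1.0) for _ in range(frame_len)] for g in gains]
```

With gains transmitted only at frames 0 and 4, `interpolate_params({0: 1.0, 4: 3.0}, 6)` fills in 1.5, 2.0 and 2.5 for frames 1 to 3 and holds 3.0 for frame 5, so the decoder produces a smoothly varying noise level from two transmitted values.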
Further limitations and disadvantages of conventional and traditional systems will become apparent to one of skill in the art after reviewing the remainder of the present application with reference to the drawings.