1. Field of the Invention
The invention generally relates to systems and methods for improving the quality of an audio signal received within an audio communications system.
2. Background
In audio coding (sometimes called “audio compression”), a coder encodes an input audio signal into a digital bit stream for transmission. A decoder decodes the bit stream into an output audio signal. The combination of the coder and the decoder is called a codec. The transmitted bit stream is usually partitioned into frames, and in packet transmission networks, each transmitted packet may contain one or more frames of a compressed bit stream.
In many audio communications systems, bit errors may be introduced into the bit stream during transmission from the encoder to the decoder. Such bit errors may be random or bursty in nature. Generally speaking, random bit errors have an approximately equal probability of occurring over time, whereas bursty bit errors are more concentrated in time. Some codecs are more resilient to bit errors than others. For example, some codecs, such as CVSD (Continuously Variable Slope Delta Modulation), were designed with bit error resiliency in mind. With CVSD, the quality of the decoded audio output signal degrades gracefully as the occurrence of random bit errors increases.
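The distinction between random and bursty bit errors can be illustrated with a short Python sketch. The function names below are hypothetical, and the burst model (every bit in one contiguous run is inverted) is a deliberately simple assumption; real channels may corrupt only some of the bits within a burst.

```python
import random


def flip_random_bits(bits, error_rate, seed=0):
    """Random errors: flip each bit independently with probability error_rate."""
    rng = random.Random(seed)  # seeded so that a run is repeatable
    return [b ^ 1 if rng.random() < error_rate else b for b in bits]


def flip_burst(bits, start, length):
    """Bursty errors (assumed model): invert one contiguous run of bits."""
    out = list(bits)
    for i in range(start, min(start + length, len(out))):
        out[i] ^= 1
    return out


def error_positions(original, corrupted):
    """Indices at which the two bit sequences differ."""
    return [i for i, (a, b) in enumerate(zip(original, corrupted)) if a != b]


if __name__ == "__main__":
    clean = [0] * 1000
    # Random errors are scattered over the whole sequence.
    print(error_positions(clean, flip_random_bits(clean, 0.02)))
    # Burst errors occupy one short stretch of time.
    print(error_positions(clean, flip_burst(clean, 100, 20)))
```

In this sketch the random errors land anywhere in the sequence, while the burst corrupts only positions 100 through 119.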
CVSD is a delta modulation technique with a variable step size that was first proposed by J. A. Greefkes and K. Riemens in “Code Modulation with Digitally Controlled Companding for Speech Transmission,” Philips Tech. Rev., pp. 335-353 (1970), the entirety of which is incorporated by reference herein. CVSD encodes at 1 bit per sample, so the bit rate equals the sampling rate. For example, audio sampled at 64 kilohertz (kHz) is encoded at 64 kilobits/second (kbit/s).
In CVSD, the encoder maintains a reference sample and a step size. Each input sample is compared to the reference sample. If the input sample is larger, the encoder emits a “1” bit and adds the step size to the reference sample. If the input sample is smaller, the encoder emits a “0” bit and subtracts the step size from the reference sample. The CVSD encoder also keeps the previous K bits of output (K=3 and K=4 are very common) to determine adjustments to the step size: if J of the previous K bits are all 1s or all 0s (J=3 and J=4 are also common), the step size is increased by a fixed amount. Otherwise, the step size remains the same (although it may be multiplied by a decay factor that is slightly less than 1). This step-size adaptation is performed for every input sample processed.
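The encoder loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a reference implementation: the function name `cvsd_encode`, all parameter values, the choice J = K = 4, the clamping of the step size between a minimum and a maximum, the treatment of an input equal to the reference as “smaller,” and the order in which the reference and the step size are updated are illustrative choices that the description leaves open.

```python
def cvsd_encode(samples, k=4, min_step=10.0, max_step=1280.0,
                step_inc=10.0, decay=0.99):
    """Encode samples at 1 bit per sample (illustrative parameters, J = K)."""
    ref = 0.0          # reference sample
    step = min_step    # current step size
    history = []       # previous K output bits
    bits = []
    for x in samples:
        # Assumption: an input equal to the reference is coded as 0.
        bit = 1 if x > ref else 0
        bits.append(bit)
        # Move the reference toward the input by one step.
        ref += step if bit else -step
        # Adapt the step size from the last K bits.
        history = (history + [bit])[-k:]
        if len(history) == k and len(set(history)) == 1:
            # All 1s or all 0s: grow by a fixed amount (clamped, assumed).
            step = min(step + step_inc, max_step)
        else:
            # Otherwise let the step size decay toward its minimum (assumed).
            step = max(step * decay, min_step)
    return bits


if __name__ == "__main__":
    print(cvsd_encode([0.0] * 6))      # [0, 1, 0, 1, 0, 1]
    print(cvsd_encode([1000.0] * 8))   # [1, 1, 1, 1, 1, 1, 1, 1]
```

With these parameters, a constant input of zeros yields the alternating pattern 0, 1, 0, 1, and a large constant input yields a run of 1s while the reference is still climbing toward it.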
A CVSD decoder reverses this process, starting with the reference sample, and adding or subtracting the step size according to the bit stream. The sequence of adjusted reference samples constitutes the reconstructed audio waveform, and the step size is increased or maintained in accordance with the same all-1s-or-0s logic as in the CVSD encoder.
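The decoder can be sketched in the same way. As before, the function name `cvsd_decode`, the parameter values, the choice J = K = 4, the step-size clamping, and the update order are assumptions made for illustration; the only requirement taken from the description is that the decoder applies the same adaptation logic as the encoder.

```python
def cvsd_decode(bits, k=4, min_step=10.0, max_step=1280.0,
                step_inc=10.0, decay=0.99):
    """Rebuild a waveform from CVSD bits (illustrative parameters, J = K)."""
    ref = 0.0          # reference sample, same starting value as the encoder
    step = min_step    # current step size
    history = []       # previous K received bits
    output = []
    for bit in bits:
        # Add or subtract the step size according to the received bit.
        ref += step if bit else -step
        output.append(ref)
        # Same all-1s-or-all-0s adaptation rule as the encoder sketch.
        history = (history + [bit])[-k:]
        if len(history) == k and len(set(history)) == 1:
            step = min(step + step_inc, max_step)
        else:
            step = max(step * decay, min_step)
    return output


if __name__ == "__main__":
    print(cvsd_decode([1, 1, 1, 1, 1]))   # [10.0, 20.0, 30.0, 40.0, 60.0]
    print(cvsd_decode([0, 1, 0, 1]))      # [-10.0, 0.0, -10.0, 0.0]
```

Under these parameters, five consecutive 1 bits decode to 10, 20, 30, 40, 60: the fourth identical bit triggers the step-size increase, so the fifth step is 20 rather than 10.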
In CVSD, the adaptation of the step size helps to minimize the occurrence of slope overload and granular noise. Slope overload occurs when the slope of the audio signal is so steep that the reference sample maintained by the encoder cannot keep up with it. Adaptation of the step size in CVSD helps to minimize or prevent this effect by enlarging the step size sufficiently. Granular noise occurs when the audio signal is constant. A CVSD system has no symbol to represent a steady state, so a constant input is represented by alternating ones and zeros. Accordingly, the effect of granular noise is minimized when the step size is sufficiently small.
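The trade-off between the two effects is easiest to see when the step size is not adapted at all. The following sketch therefore uses plain fixed-step delta modulation rather than CVSD; the function names, the test signals, and the two step sizes are assumptions chosen only to make the numbers easy to follow.

```python
def delta_modulate(samples, step):
    """Fixed-step delta modulation; returns the reconstructed waveform."""
    ref = 0.0
    out = []
    for x in samples:
        ref += step if x > ref else -step
        out.append(ref)
    return out


def max_abs_error(samples, reconstructed):
    """Largest absolute difference between input and reconstruction."""
    return max(abs(a - b) for a, b in zip(samples, reconstructed))


ramp = [50.0 * n for n in range(1, 21)]   # steep signal: rises 50 per sample
flat = [0.0] * 20                         # constant signal

for step in (5.0, 50.0):
    print(step,
          max_abs_error(ramp, delta_modulate(ramp, step)),
          max_abs_error(flat, delta_modulate(flat, step)))
# 5.0 900.0 5.0    small step: slope overload on the ramp, little granular noise
# 50.0 0.0 50.0    large step: tracks the ramp, but large granular noise
```

No single fixed step size serves both signals well, which is the motivation for enlarging the step size on steep segments and shrinking it on flat ones.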
CVSD has been referred to as a compromise between simplicity, low bit rate, and quality. Different forms of CVSD are currently used in a variety of applications. For example, a 12 kbit/s version of CVSD is used in the SECURENET® line of digitally encrypted two-way radio products produced by Motorola, Inc. of Schaumburg, Ill. A 16 kbit/s version of CVSD is used by military digital telephones (referred to as Digital Non-Secure Voice Terminals (DNVT) and Digital Secure Voice Terminals (DSVT)) for use in deployed areas to provide voice recognition quality audio. The Bluetooth® specification for wireless personal area networks (PANs) specifies a 64 kbit/s version of CVSD that may be used to encode voice signals in telephony-related Bluetooth® service profiles, e.g., between mobile phones and wireless headsets.
Although CVSD is robust to random bit errors as noted above, it is not robust to bursty bit errors. Consequently, when processing an encoded bit stream that includes bursty bit errors, a CVSD decoder may produce a decoded audio output signal that includes an audible click. This artifact may be detected and subsequently concealed using a packet loss concealment algorithm or other concealment technique. However, because CVSD is a type of differential waveform coder, the quality of its performance depends on the maintenance of synchronized state information at the encoder and the decoder. Thus, although artifacts resulting from bursty bit errors may be concealed, the processing of the bursty bit errors by the CVSD decoder will result in a divergence between the state information maintained in memory by the CVSD decoder and the state information maintained in memory by the CVSD encoder. This divergence may result in artifacts that will decay over time, but that may linger for several frames beyond the corrupted frame.
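The lingering effect of a burst on decoder state can be reproduced with a small simulation. The sketch below is illustrative only: the class and function names, all parameter values, the choice J = K = 4, the bit-inverting burst model, and in particular the leak factor `ref_leak` applied to the reference sample are assumptions that do not come from the description above. The leak is included because, without some such mechanism, nothing in this simplified model would pull a mismatched reference back toward the encoder's value.

```python
import math


class CvsdState:
    """Reference sample and step size, updated identically at both ends."""

    def __init__(self, k=4, min_step=10.0, max_step=1280.0,
                 step_inc=10.0, step_decay=0.99, ref_leak=0.97):
        self.k, self.min_step, self.max_step = k, min_step, max_step
        self.step_inc, self.step_decay = step_inc, step_decay
        self.ref_leak = ref_leak          # assumed leak, not from the text
        self.ref, self.step, self.history = 0.0, min_step, []

    def update(self, bit):
        # Leaky reference update, then step-size adaptation from last K bits.
        self.ref = self.ref_leak * self.ref + (self.step if bit else -self.step)
        self.history = (self.history + [bit])[-self.k:]
        if len(self.history) == self.k and len(set(self.history)) == 1:
            self.step = min(self.step + self.step_inc, self.max_step)
        else:
            self.step = max(self.step * self.step_decay, self.min_step)
        return self.ref


def encode(samples):
    state, bits = CvsdState(), []
    for x in samples:
        bit = 1 if x > state.ref else 0
        bits.append(bit)
        state.update(bit)
    return bits


def decode(bits):
    state = CvsdState()
    return [state.update(b) for b in bits]


def mismatch_after_burst(samples, start, length):
    """Per-sample |clean output - corrupted output| for one inverted burst."""
    clean_bits = encode(samples)
    bad_bits = list(clean_bits)
    for i in range(start, min(start + length, len(bad_bits))):
        bad_bits[i] ^= 1
    return [abs(a - b) for a, b in zip(decode(clean_bits), decode(bad_bits))]


if __name__ == "__main__":
    tone = [2000.0 * math.sin(2 * math.pi * n / 200) for n in range(2000)]
    diff = mismatch_after_burst(tone, start=500, length=30)
    for offset in (0, 30, 100, 300, 1000):
        # Mismatch at the burst start and at increasing distances after it.
        print(offset, round(diff[500 + offset], 2))
```

Before the burst the two decoder outputs are identical. From the first corrupted bit onward they differ, and the difference continues past the end of the burst even though every later bit is received correctly, because the corrupted decoder carries a different reference sample and step size forward.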