1. Field of the Invention
The invention generally relates to systems and methods for improving the quality of an audio signal transmitted within an audio communications system.
2. Background
In audio coding (sometimes called “audio compression”), a coder encodes an input audio signal into a digital bit stream for transmission. A decoder decodes the bit stream into an output audio signal. The combination of the coder and the decoder is called a codec. The transmitted bit stream is usually partitioned into frames, and in packet transmission networks, each transmitted packet may contain one or more frames of the compressed bit stream. In wireless or packet networks, transmitted frames or packets are sometimes erased or lost. This condition is often called frame erasure in wireless networks and packet loss in packet networks. Frame erasure and packet loss may result, for example, from corruption of a frame or packet due to bit errors. Such bit errors may prevent proper demodulation of the bit stream, or they may be detected by a forward error correction (FEC) scheme, in which case the frame or packet is discarded.
It is well known that bit errors can occur in most audio communications systems. The bit errors may be random or bursty in nature. Generally speaking, random bit errors have an approximately equal probability of occurring at any point in time, whereas bursty bit errors are concentrated in time. As previously mentioned, bit errors may cause a packet to be discarded. In many conventional audio communications systems, packet loss concealment (PLC) logic is invoked at the decoder to conceal the quality-degrading effects of the lost packet, thereby avoiding substantial degradation in output audio quality. However, bit errors may also go undetected and be present in the bit stream during decoding. Some codecs are more resilient to such bit errors than others. Some codecs, such as CVSD (Continuously Variable Slope Delta Modulation), were designed with bit error resiliency in mind, while others, such as A-law or u-law pulse code modulation (PCM), are extremely sensitive to even a single bit error. Model-based codecs such as the CELP (Code Excited Linear Prediction) family of audio coders may have some very sensitive bits (e.g., gain and pitch bits) and some more resilient bits (e.g., excitation bits).
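The difference between random and bursty bit errors can be made concrete by counting how many errors land in each frame. The following is a minimal sketch only: the function name `errors_per_frame`, the 240-bit frame size, and the two hand-built error patterns (an evenly spread pattern standing in for random errors, and ten consecutive errored bits standing in for a burst) are illustrative assumptions.

```python
from collections import Counter

def errors_per_frame(error_positions, frame_bits):
    """Count the bit errors that land in each frame (frame index -> count)."""
    return Counter(pos // frame_bits for pos in error_positions)

FRAME_BITS = 240  # illustrative frame size in bits (assumption)

# The same number of bit errors (10), spread over time versus concentrated.
spread = [i * FRAME_BITS + 17 for i in range(10)]  # one error in each of 10 frames
bursty = list(range(500, 510))                     # 10 consecutive errored bits

worst_spread = max(errors_per_frame(spread, FRAME_BITS).values())
worst_bursty = max(errors_per_frame(bursty, FRAME_BITS).values())
print(worst_spread, worst_bursty)
```

With the same overall error count, the spread pattern puts at most one error in any frame, while the burst puts all ten errors in a single frame. A codec that tolerates isolated errors can therefore still fail badly under the bursty pattern.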
Today, many wireless audio communications systems and devices are being deployed that operate in accordance with Bluetooth®, an industrial specification for wireless personal area networks (PANs). Bluetooth® provides a protocol for connecting and exchanging information between devices such as mobile phones, laptops, personal computers, printers, and headsets over a secure, globally unlicensed short-range radio frequency.
The original Bluetooth® audio transport mechanism is termed the Synchronous Connection-Oriented (SCO) channel, which carries full-duplex data at a rate of 64 kbit/s in each direction. There are three codecs defined for SCO channels: A-law PCM, u-law PCM, and CVSD. CVSD is used almost exclusively due to its robustness to random bit errors. With CVSD, the audio output quality degrades gracefully as the occurrence of random bit errors increases. However, CVSD is not robust to bursty bit errors, and as a result, annoying “click-like” artifacts may become audible in the audio output when bursty bit errors occur. With other codecs, such as PCM or CELP-based codecs, audible clicks may be produced by even a few random bit errors.
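The behavior of a delta-modulation codec under a single bit error can be illustrated with a simplified CVSD-style coder. This is a sketch under stated assumptions, not the codec defined by any specification: the class and function names are hypothetical, and the step-size limits, decay factors, and run length are illustrative values chosen for the example.

```python
import math

# Illustrative parameters (assumptions, not taken from any specification).
D_MIN, D_MAX = 10.0, 1280.0   # step-size limits
BETA = 1023 / 1024            # step-size decay when there is no slope overload
LEAK = 31 / 32                # leak applied to the running estimate
RUN = 4                       # identical bits needed to signal slope overload

class CvsdState:
    """State shared by encoder and decoder, driven only by the bit stream."""
    def __init__(self):
        self.est, self.delta, self.history = 0.0, D_MIN, []

    def update(self, bit):
        self.history = (self.history + [bit])[-RUN:]
        if len(self.history) == RUN and len(set(self.history)) == 1:
            self.delta = min(self.delta + D_MIN, D_MAX)  # grow the step
        else:
            self.delta = max(self.delta * BETA, D_MIN)   # let the step decay
        self.est = LEAK * self.est + (self.delta if bit else -self.delta)
        return self.est

def cvsd_encode(samples):
    state, bits = CvsdState(), []
    for x in samples:
        bit = 1 if x >= state.est else 0  # one bit per input sample
        bits.append(bit)
        state.update(bit)
    return bits

def cvsd_decode(bits):
    state = CvsdState()
    return [state.update(b) for b in bits]

# 400 Hz tone sampled at 64 kHz (one bit per sample gives 64 kbit/s).
tone = [1000 * math.sin(2 * math.pi * 400 * n / 64000) for n in range(640)]
bits = cvsd_encode(tone)

FLIP = 320
corrupted = list(bits)
corrupted[FLIP] ^= 1  # inject a single bit error

clean_out = cvsd_decode(bits)
bad_out = cvsd_decode(corrupted)
jump = abs(clean_out[FLIP] - bad_out[FLIP])
print(jump)
```

In this sketch, a flipped bit changes the output at that sample by the sum of two step sizes, so the immediate error is bounded by twice `D_MAX` and is usually much smaller. Because the decoder carries state from sample to sample, the disturbance typically persists beyond the errored sample. When many errors arrive together, these perturbations can accumulate, which is consistent with the sensitivity to bursty errors noted above.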
In a wireless communications system such as a Bluetooth® system, bit errors may become bursty under certain interference or low signal-to-noise ratio (SNR) conditions. Low SNR conditions may occur when a transmitter and receiver are far apart. Low SNR conditions may also occur when an object (such as a body part, desk, or wall) obstructs the direct path between a transmitter and receiver. Because a Bluetooth® radio operates on the globally available unlicensed 2.4 GHz band, it must share the band with other consumer electronic devices that may also operate in this band, including but not limited to WiFi® devices, cordless phones, and microwave ovens. Interference from these devices can also cause bit errors in the Bluetooth® transmission.
Bluetooth® defines four packet types for transmitting SCO data: HV1, HV2, HV3, and DV packets. HV1 packets provide ⅓ rate FEC on a data payload size of 10 bytes. HV2 packets provide ⅔ rate FEC on a data payload size of 20 bytes. HV3 packets provide no FEC on a data payload of 30 bytes. DV packets provide no FEC on a data payload of 10 bytes. There is no cyclic redundancy check (CRC) protection on the data in any of these packet types. HV1 packets offer better error recovery than the other types, but they do so by consuming the entire bandwidth of a Bluetooth® connection. HV3 packets provide no error detection for the payload, but consume only two of every six time slots. Thus, the remaining time slots can be used to establish other connections while maintaining a SCO connection. This is not possible when using HV1 packets for transmitting SCO data. Due to this and other concerns such as power consumption, HV3 packets are most commonly used for transmitting SCO data.
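The slot usage figures above follow from the payload sizes and the 64 kbit/s stream rate. The sketch below works through the arithmetic for the HV packet types. It assumes the standard Bluetooth® slot duration of 625 microseconds and one single-slot packet in each direction per interval; the function name `sco_slot_usage` is hypothetical. DV packets are omitted because they also carry a data field.

```python
from fractions import Fraction

SLOT_US = 625                # slot duration in microseconds (assumption stated above)
BYTES_PER_SEC = 64_000 // 8  # 64 kbit/s SCO stream in each direction

def sco_slot_usage(payload_bytes):
    """Return (interval in slots, fraction of slots used) for a full-duplex link."""
    # Duration of audio carried by one payload, in microseconds.
    interval_us = Fraction(payload_bytes * 1_000_000, BYTES_PER_SEC)
    interval_slots = interval_us / SLOT_US
    # One slot in each direction is needed per interval.
    return interval_slots, Fraction(2) / interval_slots

for name, payload in [("HV1", 10), ("HV2", 20), ("HV3", 30)]:
    slots, usage = sco_slot_usage(payload)
    print(name, slots, usage)
```

Under these assumptions, an HV1 link needs a packet pair every two slots (all of the slots), an HV2 link every four slots (half of the slots), and an HV3 link every six slots (one third of the slots), which matches the "two of every six time slots" figure for HV3.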
A Bluetooth® packet contains an access code, a header, and a payload. While a ⅓ FEC code and an error-checking code protect the header, low signal strength or local interference may result in a packet being received with an invalid header. In this case, certain conventional Bluetooth® receivers will discard the entire packet and employ some form of PLC to conceal the effects of the lost data. However, with HV3 packets, because only the header is protected, bit errors impacting only the user-data portion of the packet will go undetected and the corrupted data will be passed to the decoder for decoding and playback. As mentioned above, CVSD was designed to be robust to random bit errors but is not robust to bursty bit errors. As a result, annoying “click-like” artifacts may become audible in the audio output when bursty bit errors occur.
Recent versions of the Bluetooth® specification (in particular, version 1.2 of the Bluetooth® Core Specification and all subsequent versions thereof) include the option for Extended SCO (eSCO) channels. In theory, eSCO channels eliminate the problem of undetected bit errors in the user-data portion of a packet by supporting the retransmission of lost packets and by providing CRC protection for the user data. In practice, however, the situation is not that simple. End-to-end delay is critical in any two-way audio communications system, and this limits eSCO channels to one or two retransmissions. Retransmissions also increase power consumption and reduce the battery life of a Bluetooth® device. Due to this practical limit on the number of retransmissions, bit errors may still be present in the received packet. The obvious approach is to simply declare a packet loss and employ PLC. However, in many cases there may be only a few random bit errors present in the data, in which case better quality may be obtained by allowing the decoder to decode the data rather than discarding the whole packet and concealing it with PLC. As a result, bit-error-induced artifacts must still be handled with eSCO channels.
The detection and concealment of clicks in audio signals is not new. However, most prior art techniques deal exclusively with detecting bit errors in memory-less codecs such as the G.711 codec, or with detecting clicks due to degradation of a storage medium. In these applications, the click is typically very short in duration and can be modeled as impulse noise. Typical techniques used for detection include LPC inverse filtering, pitch prediction, matched filtering, median filtering, and higher order derivatives. Concealment techniques generally entail some form of sample replacement, smoothing, or interpolation. However, the problem is more complex when attempting to detect clicks caused by bit errors in many audio codecs.
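As an illustration of the impulse-oriented approach, the sketch below detects a one-sample click with a second-order difference and conceals it by interpolating between the neighboring samples. It is a simplified example, not a reproduction of any particular prior art method: the function names, the test tone, the click amplitude, and the threshold are all assumptions.

```python
import math

def detect_impulses(signal, threshold):
    """Return indices where the second difference peaks above the threshold."""
    n_samples = len(signal)
    d2 = [0.0] * n_samples
    for n in range(1, n_samples - 1):
        d2[n] = signal[n - 1] - 2 * signal[n] + signal[n + 1]
    # Keep only local peaks so the neighbors of a click are not flagged too.
    return [n for n in range(1, n_samples - 1)
            if abs(d2[n]) > threshold
            and abs(d2[n]) >= abs(d2[n - 1])
            and abs(d2[n]) >= abs(d2[n + 1])]

def conceal(signal, indices):
    """Replace each flagged sample with the mean of its two neighbors."""
    out = list(signal)
    for n in indices:
        out[n] = 0.5 * (out[n - 1] + out[n + 1])
    return out

# 200 Hz tone at 8 kHz with a single-sample click added at index 37.
clean = [1000 * math.sin(2 * math.pi * 200 * n / 8000) for n in range(80)]
clicked = list(clean)
clicked[37] += 5000

flagged = detect_impulses(clicked, threshold=1000)
repaired = conceal(clicked, flagged)
print(flagged)  # → [37]
```

The approach works here because the click occupies a single sample, so the second difference is large at exactly that sample and small elsewhere, and one interpolated value restores the waveform closely.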
For example, CVSD is a memory-based audio codec that operates with a 30-sample frame size within a Bluetooth® system. As a result, the noise produced by bit errors does not resemble an impulse. The noise pulse differs in at least three very important ways: (1) the noise pulse shape varies from one error frame to the next, (2) the pulse can often occupy the entire length of the frame, and (3) due to the memory of CVSD, the distortion can carry into subsequent frames. These differences render the prior art techniques mostly ineffective. For example, matched filtering relies on knowledge of the noise pulse shape, which in the prior art is simply an impulse. However, as described above, for CVSD the pulse shape is not known, rendering matched filtering useless. Median filtering requires a long delay and is not practical in a delay-constrained two-way audio communications channel. Higher order derivatives are effective when the noise is impulsive, but are not effective when the pulse is of longer duration. LPC inverse filtering and pitch prediction are still applicable, but on their own, without the other methods, they are not effective enough to provide reliable detection. In addition, prior art concealment techniques do not apply to this application because the distortion may be spread across several samples and may affect an entire frame (30 samples) or more. Thus, a more complex concealment algorithm is required.
For applications such as Bluetooth® headsets, the design emphasis is on extremely low complexity due to low cost and low power dissipation requirements. Therefore, what is needed is a low-complexity bit error concealment algorithm that addresses the challenging requirements and constraints described above.