With the advent of the Internet age, streaming high-fidelity audio has become a reality. It is thus natural to extend audio streaming to wireless communications so that mobile users can listen to music on handheld devices. With the emergence of 2.5G (GPRS) and third-generation (3G) (CDMA2000 and WCDMA) wireless technology, streaming high-fidelity audio over wireless channels and networks has also become a reality. Internet Protocol (IP) based architecture promises to provide next-generation wireless services such as voice, high-speed data, Internet access, and audio and video streaming on an all-IP network. However, delivering or streaming high-fidelity audio across wireless IP networks remains challenging because the available bandwidth is limited and varies over time. Scalable audio coding (SAC) can efficiently accommodate the varying bandwidth of wireless IP channels and networks. A scalable audio bitstream typically consists of a base layer plus a number of enhancement layers. It is possible to use only a subset of the layers to decode the audio with lower sampling resolution and/or quality. In streaming applications, the layers of a scalable audio bitstream are selectively delivered to adapt to fluctuations in network bandwidth and to the packet loss level. For example, when the available bandwidth is low or the packet loss ratio is high, only the base layer is transmitted.
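The selective delivery of layers described above can be illustrated with a minimal sketch. The function name `select_layers`, the per-layer bitrates, and the loss threshold `max_loss` are hypothetical and chosen only for illustration; the text does not specify a particular selection rule.

```python
from typing import List


def select_layers(layer_kbps: List[float], available_kbps: float,
                  loss_ratio: float, max_loss: float = 0.1) -> int:
    """Return how many layers (base layer first) to transmit.

    layer_kbps[0] is the base layer; the rest are enhancement layers.
    max_loss is an assumed threshold, not a value taken from the text.
    """
    if not layer_kbps:
        return 0
    # Under heavy packet loss, fall back to the base layer only.
    if loss_ratio > max_loss:
        return 1
    count, used = 1, layer_kbps[0]  # the base layer is always sent
    for rate in layer_kbps[1:]:
        if used + rate > available_kbps:
            break  # the next enhancement layer no longer fits
        used += rate
        count += 1
    return count


if __name__ == "__main__":
    layers = [32.0, 16.0, 16.0, 32.0]  # hypothetical bitrates in kbit/s
    print(select_layers(layers, available_kbps=70.0, loss_ratio=0.01))
    print(select_layers(layers, available_kbps=200.0, loss_ratio=0.5))
```

In this sketch the enhancement layers are added greedily in order, which reflects the fact that each enhancement layer is only useful when the layers below it are also received.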
Delivering or streaming high-fidelity audio over wireless IP channels and networks is also challenging because the wireless IP channels and networks present not only packet erasure errors caused by large-scale path loss and fading, but also random bit errors due to the wireless connection. These bit errors have an adverse effect on decompressing the received audio bitstream and can cause the decoder to become inoperative (e.g., the decoder will crash). To combat these bit errors, forward error correction (FEC) can be used to protect the compressed data. However, no matter how carefully the compressed data are protected before transmission, the received data may still have bit errors.
Considering the limited bandwidth in wireless IP channels and networks, efficient compression techniques can be applied to audio signals, but the compressed data become more sensitive to transmission errors. To cope with bit errors on wireless IP channels and networks, conventional error resilience (ER) techniques can be used. Error resilience techniques at the source coding level can detect and locate errors, support resynchronization, and prevent the loss of entire data units. With ER techniques, acceptable audio quality can be obtained at a bit error rate of about 10^-5. The bit error rate in the wireless channel, however, can be significantly higher.
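To give a sense of why the bit error rate matters, the following sketch estimates the probability that a data unit of a given size contains at least one bit error. It assumes independent bit errors, which is a simplification (errors on wireless channels are often bursty), and the unit size of 8000 bits is hypothetical.

```python
def prob_unit_corrupted(ber: float, n_bits: int) -> float:
    """Probability that at least one of n_bits is in error.

    Assumes each bit fails independently with probability ber.
    """
    return 1.0 - (1.0 - ber) ** n_bits


if __name__ == "__main__":
    n_bits = 8000  # hypothetical data unit size (1000 bytes)
    for ber in (1e-5, 1e-4, 1e-3):
        print(f"BER {ber:g}: P(corrupted) = {prob_unit_corrupted(ber, n_bits):.4f}")
```

Under these assumptions, a unit of this size is hit by an error fewer than one time in ten at a bit error rate of 10^-5, but almost always at 10^-3.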
Conventional ER techniques for video coding cannot be directly ported to audio coding because the characteristics of audio and video are different. In video coding there exists a strong correlation between adjacent video frames and this correlation can be exploited to recover data that is corrupted in transmission. In contrast, there is almost no correlation between adjacent audio frames in the time domain. Moreover, audio coding artifacts caused by corrupted frames are esthetically undesirable to human auditory sensibilities.
In the scalable audio codec, the audio signal is first split into individual time segments, which are filtered by a polyphase quadrature filter (PQF) and down-sampled into four subbands to facilitate scalability in sampling resolution. A modified DCT (MDCT) is then performed on each subband and the resulting MDCT coefficients are weighted by a psychoacoustic mask function. Finally, each weighted subband is encoded into an embedded audio bitstream using bit-plane coding, where each bit plane is coded into one layer or data unit (DU). FIG. 1 illustrates the syntax of a conventional scalable audio bitstream for one (1) data unit (DU) of one (1) coded bit plane. The DU seen in FIG. 1 is formed by a process where each weighted subband of audio data is encoded into an embedded bitstream using bit-plane coding. Each bit plane is coded into one (1) layer or DU. FIG. 1 demonstrates that each DU in the audio bitstream includes strings of significance bits and strings of sign bits. All of the strings of significance and sign bits precede a string of refinement bits in the DU. The DU can be byte-aligned by the addition of dummy zeros to the end thereof, as seen in FIG. 1. In a scalable audio codec, the decoder can decode the DU of each bit plane in the embedded audio bitstream to produce quantized data of weighted subbands. The decoder can then dequantize the quantized data of weighted subbands into audio signals.
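The bit-plane layout described above can be sketched as follows. This is a simplified illustration, not the codec's actual syntax: it takes already-quantized integer coefficients, writes the bits as a text string, leaves the significance bits uncompressed, and groups the bits as significance, then sign, then refinement. The exact ordering of the strings in FIG. 1 is not reproduced here, and the function name `bit_plane_units` is hypothetical.

```python
from typing import List


def bit_plane_units(coeffs: List[int], num_planes: int) -> List[str]:
    """Split integer coefficients into one bit string (DU) per bit plane."""
    significant = [False] * len(coeffs)
    units = []
    for plane in range(num_planes - 1, -1, -1):  # most significant plane first
        sig, sign, refine = [], [], []
        for i, c in enumerate(coeffs):
            bit = (abs(c) >> plane) & 1
            if significant[i]:
                # Coefficient became significant in an earlier plane.
                refine.append(str(bit))
            else:
                sig.append(str(bit))  # significance bit
                if bit:
                    sign.append('1' if c < 0 else '0')  # sign bit
                    significant[i] = True
        du = ''.join(sig) + ''.join(sign) + ''.join(refine)
        du += '0' * (-len(du) % 8)  # dummy zeros up to a byte boundary
        units.append(du)
    return units


if __name__ == "__main__":
    for du in bit_plane_units([5, -2, 0, 1], num_planes=3):
        print(du)
```

Because each plane is emitted as its own unit, a receiver that obtains only the first few units can still reconstruct a coarser approximation of the coefficients.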
None of the sign bits or the refinement bits in the DU are entropy coded. As such, bit errors among the sign and refinement bits will not propagate. In contrast, the significance bits are compressed with variable length codes (VLC). When an error occurs in the portion of the DU that includes the coded significance bits and the coded sign bits, the error will propagate to each of the coded significance bits, the coded sign bits, and the coded refinement bits. The multiplexing of the DUs makes the situation more complex because when the decoder detects an error, the decoder cannot identify the exact location of the error. As a result, the whole DU must be discarded, regardless of where the error occurs. Thus, it would be an advance in the art to develop an ER audio coding technique that reduces error propagation within a DU and reduces the discarding of DUs. Consequently, there is a need for improved methods, apparatuses, computer programs, data structures, and systems that can provide such a capability.
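The difference between errors in uncoded bits and errors in variable-length-coded bits can be illustrated with a toy example. The prefix code in `CODE` is hypothetical and is not the VLC used by the codec; the sketch only shows how a single flipped bit can shift the codeword boundaries that follow it.

```python
CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}  # hypothetical prefix code
DECODE = {bits: sym for sym, bits in CODE.items()}


def vlc_encode(symbols: str) -> str:
    """Concatenate the codeword of each symbol."""
    return ''.join(CODE[s] for s in symbols)


def vlc_decode(bits: str) -> str:
    """Greedy prefix decoding; trailing bits of an incomplete codeword are dropped."""
    out, word = [], ''
    for b in bits:
        word += b
        if word in DECODE:
            out.append(DECODE[word])
            word = ''
    return ''.join(out)


def flip(bits: str, pos: int) -> str:
    """Return bits with the bit at index pos inverted."""
    return bits[:pos] + ('1' if bits[pos] == '0' else '0') + bits[pos + 1:]


if __name__ == "__main__":
    sent = vlc_encode("dcba")
    print(vlc_decode(sent))           # error-free decoding
    print(vlc_decode(flip(sent, 0)))  # one flipped bit in the coded stream
    raw = "1010"                      # uncoded bits, e.g. sign bits
    print(flip(raw, 1))               # only the flipped position changes
```

In the coded stream, the single bit error changes both the decoded symbols and their number, so everything located after the coded portion would be read from the wrong position. In the uncoded string, the damage stays confined to the one flipped bit.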