1. Field
This invention relates to methods and apparatus for encoding and decoding digital data with error management schemes.
2. Background
The demands for higher data rates and higher quality of service in mobile communication systems are growing rapidly. However, factors such as limited transmit power, limited bandwidth and multi-path fading continue to restrict the data rates handled by practical systems. In multimedia communications, particularly in error-prone environments, error resilience of the transmitted media is critical in providing the desired quality of service because errors in even a single decoded value can lead to decoding artifacts propagating spatially and temporally. Various encoding measures have been used to minimize errors while maintaining a necessary data rate; however, all of these techniques suffer from problems with errors arriving at the decoder side.
Hybrid coding standards, such as MPEG-1, MPEG-2, MPEG-4 (collectively referred to as MPEG-x), H.261, H.262, H.263, and H.264 (collectively referred to as H.26x), describe data processing and manipulation techniques (referred to herein as hybrid coding) that are well suited to the compression and delivery of video, audio and other information using fixed or variable length source coding techniques. In particular, the above-referenced standards, and other hybrid coding standards and techniques, compress, illustratively, video information using intra-frame entropy coding techniques (such as, for example, run-length coding, Huffman coding and the like) and inter-frame coding techniques (such as, for example, forward and backward predictive coding, motion compensation and the like). Specifically, in the case of video processing systems, hybrid video coding systems are characterized by prediction-based compression encoding of video frames with intra- and/or inter-frame motion compensation encoding.
Entropy coding enables very efficient lossless representations of symbols generated by random information sources. As such, it is an indispensable component of both lossless and lossy data compression schemes. Despite its tremendous benefits to compression efficiency, entropy coding also complicates the decoding process. A common feature of all different approaches to entropy coding is that a single source symbol or a sequence of source symbols is associated with, and represented by, a binary pattern, i.e., a sequence of ones and zeros known as a codeword, the length of which increases with decreasing symbol likelihood. Hence, more likely symbols are assigned more compact representations, enabling on average substantial savings over a straightforward fixed-length representation based on the size of the symbol alphabet.
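The relationship between symbol likelihood and codeword length can be illustrated with a minimal sketch of Huffman coding, one of the entropy coding techniques named above. The four-symbol alphabet, its probabilities and the function name `huffman_code` are assumptions chosen for illustration only and are not taken from any of the referenced standards.

```python
import heapq
from itertools import count


def huffman_code(probabilities):
    """Build a prefix code (symbol -> bit string) from symbol probabilities."""
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least likely subtrees, prefixing their codewords.
        p0, _, codes0 = heapq.heappop(heap)
        p1, _, codes1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes0.items()}
        merged.update({s: "1" + c for s, c in codes1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]


# Hypothetical source: four symbols with unequal likelihoods.
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
code = huffman_code(probs)

# Average codeword length in bits per symbol, to compare against the
# 2 bits per symbol that a fixed-length code for four symbols would need.
avg_len = sum(probs[s] * len(code[s]) for s in probs)

for sym in sorted(probs):
    print(sym, probs[sym], code[sym])
print("average bits per symbol:", avg_len)
```

In this example the most likely symbol receives the shortest codeword and the least likely symbols receive the longest, so the average length falls below that of the fixed-length representation.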
The ambiguity around how many bits to consume for the next symbol in a bitstream, i.e., in an entropy coded representation of the output of an information source, is a complication for a decoder. Much more importantly, however, when the bitstream contains errors, the combination of variable-size codewords and flipped bits may frequently result in the emulation of an incorrect codeword length. As a result, the parsing/decoding process may lose its synchronization with the bitstream, i.e., correct identification of codeword boundaries, and hence correct interpretation of the bitstream, may start failing.
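The loss of synchronization described above can be reproduced with a toy variable-length code. The code table, the message and the helper names below are hypothetical and serve only to illustrate the effect of a single flipped bit; they do not correspond to any codeword table of the referenced standards.

```python
# Hypothetical prefix code: shorter codewords for more likely symbols.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {bits: sym for sym, bits in CODE.items()}


def encode(symbols):
    """Concatenate the codewords of the given symbols into one bit string."""
    return "".join(CODE[s] for s in symbols)


def decode(bits):
    """Greedy prefix decoding; a trailing incomplete codeword is dropped."""
    out, current = [], ""
    for b in bits:
        current += b
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    return "".join(out)


def flip(bits, index):
    """Return the bit string with the bit at the given index inverted."""
    flipped = "1" if bits[index] == "0" else "0"
    return bits[:index] + flipped + bits[index + 1:]


message = "ABACADAB"
bits = encode(message)
corrupted = flip(bits, 0)  # a single bit error at the start of the stream

print("sent:    ", message)
print("received:", decode(corrupted))
```

Here the single flipped bit causes the decoder to merge several codewords into one, so the decoded sequence contains fewer symbols than were sent. Even where the later codewords happen to be parsed correctly, each decoded symbol is assigned to the wrong position, which is what matters when position determines, for example, the macroblock to which a value belongs.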
Assume a decoder implementing a basic level of error detection measures encounters a problem in decoding a bitstream and loses synchronization. Eventually, due to either a syntax violation, i.e., an invalid codeword, or a semantic failure, e.g., invalid parameter values or an unexpected bitstream object, the decoder may become aware of the problem and take the necessary steps to resynchronize itself with the bitstream. This may induce data loss to an extent far beyond the corruption that triggered the data loss in the first place. The data loss can spread spatially across the frame due to the spatial prediction that is used for digital compression. The data loss is also aggravated if the lost data are part of a reference frame for a motion compensated prediction region, causing temporal error propagation.
MPEG-x and H.26x hybrid coding standards typically provide resynchronization points (RSP) at NALU (Network Abstraction Layer Unit) boundaries, the most common NALU being a slice. A slice may be a group of consecutive macroblocks in raster scan order, where a macroblock is made up of 16×16 pixels. Pixels are defined by a luminance value (Y) and two chrominance values (Cr and Cb). In H.264, Y, Cr and Cb components are stored in a 4:2:0 format, where the Cr and Cb components are down-sampled by 2 in the X and the Y directions. Hence, each macroblock would consist of 256 Y components, 64 Cr components and 64 Cb components. H.264 generalized the concept of a slice by introducing slice groups and flexible macroblock ordering (FMO). Slice groups and FMO allow slice and macroblock association to be totally arbitrary, providing flexibility much beyond the traditional structure of consecutive macroblocks. Slices start with an RSP known as a prefix code. The RSP prefix code is a byte-aligned, reserved bit string codeword that is on the order of three bytes long. For the RSP to serve as a true resynchronization point, all inter-coding prediction chains avoid referencing data prior to the RSP. The overhead induced by the prefix code bytes, as well as the coding efficiency lost due to the interruption of or degradation in predictive coding chains, are the disadvantages of using slices frequently, i.e., using short slices, and are to be weighed against the advantages of short slices in supporting error resilience. Based on these concerns, encoding the entire frame as a single slice is not uncommon as default encoder behavior. Another popular, shorter slice structure lets each macroblock row constitute a slice. Slices shorter than a macroblock row are far less frequently utilized, and when they are adopted, the reason is most often to match the slice sizes (in number of bits) to a required transport packet size.
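Two points of the preceding paragraph can be sketched in code: the per-macroblock component counts in 4:2:0 format, and the location of byte-aligned prefix codes in a byte stream. The sketch assumes the three-byte start code prefix 0x000001 of the H.264 Annex B byte stream format as the prefix code; the function names and the payload bytes are hypothetical, and emulation prevention and actual NALU parsing are deliberately omitted.

```python
MB_SIZE = 16  # a macroblock covers 16x16 pixels


def macroblock_samples_420(mb_size=MB_SIZE):
    """Component counts per macroblock when Cb and Cr are down-sampled by 2
    in both the X and the Y directions (4:2:0)."""
    luma = mb_size * mb_size
    chroma = (mb_size // 2) * (mb_size // 2)
    return {"Y": luma, "Cb": chroma, "Cr": chroma}


# Assumption: three-byte start code prefix as in the H.264 Annex B format.
START_CODE = b"\x00\x00\x01"


def find_resync_points(stream):
    """Return the byte offsets at which the prefix code occurs in the stream."""
    offsets = []
    pos = stream.find(START_CODE)
    while pos != -1:
        offsets.append(pos)
        pos = stream.find(START_CODE, pos + len(START_CODE))
    return offsets


# Hypothetical stream: two slices with made-up payload bytes.
stream = START_CODE + b"\xaa\xbb\xcc" + START_CODE + b"\xdd\xee"

samples = macroblock_samples_420()
resync_offsets = find_resync_points(stream)
print(samples)
print(resync_offsets)
```

Because the prefix code is byte-aligned and reserved, a decoder that has lost synchronization can locate the next slice with a plain byte search of this kind, without interpreting any of the entropy coded data in between.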
In the traditional slice-based resynchronization scheme, if an error in the data, such as a semantic or syntactic error, is detected at a decoder, the entire portion of the slice occurring after the error is rendered useless. This is not a desirable condition, especially for longer slices such as an entire frame. What is needed is an intra-slice resynchronization point (IS-RSP) that may enable some of the video data contained in a corrupted slice to be salvaged. In addition, the IS-RSPs need to be positioned intelligently to maximize error resilience while reducing overhead.
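The cost of slice-only resynchronization can be estimated with a simplified model. The sketch below assumes that the error is detected at the macroblock in which it occurs and that decoding resumes at the first resynchronization point after it; the function name, the frame size of 99 macroblocks (11 per row, 9 rows, as in a 176×144 frame) and the placement of resynchronization points at the start of each macroblock row are illustrative assumptions, not a description of any particular positioning method.

```python
def macroblocks_lost(error_mb, slice_len, rsp_positions=()):
    """Number of macroblocks discarded when an error is detected at index
    error_mb in a slice of slice_len macroblocks.

    rsp_positions holds the macroblock indices at which decoding can resume
    inside the slice; if none follows the error, the rest of the slice is lost.
    """
    later = [p for p in rsp_positions if p > error_mb]
    resume_at = min(later) if later else slice_len
    return resume_at - error_mb


SLICE_LEN = 99  # assumed: whole frame of 11 x 9 macroblocks in one slice
ERROR_MB = 15   # assumed: error detected in the second macroblock row

# Slice-only resynchronization: everything after the error is discarded.
lost_without = macroblocks_lost(ERROR_MB, SLICE_LEN)

# Hypothetical resynchronization points at the start of each later row.
row_starts = range(11, SLICE_LEN, 11)
lost_with = macroblocks_lost(ERROR_MB, SLICE_LEN, row_starts)

print("lost without intra-slice points:", lost_without)
print("lost with intra-slice points:   ", lost_with)
```

In practice an error is often detected some distance after the point at which it occurred, so the model understates the loss in both cases; it is intended only to show how resynchronization points inside a slice bound the amount of data discarded.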