Encoder standards efficiently represent video image sequences as compactly coded data. Further, these standards describe the decoding (reconstruction) process by which encoded bit streams are mapped from compressed data to raw video signal data suitable for video display.
The video decoding process is generally the inverse of the video encoding process and is employed to reconstruct a motion picture sequence from a compressed and encoded bitstream. Generally, video bitstream data is decoded according to the syntax defined by the encoder standards. The decoder must first identify the beginning of a coded picture, identify the type of picture, and then decode each individual macroblock within that picture.
Generally, encoded video data is received in a rate buffer or a video buffer verifier (VBV). The data is retrieved from the channel buffer by a decoder or reconstruction device that performs the decoding. Decoders, such as an MPEG decoder, perform inverse scanning to remove the zig-zag ordering and inverse quantization to de-quantize the data. Where frame or field DCTs are involved, the decoding process applies frame and field Inverse Discrete Cosine Transforms (IDCTs) to decode the respective frame and field DCTs, converting the encoded video signal from the frequency domain to the time domain to produce reconstructed raw video signal data.
The decoder also performs motion compensation using transmitted motion vectors to reconstruct temporally compressed pictures. The decoder examines the motion vector data, determines the respective reference block in the reference picture, and accesses the reference block from the frame buffer. After the decoder has Huffman-decoded all the macroblocks, the resultant coefficient data is inverse quantized and operated on by an IDCT process to transform the macroblock data from the frequency domain to the spatial domain. Frames may need to be re-ordered so that they are displayed in display order instead of coding order. After the frames are re-ordered, they may be displayed on an appropriate device.
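The inverse scan and inverse quantization steps described above can be sketched as follows. This is a minimal, illustrative sketch only: the zig-zag table is the conventional 8×8 one, but the de-quantization rule is deliberately simplified (a single scale factor), and the function names are illustrative rather than taken from any standard.

```python
# Illustrative sketch of the inverse scan / inverse quantization steps.
# The de-quantization is simplified to a plain multiply; a real MPEG
# decoder follows the exact reconstruction rules of the standard.

def zigzag_order(n=8):
    """Generate (row, col) pairs in conventional zig-zag order for an n x n block."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def inverse_scan(coeffs_1d, n=8):
    """Undo zig-zag ordering: map a 64-entry coefficient list back to an 8x8 block."""
    block = [[0] * n for _ in range(n)]
    for value, (r, c) in zip(coeffs_1d, zigzag_order(n)):
        block[r][c] = value
    return block

def inverse_quantize(block, qscale):
    """Simplified de-quantization: scale each quantized level back up."""
    return [[v * qscale for v in row] for row in block]
```

For example, `inverse_scan([5, -3, 2] + [0] * 61)` places 5 at position (0,0), −3 at (0,1) and 2 at (1,0), after which `inverse_quantize` rescales the levels before the IDCT stage.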
FIG. 1 shows a block diagram of a typical video decoding system, as is known in the art. Shown in the figure are an Input Compressed Bit-stream 10, a header decoder 11, a Huffman decoder 12, a Quantizer & Compensator Block 13 and an output buffer 14. The header decoder 11 receives a compressed bit-stream 10 that includes video and audio data. The data elements received from the output of the header decoder 11 are Huffman decoded, i.e., variable length decoding (VLD/RVLD) is performed, and are reordered to produce a set of quantized coefficients. Although a Huffman decoder 12 is used, other decoders that perform variable length decoding can be used. The variable length decoded data is rescaled/de-quantized (Q−1), inverse transformed (IDCT) and motion compensated by the Quantizer & Compensator Block 13. The motion compensation is performed using the header information decoded from the bit-stream (and the decoded motion vectors) to produce an output frame. In alternative embodiments, variations of the aforesaid video decoder can be used. Further, in alternate embodiments, the Quantizer & Compensator Block 13 can be split into separate blocks.
The Huffman coded data, i.e., Reversible Variable Length Codes (RVLC), are designed such that they can be instantaneously decoded in both the forward and reverse directions. A part of a bit-stream that cannot be decoded in the forward direction due to the presence of errors can often be decoded in the backward direction. This is illustrated in FIG. 2. Therefore, the number of discarded bits can be reduced, enabling an improvement in quality. RVLC is applied only to texture information.
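The bidirectional property can be illustrated with a toy decoder. The codebook below is hypothetical and is not the MPEG-4 RVLC table; it is chosen only because it is both prefix-free and suffix-free, which is what makes instantaneous decoding possible in either direction.

```python
# Toy illustration of reversible variable-length decoding. The codebook
# is hypothetical (NOT the MPEG-4 RVLC table); it is both prefix-free
# and suffix-free, so greedy matching works in either direction.

CODEBOOK = {"0": "A", "11": "B", "101": "C"}

def decode(bits, backward=False):
    """Decode a bit-string; for backward decoding, match reversed codewords."""
    if backward:
        bits = bits[::-1]
        table = {cw[::-1]: sym for cw, sym in CODEBOOK.items()}
    else:
        table = CODEBOOK
    symbols, current = [], ""
    for bit in bits:
        current += bit
        if current in table:
            symbols.append(table[current])
            current = ""
    if current:
        # Leftover bits that form no codeword model an "error point".
        raise ValueError("error point: trailing bits do not form a codeword")
    return symbols
```

Decoding the bit-string "011101" forward yields A, B, C; decoding the same bits backward yields the same symbols in reverse order, which is how a decoder can recover the tail of a corrupted packet from the far end.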
Initially, decoding in the forward direction is performed. If no errors are detected, the bit-stream is assumed to be valid and the decoding process is finished for that Video Packet (VP). If an error is detected in the forward decoding, two-way decoding is applied: the decoder resynchronizes at the next suitable resynchronization point (vop_start_code or resync_marker) and starts Huffman decoding in the backward direction until it encounters an error point in the backward direction. With the help of the information gathered in the forward and backward directions, the decoder adopts one of four strategies (described in the following part of the description) to determine the macroblocks to be decoded or discarded.
The following strategies are used to determine which bits (and hence which MBs) to discard. These strategies are described using the figures along with the following definitions:
An error-point is detected if:
    (1) An illegal RVLC is found, where an illegal RVLC is defined as follows:
        A codeword whose pattern is not listed in the RVLC table (169 codeword patterns and escape codes).
        Escape coding is used (i.e., a legal codeword is not available in the RVLC table) and the decoded value for LEVEL is zero.
        The second escape code is incorrect (e.g., the codeword is not “00000” or “00001” for forward decoding, or is not “00001” for backward decoding).
        A value (LAST, RUN, LEVEL) decoded from the fixed-length-code (FLC) part of an escape sequence is already present in the RVLC table.
        An incorrect number of stuffing bits for byte alignment (e.g., eight or more 1s follow a 0 at the last part of a Video Packet (VP), or the remaining bit pattern is not “0111 . . . ” after the decoding process is finished).
    (2) More than 64 DCT coefficients are decoded in a block.
Strategy-1: L1+L2<L and N1+N2<N
The first f_mb(L1−T) macroblocks (MBs) from the beginning and the last b_mb(L2−T) MBs from the end are decoded, while the MBs in the darker portion shown in FIG. 3 are discarded.
Strategy-2: L1+L2<L and N1+N2>=N
The first (N−N2−1) MBs from the beginning and the last (N−N1−1) MBs from the end are decoded, while the MBs in the darker portion shown in FIG. 4 are discarded.
Strategy-3: L1+L2>=L and N1+N2<N
The first N−b_mb(L2) MBs from the beginning and the last N−f_mb(L1) MBs from the end are decoded, while the MBs in the darker portion shown in FIG. 5 are discarded.
Strategy-4: L1+L2>=L and N1+N2>=N
The first min{(N−b_mb(L2)), (N−N2−1)} MBs from the beginning and the last min{(N−f_mb(L1)), (N−N1−1)} MBs from the end are decoded, while the MBs in the darker portion shown in FIG. 6 are discarded.
where
    L—Total number of bits in the DCT-coefficient part of a VP.
    N—Total number of macroblocks (MBs) in a VP.
    L1—Number of bits that can be decoded in forward decoding.
    L2—Number of bits that can be decoded in backward decoding.
    N1—Number of MBs that can be completely decoded in forward decoding (0<=N1<=(N−1)).
    N2—Number of MBs that can be completely decoded in backward decoding (0<=N2<=(N−1)).
    f_mb(S)—Number of MBs decoded when S bits can be decoded in the forward direction (the f_mb(S) counter is incremented when one or more bits of an MB can be decoded).
    b_mb(S)—Number of MBs decoded when S bits can be decoded in the backward direction.
    T—Threshold (90 bits is currently used).
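The four strategies above reduce to a small piece of selection logic. The sketch below mirrors the definitions given here; passing f_mb and b_mb as callables is an assumption for illustration (in a real decoder they are derived from the bit positions reached during forward and backward decoding).

```python
# Sketch of the four-way strategy selection described above. Names
# mirror the definitions in the text; f_mb and b_mb are supplied as
# callables mapping a bit count to a macroblock count (an assumption
# made for this sketch).

T = 90  # threshold in bits, per the definitions above

def select_strategy(L, N, L1, L2, N1, N2, f_mb, b_mb):
    """Return (strategy, MBs decoded from the start, MBs decoded from
    the end); the MBs in between are discarded (concealed)."""
    if L1 + L2 < L and N1 + N2 < N:
        return 1, f_mb(L1 - T), b_mb(L2 - T)
    if L1 + L2 < L and N1 + N2 >= N:
        return 2, N - N2 - 1, N - N1 - 1
    if L1 + L2 >= L and N1 + N2 < N:
        return 3, N - b_mb(L2), N - f_mb(L1)
    return 4, min(N - b_mb(L2), N - N2 - 1), min(N - f_mb(L1), N - N1 - 1)
```

For instance, with a hypothetical mapping of roughly one MB per 200 bits (`f = lambda s: s // 200`), `select_strategy(2000, 10, 800, 700, 4, 3, f, f)` falls into Strategy 1 and returns `(1, 3, 3)`: three MBs kept from each end, the rest concealed.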
When RVLD is performed in the reverse direction, the Huffman-decoded information is kept in a buffer and the corresponding MB indices in a separate buffer. When the Huffman decoding in the reverse direction ends, the applicable “strategy” is finalized based on the decoded information. Once the “strategy” is finalized, full decoding (IDCT, motion compensation, etc.) is performed on the Huffman-decoded code words.
The following abbreviations stand for:
    MB_NUMBER: Total number of 16×16 MBs in the frame.
    rvld_codes_backward[MB_NUMBER][6][64]: Array used for storing the RVLD output.
    rvld_index_backward[MB_NUMBER][6]: Array used for storing the index of the last non-zero DCT coefficient in each 8×8 block.
Total size required for the two buffers is 75.4 KB for QCIF resolution and 301.6 KB for CIF resolution.
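These totals can be checked arithmetically. The sketch below assumes 2 bytes per array entry and 1 KB = 1024 bytes; neither assumption is stated explicitly in the text, but both are consistent with the quoted figures, given that QCIF (176×144) has 11×9 = 99 MBs and CIF (352×288) has 22×18 = 396 MBs.

```python
# Worked check of the buffer sizes quoted above, assuming 2 bytes per
# array entry and 1 KB = 1024 bytes (assumptions inferred from the
# stated totals; they are not given explicitly in the text).

BYTES_PER_ENTRY = 2  # assumed storage per coefficient/index entry

def buffer_size_kb(mb_number):
    """Combined size of rvld_codes_backward and rvld_index_backward in KB."""
    codes = mb_number * 6 * 64   # rvld_codes_backward[MB_NUMBER][6][64]
    index = mb_number * 6        # rvld_index_backward[MB_NUMBER][6]
    return (codes + index) * BYTES_PER_ENTRY / 1024

print(round(buffer_size_kb(99), 1))   # QCIF: 99 MBs -> 75.4
print(round(buffer_size_kb(396), 1))  # CIF: 396 MBs -> 301.6
```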
The problem with the existing approach is now illustrated with the help of the following example. Assuming an error occurs in the (N1+1)th MB during forward decoding, backward decoding is performed from the end of the video packet. During backward decoding, the RVLD output is stored in the rvld_codes_backward array for every MB until the error occurs at the (N−N2−1)th MB. The index of the last non-zero DCT coefficient is stored in rvld_index_backward for every MB. The MBs to be concealed are then determined. The rest of the MBs are considered correct and undergo IDCT and motion compensation (for predicted or P blocks), and are finally written to the output buffer.
The RVLD output and the index of the last non-zero coefficient for all completely decoded MBs have to be stored in these arrays. Since the number of MBs in a video packet can be as large as MB_NUMBER, memory has to be allocated for all of them.
In the proposed approach, complete decoding of an MB (RVLD, then Inverse Discrete Cosine Transform (IDCT), motion compensation, etc.) is performed in the reverse direction, and the decoded data is kept in the output buffer. At the end of decoding, the applicable “strategy” is finalized based on the decoded information. Once the “strategy” is finalized, the number of MBs to be concealed is determined.
The challenge in this proposed approach is to determine the point up to which complete decoding in the backward direction is to be performed, because the strategy for identifying the backward decodable point can only be finalized once the error point is identified in the backward direction. If decoding is continued until the error point is reached, a large amount of irrelevant computation will be performed, causing the cycle count to increase sharply. Backward decoding should also not be terminated before the decodable point is reached, as that would cause loss of relevant data and the output would not conform to the video encoder standard.