The demand for digital video products continues to increase. Some examples of applications for digital video include video communication, security and surveillance, industrial automation, and entertainment (e.g., DV, HDTV, satellite TV, set-top boxes, Internet video streaming, digital cameras, video jukeboxes, high-end displays and personal video recorders). In addition, new applications are in design or early deployment. Further, video applications are becoming increasingly mobile and converged as a result of higher computation power in handsets, advances in battery technology, and high-speed wireless connectivity.
Video compression/decompression for transmission and storage of digital video is an essential enabler for digital video products. MPEG-4, an ISO/IEC standard developed by the Moving Picture Experts Group (MPEG), is one industry standard for video compression used in many digital video products. One part of the MPEG-4 standard, ISO/IEC 14496-2 entitled “Generic Coding of Audio-Visual Objects. Part 2: Visual” (MPEG-4 Visual) specifically defines video compression. MPEG-4 Visual includes the concepts of a video object and a video object plane. A video object is an entity in a scene that a user can access and manipulate. The instances of video objects at a given time are called video object planes (VOPs). The encoding process generates a coded representation of a VOP plus composition information necessary for display. At the decoder, a user may interact with and modify the composition process as needed.
In MPEG-4 Visual, a video object may be an entire frame or a portion of a frame and may be coded as an arbitrary shape. An MPEG-4 bitstream may include three major types of VOPs: intracoded VOPs (I-VOPs), predictive coded VOPs (P-VOPs), and bi-directionally coded VOPs (B-VOPs). I-VOPs are coded with moderate compression and without reference to other pictures. I-VOPs must appear regularly in the bitstream as they are required for decoding of subsequent VOPs. P-VOPs are coded more efficiently, using motion compensated prediction from past intra or predictive coded VOPs, and are generally used as a reference for further prediction. B-VOPs provide the highest degree of compression but require both past and future reference VOPs for motion compensation. All VOPs consist of macroblocks. A macroblock is 16×16 pixels in the luminance space and 8×8 pixels in the chrominance space for the simplest sub-sampling format.
To aid in error localization, MPEG-4 Visual includes a video packet mode in which frames of video are encoded as video packets that each start with a unique resynchronization marker, referred to in the standard as a resync_marker. When in video packet mode, an encoder inserts resynchronization markers in the video bitstream at approximately fixed intervals measured in bits. The locations of the resynchronization markers are based on the number of macroblocks required to form a packet with a desired bit length. Because macroblocks in high activity portions of the video require more bits, more resynchronization markers are inserted in those portions of the bitstream, which helps a decoder in localizing and recovering from errors. In general, as a decoder receives the encoded bitstream, the decoder attempts to detect any errors in the packets. When an error is detected, the decoder looks forward in the bitstream for the next resynchronization marker, which indicates where the next video packet in the bitstream begins. The decoder is then able to begin decoding this next video packet. Accordingly, the detection of these resynchronization markers in the decoder is important.
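The relationship between packet bit length and marker placement can be illustrated with a minimal sketch. The function name packet_starts and the parameters macroblock_bits and target_bits are hypothetical and are not taken from the standard; the sketch also ignores header handling and simply closes a packet once its accumulated size reaches the desired bit length.

```python
def packet_starts(macroblock_bits, target_bits):
    """Return the indices of the macroblocks that begin a new video packet.

    macroblock_bits: coded size, in bits, of each macroblock in order.
    target_bits: desired packet length in bits (hypothetical parameter).
    """
    starts = [0]  # the first macroblock always opens a packet
    used = 0      # bits accumulated in the current packet
    for index, bits in enumerate(macroblock_bits):
        if used >= target_bits:
            # The current packet is full, so a resynchronization marker
            # would be inserted before this macroblock.
            starts.append(index)
            used = 0
        used += bits
    return starts


# Four low-activity macroblocks followed by four high-activity macroblocks.
sizes = [100, 100, 100, 100, 400, 400, 400, 400]
starts = packet_starts(sizes, target_bits=400)
```

In this example the four small macroblocks share one packet, while each of the four large macroblocks starts its own packet, so the markers are denser where the video is more active.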
Resynchronization markers in the bitstream can vary in length. In general, a resynchronization marker is a byte-aligned binary number having 16-22 zeros followed by a one. More specifically, the length of a resynchronization marker is determined by the type of the VOP in which the resynchronization marker occurs. If the VOP is an I-VOP, the resynchronization markers in the I-VOP are 17 bits in length (16 zeros followed by a one). If the VOP is a P-VOP, the length of the resynchronization markers in the P-VOP is 16 plus the value of an fcode in the P-VOP header, i.e., vop_fcode_forward. An fcode is a 3-bit unsigned integer taking values from 1 to 7 and specifies a motion-vector search range for a VOP. For example, if the value of the fcode is 2, the resynchronization markers in the P-VOP are 18 bits in length (17 zeros followed by a one).
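The length rules for I-VOPs and P-VOPs stated above can be sketched as follows. The function names ip_marker_length and marker_bits are hypothetical, and the marker is represented as a string of '0' and '1' characters purely for readability.

```python
def ip_marker_length(vop_type, vop_fcode_forward=None):
    """Return the resynchronization marker length in bits for an I- or P-VOP."""
    if vop_type == "I":
        return 17  # 16 zeros followed by a one
    if vop_type == "P":
        if vop_fcode_forward is None or not 1 <= vop_fcode_forward <= 7:
            raise ValueError("vop_fcode_forward must be in the range 1 to 7")
        return 16 + vop_fcode_forward
    raise ValueError("this sketch covers only I-VOPs and P-VOPs")


def marker_bits(length):
    """Return the marker pattern: (length - 1) zeros followed by a one."""
    return "0" * (length - 1) + "1"


# The P-VOP example from the text: an fcode of 2 gives an 18-bit marker.
p_length = ip_marker_length("P", 2)
p_pattern = marker_bits(p_length)
```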
If the VOP is a B-VOP, the length of the resynchronization markers in the B-VOP may vary based on the version of MPEG-4 Visual used by the encoder in coding the bitstream. If version one of MPEG-4 Visual, released in 1999, is used by the encoder, the length of the resynchronization markers in the B-VOP is 16 plus the value of the larger of two fcodes in the B-VOP header, i.e., vop_fcode_forward and vop_fcode_backward. For example, if the value of vop_fcode_forward is 1 and the value of vop_fcode_backward is 3, the resynchronization markers in the B-VOP are 19 bits in length (18 zeros followed by a one). Note that the minimum length of a resynchronization marker in a B-VOP using this definition is 17 and the maximum length is 23.
If either version two or version three of MPEG-4 Visual, released in 2001 and 2004 respectively, is used by the encoder, the length of the resynchronization markers in the B-VOP is 18 if the value of the larger of the two fcodes is 1 or 2, and is 16 plus the value of the larger of the two fcodes otherwise. For example, if the value of vop_fcode_forward is 1 and the value of vop_fcode_backward is 1, the resynchronization markers in the B-VOP are 18 bits in length (17 zeros followed by a one). If the value of vop_fcode_forward is 4 and the value of vop_fcode_backward is 3, the resynchronization markers in the B-VOP are 20 bits in length (19 zeros followed by a one). Note that the minimum length of a resynchronization marker in a B-VOP using this definition is 18 and the maximum length is 23.
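The two B-VOP definitions can be placed side by side in a short sketch. The function names b_marker_length_v1 and b_marker_length_v2 are hypothetical; the arithmetic follows the two definitions as described above.

```python
def b_marker_length_v1(vop_fcode_forward, vop_fcode_backward):
    """B-VOP marker length under the version one definition."""
    return 16 + max(vop_fcode_forward, vop_fcode_backward)


def b_marker_length_v2(vop_fcode_forward, vop_fcode_backward):
    """B-VOP marker length under the version two and version three definition."""
    largest = max(vop_fcode_forward, vop_fcode_backward)
    return 18 if largest <= 2 else 16 + largest


# List every fcode pair for which the two definitions disagree.
differing = [
    (fwd, bwd)
    for fwd in range(1, 8)
    for bwd in range(1, 8)
    if b_marker_length_v1(fwd, bwd) != b_marker_length_v2(fwd, bwd)
]
```

Enumerating all fcode pairs from 1 to 7 shows that the definitions disagree only when both fcodes are 1, where version one gives 17 bits and the later versions give 18 bits. When the larger fcode is 2, both definitions give 18 bits.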
In commercial applications, it is desirable for a decoder to be able to decode bitstreams encoded using any of the three versions of MPEG-4 Visual. A known solution for handling the two definitions of the resynchronization marker length for a B-VOP is to implement a separate resynchronization marker detection function in the decoder for each of the two definitions. A flag is set prior to submitting a bitstream to the decoder to indicate which version of MPEG-4 Visual was used in coding the bitstream. The decoder then checks the flag and selects the appropriate resynchronization marker detection function during decoding.
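The flag-based approach might look like the following simplified sketch. All names here (scan_for_marker, detect_marker_v1, detect_marker_v2, DETECTORS, detect_b_vop_marker, version_flag) are hypothetical, and the scan checks only byte-aligned positions for the zero run followed by a one; an actual decoder would perform additional checks.

```python
def scan_for_marker(data, start_byte, length):
    """Return the byte offset of the next byte-aligned marker, or -1 if none."""
    nbytes = (length + 7) // 8  # bytes needed to hold the marker
    for pos in range(start_byte, len(data) - nbytes + 1):
        window = int.from_bytes(data[pos:pos + nbytes], "big")
        # Keep the leading `length` bits; a marker is all zeros then a one.
        if window >> (nbytes * 8 - length) == 1:
            return pos
    return -1


def detect_marker_v1(data, start_byte, fwd, bwd):
    """Detection using the version one B-VOP length definition."""
    return scan_for_marker(data, start_byte, 16 + max(fwd, bwd))


def detect_marker_v2(data, start_byte, fwd, bwd):
    """Detection using the version two and version three definition."""
    largest = max(fwd, bwd)
    return scan_for_marker(data, start_byte, 18 if largest <= 2 else 16 + largest)


# The flag, set before decoding, selects the detection function.
DETECTORS = {1: detect_marker_v1, 2: detect_marker_v2, 3: detect_marker_v2}


def detect_b_vop_marker(data, start_byte, fwd, bwd, version_flag):
    return DETECTORS[version_flag](data, start_byte, fwd, bwd)


# A 17-bit marker (16 zeros, then a one) starting at byte offset 2.
v1_stream = b"\xab\xcd\x00\x00\x80\xff"
right_flag = detect_b_vop_marker(v1_stream, 0, 1, 1, version_flag=1)
wrong_flag = detect_b_vop_marker(v1_stream, 0, 1, 1, version_flag=2)
```

With both fcodes equal to 1, the correct flag locates the marker at byte offset 2, while the wrong flag searches for an 18-bit marker and fails to find it. This illustrates why the approach depends on the flag being set correctly before decoding.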