In the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) Moving Picture Experts Group-4 (MPEG-4) Part 10 Advanced Video Coding (AVC) standard/International Telecommunication Union, Telecommunication Sector (ITU-T) H.264 recommendation (hereinafter the “MPEG-4 AVC standard”), the syntax element frame_num is used as an identifier for pictures and is subject to several constraints defined in the MPEG-4 AVC standard. The primary purpose of frame_num is to act as a counter that increments each time a picture is decoded, so that if data is lost, the decoder can detect that one or more pictures are missing and can conceal the problem. frame_num increases in decoding order of access units and does not necessarily indicate display order. The Memory Management Control Operations (MMCO) use the value of frame_num to mark pictures as long-term or short-term references, or to mark reference pictures as unused for reference. frame_num is also used for the default reference list ordering for P and SP slices.
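The loss detection role of frame_num can be illustrated with a minimal sketch. It assumes that frame_num wraps modulo a maximum value (called max_frame_num here) and that the decoder remembers the frame_num of the previous reference picture. The function and parameter names are hypothetical, and the sketch ignores IDR resets and other special cases.

```python
def frame_num_gap(prev_ref_frame_num, frame_num, max_frame_num):
    """Return how many frame_num values were skipped (0 means no gap).

    Simplified illustration: frame_num is treated as a counter that wraps
    modulo max_frame_num (an assumed parameter name).
    """
    expected = (prev_ref_frame_num + 1) % max_frame_num
    # Same value or the next value in sequence: nothing is missing.
    if frame_num in (prev_ref_frame_num, expected):
        return 0
    # Otherwise count the skipped values, taking wrap-around into account.
    return (frame_num - expected) % max_frame_num


print(frame_num_gap(5, 6, 16))   # 0: consecutive pictures
print(frame_num_gap(5, 8, 16))   # 2: frame_num 6 and 7 are missing
print(frame_num_gap(15, 1, 16))  # 1: frame_num 0 is missing after wrap-around
```

A non-zero result is the signal a decoder could use to trigger concealment.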
The Picture Order Count in the MPEG-4 AVC standard is an indication of the timing or output ordering of a particular picture. Picture order count is a variable having a value that is non-decreasing with increasing picture position in output order relative to the previous Instantaneous Decoding Refresh (IDR) picture in decoding order or relative to the previous picture containing the memory management control operation that marks all reference pictures as “unused for reference”. Picture Order Count is derived from slice header syntax elements. Picture Order Count is used in the derivation of motion vectors in temporal DIRECT mode, implicit weighted prediction, and default initial reference picture list ordering for B slices.
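As an illustration of how Picture Order Count can drive the default initial reference picture list ordering for B slices, the following is a minimal sketch. It assumes short-term reference frames only, each identified by its Picture Order Count. List 0 places pictures preceding the current picture first (closest first), then the following pictures, and List 1 does the opposite. The function name and the list representation are hypothetical, and long-term references and field handling are left out.

```python
def default_b_lists(cur_poc, ref_pocs):
    """Build initial List 0 / List 1 orderings from POC values (sketch)."""
    # Pictures before the current one in output order, closest first.
    before = sorted((p for p in ref_pocs if p < cur_poc), reverse=True)
    # Pictures after the current one in output order, closest first.
    after = sorted(p for p in ref_pocs if p > cur_poc)
    list0 = before + after
    list1 = after + before
    # If both lists come out identical (and have more than one entry),
    # swap the first two entries of List 1 so the lists differ.
    if len(list1) > 1 and list1 == list0:
        list1[0], list1[1] = list1[1], list1[0]
    return list0, list1


print(default_b_lists(4, [0, 2, 6, 8]))  # ([2, 0, 6, 8], [6, 8, 2, 0])
```

Because the ordering depends only on Picture Order Count differences, a Picture Order Count that no longer reflects output order would change these default lists.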
In particular, DIRECT mode motion parameters using temporal correlation are typically derived for the current macroblock/block by considering the motion information within a co-located position in a subsequent reference picture or, more precisely, the first List 1 reference. Turning to FIG. 1, a diagram illustrating temporal DIRECT prediction in B slice coding is indicated generally by the reference numeral 100. Following the presumption that an object is moving with constant speed, these parameters are scaled according to the temporal distances (as shown in FIG. 1) of the reference pictures involved. The motion vectors $\overrightarrow{MV}_{L0}$ and $\overrightarrow{MV}_{L1}$ for a DIRECT coded block versus the motion vector $\overrightarrow{MV}$ of its co-located position in the first List 1 reference are calculated as follows:
$$X = \left(16384 + \operatorname{abs}(TD_D / 2)\right) / TD_D \tag{1}$$
$$\mathrm{ScaleFactor} = \operatorname{clip}\left(-1024,\ 1023,\ (TD_B \times X + 32) \gg 6\right) \tag{2}$$
$$\overrightarrow{MV}_{L0} = \left(\mathrm{ScaleFactor} \times \overrightarrow{MV} + 128\right) \gg 8 \tag{3}$$
$$\overrightarrow{MV}_{L1} = \overrightarrow{MV}_{L0} - \overrightarrow{MV} \tag{4}$$
In the preceding equations, $TD_B$ and $TD_D$ are the temporal distances, or more precisely Picture Order Count (POC) distances, of the reference picture used by the List 0 motion vector of the co-located block in the List 1 picture compared to the current and the List 1 picture, respectively. The List 1 reference picture and the reference in List 0 referred to by the motion vectors of the co-located block in List 1 are used as the two references of DIRECT mode. If the reference index refIdxL0 refers to a long-term reference picture, or DiffPicOrderCnt(pic1, pic0) is equal to 0, the motion vectors $\overrightarrow{MV}_{L0}$ and $\overrightarrow{MV}_{L1}$ for the direct mode partition are derived by the following:
$$\overrightarrow{MV}_{L0} = \text{mv of the co-located macroblock}$$
$$\overrightarrow{MV}_{L1} = 0$$
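The temporal DIRECT derivation in Equations (1) to (4), together with the fallback for long-term references and zero POC distance, can be sketched as follows. This is a minimal sketch under stated assumptions: "/" is taken as integer division truncating toward zero, ">>" as an arithmetic right shift, and motion vectors are (x, y) integer pairs. The function names and the ref_is_long_term flag are hypothetical.

```python
def clip3(lo, hi, v):
    """Clamp v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def div_trunc(a, b):
    """Integer division truncating toward zero (Python's // rounds down)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def temporal_direct_mvs(mv_col, td_b, td_d, ref_is_long_term=False):
    """Derive (MV_L0, MV_L1) from the co-located motion vector mv_col."""
    # Fallback: long-term reference or zero POC distance, no scaling.
    if ref_is_long_term or td_d == 0:
        return tuple(mv_col), (0, 0)
    x = div_trunc(16384 + abs(td_d) // 2, td_d)            # Equation (1)
    scale = clip3(-1024, 1023, (td_b * x + 32) >> 6)       # Equation (2)
    mv_l0 = tuple((scale * c + 128) >> 8 for c in mv_col)  # Equation (3)
    mv_l1 = tuple(a - c for a, c in zip(mv_l0, mv_col))    # Equation (4)
    return mv_l0, mv_l1


# Current picture midway between the two references (td_b=1, td_d=2):
print(temporal_direct_mvs((8, -4), 1, 2))  # ((4, -2), (-4, 2))
# Zero POC distance falls back to the unscaled co-located vector:
print(temporal_direct_mvs((8, -4), 1, 0))  # ((8, -4), (0, 0))
```

In the first call the scale factor evaluates to 128, i.e. one half in units of 1/256, so the co-located vector is split evenly between the two references.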
The implicit weighted prediction tool also uses Picture Order Count information to determine the weights. In weighted prediction (WP) implicit mode, weighting factors are not explicitly transmitted in the slice header, but instead are derived based on relative distances between the current picture and the reference pictures. Implicit mode is used only for bi-predictively coded macroblocks and macroblock partitions in B slices, including those using DIRECT mode. For implicit mode, the formula shown in Equation (5) is used, except that the offset values $O_0$ and $O_1$ are equal to zero, and the weighting factors $W_0$ and $W_1$ are derived using the formulas below in Equation (6) to Equation (10).
$$\mathrm{predPart}_C[x,y] = \mathrm{Clip1}_C\left(\left(\left(\mathrm{predPartL0}_C[x,y] \cdot W_0 + \mathrm{predPartL1}_C[x,y] \cdot W_1 + 2^{\mathit{logWD}}\right) \gg (\mathit{logWD} + 1)\right) + \left((O_0 + O_1 + 1) \gg 1\right)\right) \tag{5}$$
$$X = \left(16384 + (TD_D \gg 1)\right) / TD_D \tag{6}$$
$$Z = \operatorname{clip3}\left(-1024,\ 1023,\ (TD_B \cdot X + 32) \gg 6\right) \tag{7}$$
$$W_1 = Z \gg 2 \tag{8}$$
$$W_0 = 64 - W_1 \tag{9}$$
This is a division-free, 16-bit safe implementation of the following:
$$W_1 = (64 \cdot TD_B) / TD_D \tag{10}$$
$$\mathrm{DiffPicOrderCnt}(picA, picB) = \mathrm{PicOrderCnt}(picA) - \mathrm{PicOrderCnt}(picB) \tag{11}$$
where $TD_D$ is the temporal difference between the List 1 reference picture and the List 0 reference picture, clipped to the range [−128, 127], and $TD_B$ is the difference between the current picture and the List 0 reference picture, clipped to the range [−128, 127]. In Multi-view Video Coding, there can be cases where $TD_D$ evaluates to zero (this happens when DiffPicOrderCnt(pic1, pic0) in Equation (11) becomes zero). In such a case, the weights $W_0$ and $W_1$ are set to 32.
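The implicit weight derivation can be illustrated with a short sketch. It follows Equations (6) to (9) as written, applies the [−128, 127] clipping to both distances, and falls back to equal weights of 32 when the List 1 to List 0 distance is zero. It assumes "/" truncates toward zero and ">>" is an arithmetic shift. The function names are hypothetical, and the values logWD = 5 and a bit depth of 8 in the sample function are assumptions chosen so that weights summing to 64 are normalized by the final shift.

```python
def clip3(lo, hi, v):
    """Clamp v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))


def div_trunc(a, b):
    """Integer division truncating toward zero (Python's // rounds down)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def implicit_weights(td_b, td_d):
    """Return (W0, W1) from the POC distances td_b and td_d."""
    td_b = clip3(-128, 127, td_b)
    td_d = clip3(-128, 127, td_d)
    if td_d == 0:
        return 32, 32                              # equal weights fallback
    x = div_trunc(16384 + (td_d >> 1), td_d)       # Equation (6)
    z = clip3(-1024, 1023, (td_b * x + 32) >> 6)   # Equation (7)
    w1 = z >> 2                                    # Equation (8)
    return 64 - w1, w1                             # Equation (9)


def implicit_bipred_sample(p0, p1, w0, w1, log_wd=5, bit_depth=8):
    """Equation (5) with both offsets equal to zero (assumed log_wd=5)."""
    value = (p0 * w0 + p1 * w1 + (1 << log_wd)) >> (log_wd + 1)
    return clip3(0, (1 << bit_depth) - 1, value)


print(implicit_weights(1, 2))  # (32, 32): current picture midway
print(implicit_weights(1, 3))  # (43, 21): closer to the List 0 reference
print(implicit_weights(1, 0))  # (32, 32): zero-distance fallback
print(implicit_bipred_sample(100, 200, 32, 32))  # 150
```

The second call gives W1 = 21, which matches the integer part of 64 · 1/3 from Equation (10). The sketch covers only the conditions described in the text.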
In the current MPEG-4 AVC compliant implementation of Multi-view Video Coding (MVC), the reference software achieves multi-view prediction by interleaving all video sequences into a single stream. In this way, frame_num and Picture Order Count between views are coupled together. This has several disadvantages. One disadvantage is that there will be gaps in the value of frame_num for partial decoding. This may complicate the management of reference picture lists or make loss detection based on frame_num gaps impossible. Another disadvantage is that Picture Order Count does not have a real physical meaning, which can break any coding tool that relies upon Picture Order Count information, such as temporal DIRECT mode or implicit weighted prediction. Yet another disadvantage is that the coupling makes parallel coding of multi-view sequences more difficult.