Referring to FIG. 1, a conventional group-of-pictures (GOP) structure 10 is shown. The GOP structure 10 is illustrated in capture order and in encode (or transmission) order. The GOP 10 includes I pictures (or frames), P pictures (or frames) and B pictures (or frames). The I-frames (i.e., pictures 0, 15 and 30) are intra coded reference pictures. The P-frames (e.g., pictures 3, 6, 9, 12, etc.) and the B-frames (e.g., pictures 1, 2, 4, 5, etc.) are inter coded pictures. Conventional P-frames are predicted from a previous (in time) I or P picture. For example, pictures P3, P6 and P9 are predicted from pictures I0, P3 and P6, respectively. Conventional B-frames are predicted from previous and future (or subsequent in time) I or P pictures. For example, pictures B1 and B2 are predicted from pictures I0 and P3, and pictures B4 and B5 are predicted from pictures P3 and P6.
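The relationship between capture order and encode order in FIG. 1 can be sketched in a few lines of Python. This is a minimal sketch that assumes the FIG. 1 spacing (an I picture every 15 pictures and an anchor I or P picture every 3 pictures); the function name `conventional_gop` and its parameters are hypothetical and do not come from any codec library. Each anchor picture is moved ahead of the B pictures that precede it in capture order, because those B pictures need it as their future reference.

```python
def conventional_gop(num_pictures, intra_period=15, anchor_spacing=3):
    """Hypothetical helper: picture types, references and encode order for a
    conventional I/P/B GOP like FIG. 1 (pictures numbered in capture order)."""
    assert intra_period % anchor_spacing == 0, "I pictures must fall on anchor positions"
    types, refs = {}, {}
    for n in range(num_pictures):
        if n % intra_period == 0:
            types[n], refs[n] = "I", ()  # intra coded, no references
        elif n % anchor_spacing == 0:
            types[n], refs[n] = "P", (n - anchor_spacing,)  # previous I or P picture
        else:
            prev_anchor = n - n % anchor_spacing
            # previous and next (in time) I or P pictures
            types[n], refs[n] = "B", (prev_anchor, prev_anchor + anchor_spacing)

    # Encode order: each anchor is sent before the B pictures that precede it
    # in capture order, because those B pictures use it as a future reference.
    encode_order, pending_b = [], []
    for n in range(num_pictures):
        if types[n] == "B":
            pending_b.append(n)
        else:
            encode_order.append(n)
            encode_order.extend(pending_b)
            pending_b = []
    # Any B pictures left here still wait for an anchor beyond num_pictures;
    # they are appended only so that every picture appears exactly once.
    encode_order.extend(pending_b)
    return types, refs, encode_order


types, refs, order = conventional_gop(16)
print(order)
```

For pictures 0 through 15, the sketch produces the encode order 0, 3, 1, 2, 6, 4, 5, 9, 7, 8, 12, 10, 11, 15, 13, 14, with P3 predicted from I0 and B4 and B5 predicted from P3 and P6.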
Conventional video compression systems that use inter-frame compression can achieve much higher compression ratios than methods that rely only on intra compression. Examples of standards that use inter-frame compression include MPEG-2, MPEG-4 and H.264. In order to provide random access when inter-frame coding is used, intra coded access points (or I-frames) are periodically placed, typically at a rate of one or two I-frames per second.
One type of artifact introduced by the use of intra coded access points in inter-frame compression is “intra beating”. Specifically, the compression artifacts in all or part of a scene can visibly change at each I-frame, creating a visible beat at the I-frame frequency (typically every one-half to one second). Such artifacts are particularly noticeable in smooth and slow-moving parts of a scene, where no prediction difference is encoded (i.e., the pixels only change at I-frames).
Referring to FIG. 2, a diagram is shown illustrating a hierarchical GOP structure 20. One way to reduce the amount of intra beating is to use a so-called hierarchical GOP structure. When a hierarchical GOP structure with L levels is used, the GOP length is $2^{L-1}$. For example, a 5-level GOP structure has a GOP length of $2^{4} = 16$. Every $2^{L-1}$-th picture is a level 0 (L0) picture, and each level 0 picture is coded as an intra (I) picture. For example, pictures 0, 16 and 32 in FIG. 2 are level 0 (I) pictures.
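The GOP length and the level assignment can be illustrated with a small Python sketch. It assumes pictures are numbered from 0 in capture order with picture 0 at level 0, and the helper names `gop_length` and `picture_level` are hypothetical. Because each level halves the spacing of the level below it (the midpoint rule for levels 1 and above is spelled out in the following paragraphs), a picture's level follows from the number of trailing zero bits in its index: a picture whose index is divisible by $2^{L-1}$ is level 0, and any other picture is at level $L-1$ minus its count of trailing zero bits.

```python
def gop_length(num_levels):
    """GOP length 2^(L-1) for a hierarchical GOP with L levels."""
    return 2 ** (num_levels - 1)


def picture_level(n, num_levels):
    """Hierarchical level of picture n (capture order, n >= 0)."""
    if n % gop_length(num_levels) == 0:
        return 0  # level 0: coded as an intra (I) picture
    # n & -n isolates the lowest set bit; its bit position is the number
    # of trailing zero bits in n.
    trailing_zeros = (n & -n).bit_length() - 1
    return (num_levels - 1) - trailing_zeros


print(gop_length(5))
print([picture_level(n, 5) for n in range(17)])
```

For L = 5 the sketch gives a GOP length of 16 and levels 0, 4, 3, 4, 2, 4, 3, 4, 1, 4, 3, 4, 2, 4, 3, 4, 0 for pictures 0 through 16, consistent with the FIG. 2 structure as described.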
Pictures midway between the L0 (I) pictures are level 1 (L1) pictures. Level 1 pictures are coded as B pictures using the closest previous and future L0 pictures as references (e.g., picture 8 is a B picture that uses pictures 0 and 16 as references, picture 24 is a B picture that uses pictures 16 and 32 as references). Pictures midway between pictures of level 1 or lower are level 2 (L2) pictures. Level 2 pictures are coded as B pictures using the closest previous and future level 1 or lower pictures as references (e.g., picture 4 is a B picture that uses pictures 0 and 8 as references, picture 12 is a B picture that uses pictures 8 and 16 as references, etc.).
Pictures midway between pictures of level 2 or lower are level 3 (L3) pictures. Level 3 pictures are coded as B pictures using the closest previous and future level 2 or lower pictures as references (e.g., picture 2 is a B picture that uses pictures 0 and 4 as references, picture 6 is a B picture that uses pictures 4 and 8 as references, etc.).
The bifurcation of pictures continues until the highest level (i.e., the odd pictures) is reached. In the example in FIG. 2, the highest level is level 4, which consists of all the odd pictures. Each level 4 picture uses the immediately previous and next pictures, which are from level 3 or lower, as references.
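The reference rule for every level can be captured in one function. This is a minimal sketch, assuming the capture-order numbering of FIG. 2; the helper names are hypothetical. A picture at level k sits midway between two pictures of level k-1 or lower, and the distance to each of them equals the lowest set bit of the picture index, so the two references are that distance before and after the picture.

```python
def gop_length(num_levels):
    return 2 ** (num_levels - 1)


def picture_level(n, num_levels):
    if n % gop_length(num_levels) == 0:
        return 0
    # (num_levels - 1) minus the number of trailing zero bits of n
    return (num_levels - 1) - ((n & -n).bit_length() - 1)


def references(n, num_levels):
    """Previous and next reference pictures of picture n, or () for level 0."""
    if n % gop_length(num_levels) == 0:
        return ()  # level-0 (I) picture: no references
    # Distance from n to each nearest picture of a lower level.
    step = n & -n
    return (n - step, n + step)


for n in range(1, 17):
    print(n, picture_level(n, 5), references(n, 5))
```

With L = 5 this reproduces the examples in the text: picture 8 references pictures 0 and 16, picture 24 references 16 and 32, picture 12 references 8 and 16, picture 6 references 4 and 8, and every odd picture references its immediate neighbours.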
Using hierarchical GOP structures can improve quality. However, hierarchical GOP structures have some problems. One problem is the amount of memory needed for decoding hierarchical GOP structures. As the GOP length increases, the number of frames that the decoder must store increases. Specifically, for an L-level GOP structure, the decoder needs to hold at least L+1 frames at the same time. For example, referring to FIG. 2 (L = 5), when picture 1 is decoded, the decoder must hold at least the six pictures 0, 16, 8, 4, 2 and 1. H.264 limits the number of pictures that a decoder can be required to hold. The exact limit depends on the “level” of the bit stream (an H.264 conformance level, not a level of the GOP hierarchy) as well as on the picture resolution, but in many important cases (e.g., when the picture resolution is close to the maximum resolution supported by the level) the limit is 4 pictures. For example, an H.264 level 4 bit stream can have pictures no bigger than about 1920×1088. When the picture size is 1920×1088, the maximum number of pictures at the decoder is limited to 4. Another problem with the hierarchical GOP structure is that, for scenes with complex motion, the level 1 (L1) pictures are so far away from their predictors (i.e., the L0 (intra) pictures) that compression efficiency and visual quality can be degraded.
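The memory arithmetic can be checked with a short Python sketch. It repeats the midpoint reference rule so that it runs on its own, and for picture n it counts the pictures in n's reference ancestry (n plus every picture reachable through its chain of references), which reproduces the FIG. 2 example of pictures 0, 16, 8, 4, 2 and 1. On the H.264 side, the sketch assumes the Annex A rule MaxDpbFrames = min(MaxDpbMbs / (PicWidthInMbs × FrameHeightInMbs), 16) for frame-coded pictures and a MaxDpbMbs value of 32768 for level 4, as listed in Table A-1 of recent editions of the standard; treat both as assumptions to check against the edition in use. All function and constant names are hypothetical.

```python
def references(n, num_levels):
    """Midpoint reference rule of FIG. 2: () for level-0 (I) pictures."""
    if n % 2 ** (num_levels - 1) == 0:
        return ()
    step = n & -n  # distance to the nearest lower-level pictures
    return (n - step, n + step)


def reference_ancestry(n, num_levels):
    """Picture n plus every picture reachable through its reference chain."""
    held, stack = {n}, [n]
    while stack:
        for r in references(stack.pop(), num_levels):
            if r not in held:
                held.add(r)
                stack.append(r)
    return held


# Assumed value: MaxDpbMbs for H.264 level 4 (Annex A, Table A-1).
MAX_DPB_MBS_LEVEL_4 = 32768


def max_dpb_frames(width, height, max_dpb_mbs=MAX_DPB_MBS_LEVEL_4):
    """Frame-coded DPB limit: min(MaxDpbMbs // frame size in macroblocks, 16)."""
    frame_mbs = ((width + 15) // 16) * ((height + 15) // 16)  # 16x16 macroblocks
    return min(max_dpb_mbs // frame_mbs, 16)


print(sorted(reference_ancestry(1, 5)))
print(max(len(reference_ancestry(n, 5)) for n in range(16)))
print(max_dpb_frames(1920, 1088))
```

For FIG. 2 (L = 5) the largest ancestry is six pictures, matching L+1. A 1920×1088 frame is 120 × 68 = 8160 macroblocks, and 32768 // 8160 = 4, so the assumed level 4 limit of four pictures is smaller than the six pictures counted for FIG. 2.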
It would be desirable to implement a video compression system that reduces intra beating.