The present invention relates to video compression, and more particularly to modifying a group of pictures structure in MPEG video.
The Moving Picture Experts Group (MPEG) has established various standards for the compression of television video and audio information. One standard is referred to as MPEG-2. This standard has three different compressed picture types: I, P and B. MPEG-1 permitted a fourth compressed picture type: D for DC-only pictures. The D picture type is available in MPEG-2 only in pure MPEG-1 mode. The I pictures represent a stand-alone image, i.e., the pictures are compressed solely with respect to the information within the picture without reference to any other pictures. The P pictures are composed of macroblocks which may be either intra-coded, as in the I pictures, or based on prediction from a previous I or P picture. The B pictures are composed of either intra-coded macroblocks or forward, backward or bi-directionally predicted macroblocks. The reference pictures for B picture prediction are the closest I or P pictures on either temporal side. These reference pictures are referred to as anchor pictures. In display order, a typical sequence of pictures is IBBPBBP . . . I, as shown in FIG. 1. In order to make sure that the coded bitstream contains only causal references, the coded picture order for the above display sequence of pictures is IPBBPBB . . . I. Informally, a group of pictures (GOP) defines the periodicity of the picture types in the coded bitstream. GOPs are often parameterized by the two numbers M and N, as indicated in FIG. 1. M is the periodicity of P pictures and N is the periodicity of I pictures. In the example shown, M=3 and N=15.
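The relationship between M, N, display order and coded order described above can be illustrated with a minimal sketch. The function names `gop_display_order` and `display_to_coded_order` are hypothetical and are not part of any MPEG specification or library; the sketch assumes a simple fixed GOP pattern in which each B picture references the nearest anchor picture on either side.

```python
def gop_display_order(m, n):
    """Return the picture types of one GOP in display order.

    m: spacing between anchor (I or P) pictures.
    n: spacing between I pictures (the GOP length).
    """
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")   # each GOP starts with an I picture
        elif i % m == 0:
            types.append("P")   # every m-th picture is an anchor
        else:
            types.append("B")   # pictures between anchors are B pictures
    return types


def display_to_coded_order(types):
    """Reorder (display_index, type) pairs so that references are causal.

    Each anchor picture is emitted before the B pictures that precede it
    in display order, because those B pictures are predicted from it.
    """
    coded = []
    pending_b = []
    for index, picture_type in enumerate(types):
        if picture_type == "B":
            pending_b.append((index, picture_type))   # wait for the next anchor
        else:
            coded.append((index, picture_type))       # anchor goes first
            coded.extend(pending_b)                   # then the held B pictures
            pending_b = []
    coded.extend(pending_b)   # B pictures still waiting for a later anchor
    return coded


if __name__ == "__main__":
    print("".join(gop_display_order(3, 15)))           # IBBPBBPBBPBBPBB
    coded = display_to_coded_order(list("IBBPBBPBBI"))
    print("".join(t for _, t in coded))                # IPBBPBBIBB
```

With M=1 the same function produces no B pictures at all, so display order and coded order coincide.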
The GOP structure provides tradeoff flexibility for an encoder in terms of compression factor, complexity and latency. For example, MPEG-1 and MPEG-2 video compressors typically take as input a standard definition raw digital video bitstream at up to 270 Mb/s and generate a compressed bitstream with a bit-rate ranging anywhere from 0.5 Mb/s to 50 Mb/s. The compression factor is defined as the ratio between the raw data input rate and the compressed data output rate. Complexity refers to the complexity of the hardware/software implementation: gate/transistor count and speed of execution in hardware, and lines of code and number of operations to achieve a task in software. Finally, latency for this purpose is the end-to-end latency of a system comprising a video compressor, a transmission medium and a video decompressor, i.e., the time interval between the instant a frame or picture is captured by a camera and fed to the video compressor and the instant that same frame or picture is decompressed and displayed by the video decompressor. Generally, better compression factors are obtainable with increased implementation complexity or more latency. Higher latency generally also entails greater complexity.
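The definition of compression factor given above amounts to a single ratio, sketched below. The function name `compression_factor` and the input rate of 200 Mb/s are hypothetical values chosen purely for illustration; only the 0.5 Mb/s and 50 Mb/s output rates come from the range quoted in the text.

```python
def compression_factor(raw_mbps, compressed_mbps):
    """Ratio of the raw input rate to the compressed output rate."""
    if raw_mbps <= 0 or compressed_mbps <= 0:
        raise ValueError("bit rates must be positive")
    return raw_mbps / compressed_mbps


if __name__ == "__main__":
    raw = 200.0  # hypothetical raw input rate in Mb/s
    for out in (0.5, 50.0):  # ends of the compressed range quoted in the text
        print(out, compression_factor(raw, out))
```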
In teleconferencing applications the latency requirements are stringent, and the overall end-to-end latency of a video compressor, transmission and video decompressor system should be only a fraction of a second, such as less than or equal to 250 milliseconds. In these applications it is quite normal to use a GOP structure having mainly P pictures, with periodic I pictures to perform a refresh. This mode of operation is usually termed a low-delay mode. A typical GOP structure for low-delay mode in the 60 Hz world is M=1, N=15, and in the 50 Hz world is M=1, N=12. With such a GOP structure the amount of memory needed at the compressor and decompressor is very small. The motion estimation complexity at the encoder is also very small compared to the motion estimation for B pictures. However, the compression achievable with such a GOP structure is inferior to the compression achievable with B pictures.
In most other applications the compression factor is of greater importance. In such applications B pictures are used for higher compression. This mode of operation is usually termed a non-low-delay mode. A typical GOP structure for these applications in the 60 Hz world is M=3, N=15, and in the 50 Hz world is M=3, N=12. These GOP structures provide a good compromise between latency, compression factor and complexity.
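One contribution to the latency difference between the two modes can be estimated with a rough sketch. It rests on a simplifying assumption that is not stated in the text: the first B picture after an anchor cannot be encoded until the next anchor has been captured, so the encoder holds it for M-1 picture periods. The function name `reorder_delay_ms` and the picture rates used are hypothetical, and buffering, transmission and decoding delays are ignored.

```python
def reorder_delay_ms(m, pictures_per_second):
    """Approximate encoder-side picture reordering delay in milliseconds.

    Assumes the first B picture after an anchor waits m - 1 picture
    periods for the next anchor; with m = 1 there are no B pictures
    and therefore no reordering delay.
    """
    if m < 1 or pictures_per_second <= 0:
        raise ValueError("m must be >= 1 and the picture rate positive")
    return (m - 1) * 1000.0 / pictures_per_second


if __name__ == "__main__":
    # 25 pictures/s is assumed here for the 50 Hz world.
    print(reorder_delay_ms(1, 25.0))  # low-delay mode, M=1
    print(reorder_delay_ms(3, 25.0))  # non-low-delay mode, M=3
```

Under these assumptions the M=3 structure adds two picture periods of delay relative to M=1, which is a noticeable share of a 250 millisecond end-to-end budget.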
The compressed video bitstream generated for interactive teleconferencing applications may have to be stored, or archived, for future review. To enable this, the compressed video bitstream in the low-delay mode needs to be converted to a non-low-delay mode compressed video bitstream, such as by decompressing and recompressing. Another reason for generating the non-low-delay mode bitstream from the low-delay mode bitstream may be a lack of motion estimation resources at the encoder for full B picture estimation. The straightforward approach is to decompress the low-delay mode video bitstream into an uncompressed raw video bitstream, and to perform a new compression in non-low-delay mode. In this approach the motion information available in the low-delay mode bitstream is ignored, or forgotten, and the motion estimation is performed again by the encoder on the video bitstream. This wastes resources.
What is desired is a method of modifying a GOP structure from low-delay mode to non-low-delay mode that uses the motion vector information present in the low-delay mode bitstream to generate the non-low-delay mode bitstream with a higher compression factor.