A number of different video coding standards have been established for coding digital video sequences. The Moving Picture Experts Group (MPEG), for example, has developed a number of standards including MPEG-1 (Part 2), MPEG-2 (Part 2) and MPEG-4 (Part 2). Other examples include the International Telecommunication Union Telecommunication Standardization Sector (ITU-T) H.261 and H.263 standards, and the ITU-T H.264 standard, which is also set forth in MPEG-4 Part 10, entitled “Advanced Video Coding, AVC.” These video coding standards generally support improved transmission and storage efficiency of video sequences by coding data in a compressed manner. Compression reduces the overall amount of data that needs to be transmitted or stored to represent the video frames. Video coding is used in many contexts, including video streaming, video camcorder, personal video recorder (PVR), digital video recorder (DVR), video telephony (VT), video conferencing, digital video distribution on video CD (VCD) and digital versatile/video disc (DVD), and video broadcast applications, over both wired and wireless transmission media and on both magnetic and optical storage media.
The MPEG-1, MPEG-2, MPEG-4, ITU-T H.261, ITU-T H.263, and ITU-T H.264 standards support video coding techniques that utilize similarities between successive video frames, referred to as temporal or inter-frame correlation, to provide inter-frame compression. These standards also support video coding techniques that utilize similarities within individual video frames, referred to as spatial or intra-frame correlation, to provide intra-frame compression. The inter-frame compression techniques exploit data redundancy across adjacent or closely spaced video frames by converting pixel-based representations of video frames to pixel-block-based translational motion representations. Video frames coded using inter-frame techniques are often referred to as P (“predicted”) frames or B (“bi-predictive”) frames. Some frames, commonly referred to as I (“intra”) frames, are coded using spatial compression, which can be either non-predictive (i.e., based only on transform coding as in pre-H.264 standards) or predictive (i.e., based on both spatial prediction and transform coding as in H.264). In addition, some frames may include a combination of both intra- and inter-coded blocks. These encoding standards provide highly efficient coding that is well suited to wireless video broadcasting applications.
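By way of illustration only, the pixel-block-based translational motion representation described above may be sketched as an exhaustive block-matching search that selects, for a block of the current frame, the displacement into a reference frame with the smallest sum of absolute differences (SAD). The function names, the 4x4 block size, and the two-pixel search range are assumptions chosen for the example; none of the named standards mandates a particular search strategy, and practical encoders typically use faster searches and sub-pixel refinement.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, n=4, search=2):
    """Exhaustive search over a (2*search+1)^2 window for the displacement
    (dx, dy) that minimises SAD. Frames are lists of rows of luma values."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference.
            if not (0 <= by + dy and by + dy + n <= height):
                continue
            if not (0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best, best_cost
```

In such a sketch, an inter-coded block would be represented by the returned motion vector together with the residual between the block and its displaced reference, rather than by its raw pixel values.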
Prior to performing encoding using any of the efficient encoding standards mentioned above, a coding device may partition a received video sequence into group of pictures (GOP) structures that include a plurality of frames. The coding device may then determine the picture coding type for each of the frames included in the GOP structures before encoding the video data for transmission or storage. Determination of the GOP structure with picture coding types is important for coding efficiency. Therefore, GOP structure determination benefits not only encoding schemes that act on previously uncompressed raw video data; transcoding schemes that act on previously compressed video data may also benefit. For example, some video data desired for wireless video broadcasting, e.g., digital television signals, are, in their original form, coded using video encoding standards such as MPEG-2 (Part 2) that do not provide the currently most efficient compression. In this case, a transcoder may convert the video data to an encoding standard that does provide more efficient compression, such as ITU-T H.264, for wireless video broadcasting. In order to convert the video data, a transcoder may first decode the video data from the first encoding standard and then may partition the video sequence into GOP structures and perform GOP structure determination before re-encoding the video data using the second encoding standard more desirable for wireless video broadcasting.
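As a non-limiting illustration of partitioning a sequence into GOP structures and assigning picture coding types, the following sketch applies a fixed, non-adaptive pattern in display order. The function names and the default GOP length of 12 frames with an anchor frame every third position are assumptions made for the example; they are not required by any of the standards named above.

```python
def fixed_gop_types(num_frames, gop_size=12, p_distance=3):
    """Assign a picture coding type to each frame, in display order, using
    a fixed pattern: an I frame opens each GOP, a P frame follows every
    `p_distance` frames, and the remaining frames are B frames."""
    types = []
    for i in range(num_frames):
        pos = i % gop_size
        if pos == 0:
            types.append("I")
        elif pos % p_distance == 0:
            types.append("P")
        else:
            types.append("B")
    return types


def partition_into_gops(types, gop_size=12):
    """Group the per-frame types into GOP strings of at most gop_size frames."""
    return ["".join(types[i:i + gop_size])
            for i in range(0, len(types), gop_size)]
```

For example, `partition_into_gops(fixed_gop_types(14))` yields one full GOP, `IBBPBBPBBPBB`, followed by a partial GOP, `IB`. A fixed pattern of this kind ignores the content of the frames.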
As the video signal changes its statistical nature over time, the coding device should adapt the GOP structure in order to exploit the available temporal redundancy to the fullest extent possible for the most efficient compression. In general, a coding device adaptively determines the picture coding type for a candidate frame within a GOP structure based on the content of surrounding frames and identification of video transitional effects, such as cut scene changes, flash frames, cross-fades, and camera pans and scrolls. Existing adaptive GOP (AGOP) structure determination methods include analysis of statistical features of both luminance and chrominance signals using histograms or variance measures, edge-determination-based algorithms, and algorithms based on motion vector field evolution or temporal prediction efficiency metrics. However, existing AGOP structure determination methods may not be accurate enough to achieve the efficient compression needed for increasingly complex wireless video broadcasting applications.
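For illustration, one of the existing histogram-based approaches mentioned above may be sketched as a comparison of luminance histograms of consecutive frames, with a cut scene change declared when the normalised difference exceeds a threshold. The function names, the 16-bin histogram, the 8-bit luma range, and the 0.5 threshold are assumptions chosen for the example and are not taken from any particular AGOP method; the sketch also assumes all frames have the same dimensions.

```python
def luma_histogram(frame, bins=16):
    """Histogram of 8-bit luma values; `frame` is a list of rows."""
    hist = [0] * bins
    for row in frame:
        for v in row:
            hist[v * bins // 256] += 1
    return hist


def histogram_difference(a, b):
    """Normalised L1 distance between two histograms of equal total count:
    0.0 for identical histograms, 1.0 for fully disjoint ones."""
    total = sum(a)
    return sum(abs(x - y) for x, y in zip(a, b)) / (2 * total)


def detect_cuts(frames, threshold=0.5):
    """Return indices of frames whose histogram differs from that of the
    preceding frame by more than `threshold`."""
    if not frames:
        return []
    cuts = []
    prev = luma_histogram(frames[0])
    for i in range(1, len(frames)):
        cur = luma_histogram(frames[i])
        if histogram_difference(prev, cur) > threshold:
            cuts.append(i)
        prev = cur
    return cuts
```

A coding device using such a measure might code each detected frame as an I frame and start a new GOP there. A single-threshold histogram test of this kind can misclassify flash frames and gradual transitions such as cross-fades, which is one way the accuracy limitation noted above may arise.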