A standard for digital video and audio programs for broadcast and for recordings such as video compact disks (VCD) has been established by the Moving Picture Experts Group (MPEG) chartered by the International Organization for Standardization (ISO). The standard for digital video and two channel stereo audio is known as MPEG-1 and, more formally, as ISO-11172. An enhanced standard, known colloquially as MPEG-2 and more formally as ISO-13818, has been established to provide for enhanced quality and for specifying data formats for broadcast and other higher noise applications as well as digital video disks (DVD) and other higher resolution recorded media.
The MPEG video standard specifies a bitstream syntax that typically provides transformation blocks of 8×8 luminance pels (pixels) and corresponding chrominance data using Discrete Cosine Transform (DCT) coding. The DCT coding is performed on the 8×8 pel blocks followed by quantization, zigzag scan, and variable length coding of runs of zero quantized indices and amplitudes of the indices. Motion compensated prediction is employed. For video, MPEG contemplates Intra (I) frames, Predictive (P) frames and Bidirectionally Predictive (B) frames. The I-frames are independently coded and are the least efficiently coded of the three frame types. P-frames are coded more efficiently than are I-frames and are coded relative to the previously coded I- or P-frame. B-frames are coded the most efficiently of the three frame types and are coded relative to both the previous and the next I- or P-frames. The coding order of the frames in an MPEG program is not necessarily the same as the presentation order of the frames. Headers in the bitstream provide information to be used by decoders to properly decode the time and sequence of the frames for the presentation of a moving picture.
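The zigzag scan and the run coding of zero indices described above can be illustrated with a minimal sketch. The function names `zigzag_order` and `run_level_pairs` are hypothetical, the block is assumed to hold already quantized indices, and the actual variable length code tables and any special handling of the first (DC) index are omitted.

```python
def zigzag_order(n=8):
    """Return (row, col) positions of an n x n block in zigzag scan order."""
    # Positions on the same anti-diagonal share row + col. The traversal
    # direction alternates: odd diagonals run top to bottom (row ascending),
    # even diagonals run bottom to top (column ascending).
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1],
                        rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def run_level_pairs(block):
    """Turn quantized indices into (zero_run, level) pairs in zigzag order."""
    pairs = []
    run = 0
    for r, c in zigzag_order(len(block)):
        level = block[r][c]
        if level == 0:
            run += 1          # extend the current run of zero indices
        else:
            pairs.append((run, level))
            run = 0
    # Trailing zeros are not emitted; a real coder signals end of block.
    return pairs


# Example: a sparse 8x8 block with three nonzero quantized indices.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 12, 3, -1
pairs = run_level_pairs(block)
```

Each pair would then be mapped to a variable length code, so that long runs of zeros cost few bits.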
The video bitstreams in MPEG systems include a Video Sequence Header, which is the primary definition of the entire video sequence. The Video Sequence Header contains picture size and aspect ratio data, bit rate limits and other global parameters. In MPEG-2, various Sequence Extensions may also be included that contain other information applicable to all pictures of the sequence, including a Progressive Sequence bit which indicates that the sequence contains only Progressive Frame pictures, a Chrominance Format code, information indicating the frame rate at which the original picture was encoded including original video format (e.g., NTSC, PAL, other) and other variables. Following the Video Sequence Header and Sequence Extension are coded Groups-Of-Pictures (GOPs), which are the components of the sequence that enable random access of the video stream. Each GOP usually includes only one I-picture and a variable number of P- and B-pictures. Each GOP also includes a GOP header that contains presentation delay requirements and other data relevant to the entire GOP. Each picture in the GOP includes a Picture Header, which is the primary coding unit that contains picture type and display order and delay data and other information relevant to the picture, including whether the picture is an I-, P- or B-picture, whether the picture is a frame or a field picture, whether a frame picture is a progressive frame or interlaced video, whether the field is to be repeated (3:2 pull-down as described below), field display order and other parameters.
Each MPEG picture is divided into a plurality of Macroblocks (MBs), not all of which need be transmitted. Each MB is made up of 16×16 luminance pels, or a 2×2 array of four 8×8 transformed blocks of pels. MBs are coded in Slices of consecutive variable length strings of MBs, running left to right across a picture. In MPEG-2, slices may begin and end at any intermediate MB position of the picture but must respectively begin or end whenever a left or right margin of the picture is encountered. Each Slice begins with a Slice Header that contains information of the vertical position of the Slice within the picture, information of the quantization scale of the Slice and other information such as that which can be used for fast-forward, fast reverse, resynchronization in the event of transmission error, or other picture presentation purposes. The Slice Header primarily facilitates resynchronization, refresh and error recovery.
The Macroblock is the basic unit used for MPEG motion compensation. Each MB contains an MB Header, which, for the first MB of a Slice, contains information of the MB's horizontal position relative to the left edge of the picture, and which, for subsequently transmitted MBs of a Slice, contains an address increment. Not all of the consecutive MBs of a Slice are transmitted with the Slice. The MB Header identifies the macroblock type, such as Intrafield predictive which is restricted to only pels from the current frame, or Interfield predictive which allows copying of pels from a previous frame. The MB header also defines Motion Vector Type, DCT_type (frame or field DCT), the motion vectors, the blocks that are encoded and macroblock parameters. The individual 8×8 pel blocks, four of which make up the macroblock, have no headers and are the basic transform and compression unit.
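The addressing scheme described above, in which the first MB of a Slice carries a horizontal position and later MBs carry only an address increment, can be sketched as follows. This is a simplified illustration and not the bitstream syntax itself; `transmitted_positions` and `skipped_positions` are hypothetical names, and positions are assumed to be counted in macroblocks from the left edge of the picture.

```python
def transmitted_positions(first_column, increments):
    """Horizontal positions of the MBs actually transmitted in one Slice."""
    positions = [first_column]
    for inc in increments:
        if inc < 1:
            raise ValueError("address increment must be at least 1")
        # An increment greater than 1 means intermediate MBs were not sent.
        positions.append(positions[-1] + inc)
    return positions


def skipped_positions(positions):
    """Positions between the first and last transmitted MB that were not sent."""
    sent = set(positions)
    return [p for p in range(positions[0], positions[-1] + 1) if p not in sent]


# Example: the Slice starts at column 2; the third transmitted MB jumps by 3.
sent = transmitted_positions(2, [1, 3, 1])
missing = skipped_positions(sent)
```

In this example the MBs at columns 4 and 5 are not transmitted, which is how the text's statement that not all consecutive MBs of a Slice are sent shows up in the addressing.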
The presentation of MPEG video involves the display of video frames at a rate of, for example, twenty-five or thirty frames per second (depending on the national standard used, PAL or NTSC, for example). Thirty frames per second corresponds to presentation time intervals of approximately 33 milliseconds. The capacity of MPEG signals to carry information is achieved in part by exploiting the concept that there is typically a high degree of correlation between adjacent pictures and by exploiting temporal redundancies in the coding of the signals. Where two consecutive video frames of a program are nearly identical, for example, the communication of the consecutive frames may require only the transmission of one I-picture, or Reference Picture, and a P-picture containing only the information that differs from the I-picture, along with the information needed by the decoder at the receiver to reconstruct the P-picture from the previous I-picture. This means that the decoder must have provision for storage of the Reference Picture data.
Information contained in a P-picture transmission includes blocks of video data not contained in a Reference I- or P-picture, as well as data information needed to copy data into the current picture from a previously transmitted I- or P-picture. The technique used in MPEG systems to accomplish P-picture construction from a Reference picture is the technique of Forward Prediction in which a Prediction in the form of a Prediction Motion Vector (MV) is transmitted in lieu of the video data of a given or Target MB. The MV tells the decoder where and how to extract a 16×16 block of pixel data from the I- or P- Reference Picture to be reproduced as the Target MB. If needed, a Prediction Error is transmitted in the form of an error block that contains pixel data needed to supplement the copied motion compensated data in order to complete the current picture.
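The Forward Prediction step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions: pictures are plain lists of rows of 8-bit luminance values, the motion vector is in whole pels, and the names `predict_block` and `reconstruct_block` are hypothetical.

```python
def predict_block(reference, top, left, mv_y, mv_x, size=16):
    """Copy a size x size block from the reference picture.

    (top, left) is the position of the Target MB in the current picture and
    (mv_y, mv_x) is the motion vector in whole pels (an assumption here).
    """
    y0, x0 = top + mv_y, left + mv_x
    if y0 < 0 or x0 < 0 or y0 + size > len(reference) or x0 + size > len(reference[0]):
        raise ValueError("motion vector points outside the reference picture")
    return [row[x0:x0 + size] for row in reference[y0:y0 + size]]


def reconstruct_block(prediction, error=None):
    """Add the optional Prediction Error block and clamp to the 0..255 range."""
    if error is None:
        return [row[:] for row in prediction]
    return [[max(0, min(255, p + e)) for p, e in zip(prow, erow)]
            for prow, erow in zip(prediction, error)]


# Small example with a 2x2 "macroblock" so the values are easy to follow.
reference = [[10 * y + x for x in range(8)] for y in range(8)]
prediction = predict_block(reference, top=2, left=2, mv_y=1, mv_x=-1, size=2)
target = reconstruct_block(prediction, error=[[1, 0], [0, -2]])
```

When no error block is transmitted, the Target MB is simply the copied block.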
With B-pictures, the Bidirectional Temporal Prediction technique called Motion Compensated Interpolation is used. Motion Compensated Interpolation is accomplished by transmitting, in lieu of all of the video data for a Target MB, an MV that specifies which 16×16 block of pixels to copy either from the previous Reference Picture or from the next future Reference Picture, or from the average of one 16×16 block of pixels from each of the previous and next future Reference Pictures. By “previous” reference picture is meant a reference I- or P-picture that has already been displayed and is used for motion compensation prediction of subsequent pictures that have yet to be displayed. By “future” reference picture is meant a picture that is to be displayed in the future, but which will have been contained in the input signal bitstream and received before the current picture to permit the copying of data from it. With the motion vector, an Error Block of only the data, if any, that cannot be supplied by copying from the reference pictures is transmitted in pixel data form.
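The three prediction choices available for a Target MB of a B-picture can be sketched as below. The mode names and the function names are hypothetical, the two input blocks are assumed to have been copied already from the previous and future Reference Pictures, and rounding the average upward at one half is an assumption of this sketch.

```python
def interpolate_blocks(past_block, future_block):
    """Average two equally sized blocks pel by pel, rounding halves upward."""
    return [[(p + f + 1) // 2 for p, f in zip(prow, frow)]
            for prow, frow in zip(past_block, future_block)]


def predict_b_block(mode, past_block=None, future_block=None):
    """Select the prediction for a Target MB of a B-picture."""
    if mode == "forward":        # copy from the previous Reference Picture
        return past_block
    if mode == "backward":       # copy from the next future Reference Picture
        return future_block
    if mode == "interpolated":   # average of one block from each reference
        return interpolate_blocks(past_block, future_block)
    raise ValueError("unknown prediction mode: " + mode)


past = [[100, 102], [104, 106]]
future = [[110, 103], [100, 106]]
averaged = predict_b_block("interpolated", past, future)
```

Any Error Block that is transmitted would then be added to the selected prediction.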
Motion compensation vectors in current MPEG P- and B-pictures specify relocation of pixel data to the nearest half pel. This requires that the MPEG decoders perform a half-pel interpolation of luminance and chrominance values from adjacent pixel data in a 16×16 sized block copied from the reference picture in order to arrive at the luminance and chrominance values for the pixels of the macroblock in the current picture. Typical MPEG video decoders carry out this half-pel interpolation upon the performance of the motion compensation as the current picture is being written to the output buffer. With standard resolution systems, the output macroblocks will have the same number of pixels as the reference macroblocks, so that after the half-pel interpolation, the original copied pixel values will be discarded. The resolution of the resulting current picture typically approaches that of the reference picture, which may be a slightly degraded reproduction of the original picture. The addition of half-pel interpolation to motion compensation of video programs enhances the quality of the output when presented in the original resolution.
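The half-pel interpolation described above can be sketched as follows. The sketch assumes motion vector components expressed in half-pel units, a reference picture stored as a list of rows of luminance values, and rounding to the nearest integer with halves rounded upward; `half_pel_predict` is a hypothetical name.

```python
def half_pel_predict(reference, top, left, mv_y, mv_x, size=16):
    """Build a prediction block from a motion vector given in half-pel units."""
    # Split each component into a whole-pel offset and a half-pel flag.
    # Python's >> floors, so -1 becomes offset -1 plus a half-pel step.
    y0, half_y = top + (mv_y >> 1), mv_y & 1
    x0, half_x = left + (mv_x >> 1), mv_x & 1
    block = []
    for y in range(y0, y0 + size):
        row = []
        for x in range(x0, x0 + size):
            a = reference[y][x]
            b = reference[y][x + half_x]
            c = reference[y + half_y][x]
            d = reference[y + half_y][x + half_x]
            # With both flags clear this returns a; with one flag set it is
            # the rounded mean of two pels; with both set, of four pels.
            row.append((a + b + c + d + 2) // 4)
        block.append(row)
    return block


reference = [[0, 10, 20], [40, 50, 60], [80, 90, 100]]
whole = half_pel_predict(reference, 0, 0, 0, 0, size=2)
horizontal = half_pel_predict(reference, 0, 0, 0, 1, size=2)
diagonal = half_pel_predict(reference, 0, 0, 1, 1, size=2)
```

Note that a half-pel vector reads one extra row or column of the reference, so the copied region is 17 pels wide or tall for a 16×16 output block.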
Video presentation systems produce rectangular images by scanning horizontal lines, from top to bottom, on a screen. The images are formed of rectangular arrays of pixels, for example, at 720 pixels per scan line, with 480 scan lines per picture under the NTSC standard for the current resolution standard used in the United States and Japan and 576 scan lines per picture under the PAL standard for the current resolution standard used in Europe. Standard definition programs are displayed in two formats. Under the NTSC standard, images are displayed at a rate of 30 pictures per second while under the PAL standard, images are displayed at a rate of 25 pictures per second. Under both standards, each image is displayed as two successive fields, a top field that includes the even lines of a picture and a bottom field that includes the odd lines of a picture. Under NTSC, 60 fields per second are displayed. Under PAL, 50 fields per second are displayed.
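The division of a picture into a top field of even lines and a bottom field of odd lines can be illustrated with a short sketch. Lines are assumed to be numbered from zero at the top of the picture, and `split_fields` and `weave_fields` are hypothetical names.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its top and bottom fields."""
    top = frame[0::2]      # even lines: 0, 2, 4, ...
    bottom = frame[1::2]   # odd lines: 1, 3, 5, ...
    return top, bottom


def weave_fields(top, bottom):
    """Interleave a top and a bottom field back into a full frame."""
    frame = []
    for top_line, bottom_line in zip(top, bottom):
        frame.extend([top_line, bottom_line])
    return frame


frame = ["line0", "line1", "line2", "line3"]
top, bottom = split_fields(frame)
```

A 480 line NTSC picture therefore yields two fields of 240 lines each, and a 576 line PAL picture two fields of 288 lines each.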
Frequently it will be necessary to display a program that is broadcast or recorded under one standard, NTSC or PAL, on a system that is configured to display under the other standard. Such cases require frame rate conversion from one rate, 60 or 50 fields per second, to the other rate. Such conversions are in a ratio of 6 to 5, or of 5 to 6, between the source rate and the display rate. The modes for such conversion are not specified by MPEG.
In addition, many programs to be displayed on systems of either the NTSC or PAL standards are broadcast or recorded from motion picture film, in full frame images at rates of 24 or 20 frames per second. In such programs, these progressive images are recorded with all of the odd and even scan lines interleaved and encoded by frame. Such programs must undergo a frame rate conversion for display at the 30 frames per second NTSC or 25 frames per second PAL frame rates. These conversions can be (1) from 24 frames per second to 30 frames (60 fields) per second, (2) from 24 frames per second to 25 frames (50 fields) per second, (3) from 20 frames per second to 30 frames (60 fields) per second or (4) from 20 frames per second to 25 frames (50 fields) per second. This produces conversion ratios of 4 to 5, 24 to 25, 2 to 3, and again 4 to 5, respectively.
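The conversion ratios quoted above follow from reducing each pair of source and display frame rates to lowest terms, as the following short sketch shows. The function name `conversion_ratio` is hypothetical.

```python
from fractions import Fraction


def conversion_ratio(source_fps, display_fps):
    """Return the source-to-display frame ratio in lowest terms."""
    ratio = Fraction(source_fps, display_fps)
    return ratio.numerator, ratio.denominator


# Film rates of 24 and 20 frames per second shown on NTSC (30) and PAL (25).
film_ratios = {
    (source, display): conversion_ratio(source, display)
    for source in (24, 20)
    for display in (30, 25)
}
```

A ratio of 4 to 5, for example, means that every four source frames must fill the display time of five output frames, which is ten fields.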
Digital Video Disc (DVD) recordings include information in the bitstream Picture Headers that specifies which frames are to be repeated to convert, for example, the 24 frames per second of a motion picture recording to the 30 frames per second of NTSC video. Other programs such as Video Compact Disc (VCD) recordings do not specify which pictures are to be repeated in a conversion, even though, to play such recordings on a PAL or NTSC system, such conversion must be conducted by the receiver. Furthermore, when recordings are to be converted from PAL to NTSC, or NTSC to PAL, intelligence must be provided in the receiving system to define a repeat scheme that will effectively reproduce the program to the system video output standard.
Furthermore, straightforward conversion systems have had certain minimum buffer memory requirements. In addition, the specified repeat order of DVD programs also requires a minimum amount of buffer memory to implement. The conversion of 24 frame per second progressive frame motion pictures to 30 frame per second NTSC video traditionally employs a conversion scheme referred to as 3:2 pull-down, by which three fields are generated from two fields of a frame of the original picture by displaying one of the fields twice. In the case of a progressive frame encoding of a 24 frame per second motion picture film to NTSC 30 frame per second video, such 3:2 pull-down may include, for example, displaying three fields from the two fields of one received frame and then two fields from the next frame, followed by three from the next then two from the next. The sequence under MPEG is specified to be: top-bottom-top, then bottom-top, then bottom-top-bottom and then top-bottom from four consecutive frames of the original picture to produce five frames of display, that is, by displaying a ten field sequence out of every four frames of original data, for a 24 to 30 frame per second conversion ratio.
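The ten field sequence described above can be generated with a short sketch. The name `pulldown_32` is hypothetical, "T" and "B" stand for the top and bottom fields of a frame, and the input is assumed to be a sequence of progressive frames at 24 frames per second.

```python
# Field pattern for four consecutive source frames, as given in the text:
# top-bottom-top, bottom-top, bottom-top-bottom, top-bottom.
PULLDOWN_PATTERN = ("TBT", "BT", "BTB", "TB")


def pulldown_32(frames):
    """Return the displayed (frame, field) sequence for 3:2 pull-down."""
    displayed = []
    for index, frame in enumerate(frames):
        for field in PULLDOWN_PATTERN[index % 4]:
            displayed.append((frame, field))
    return displayed


fields = pulldown_32(["A", "B", "C", "D"])
parities = "".join(field for _, field in fields)
```

Four source frames yield ten fields whose parity alternates strictly between top and bottom, so 24 source frames yield the 60 fields that NTSC displays each second. The repeated field of frames "A" and "C" is the one that must either be held in buffer memory or decoded a second time.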
Frame rate conversions, particularly those requiring the repetition of frames or fields, affect the design of the receiver. To repeat a field of a picture, either the decoded field must be stored, or the field or the frame that contains it must be decoded more than once. This increases the decoder speed requirements, the required buffer memory, or both, affecting the cost and complexity of the receiver.
An additional problem presented by the variety of conversion requirements is the complexity of the video decoding needed to deal with the alternative conversion situations using methods of the prior art.
Furthermore, not all video presentation systems are to be used in applications requiring the high resolution and other capabilities of DVD or other MPEG compliant systems. Accordingly, in order to make the systems of differing performance capabilities available at optimum cost, video decoders and other system components should be capable of functioning in a variety of systems to provide a wide range of capabilities without imposing on all such systems the same memory and performance requirements. Since it is not economical to produce electronic circuits in small quantities for each application, prior art systems typically are not produced to economically serve each of the applications for which they are needed.
For all of these purposes, the need to make frame rate conversions has the propensity to increase the complexity and cost of the video decoding system or the size and cost of the video buffer memory.
There is a need, particularly for video presentation systems with standard resolution video programs, for efficient and reliable frame rate conversion to take place.