The present invention relates to digital video presentation, and particularly, to systems and methods for converting frame rates of decoded MPEG video streams for display. More particularly, the present invention relates to the determination and control of frame conversion and field display sequencing in MPEG video receiving and presentation systems, including those of differing configurations.
A standard for digital video and audio programs, for broadcast and for recordings such as video compact disks (VCD), has been established by the Moving Picture Experts Group (MPEG), chartered by the International Organization for Standardization (ISO). The standard for digital video and two channel stereo audio was established as MPEG-1, known more formally as ISO-11172. An enhanced standard, known colloquially as MPEG-2 and more formally as ISO-13818, has been established to provide enhanced quality and to specify data formats for broadcast and other higher noise applications, as well as for digital video disks (DVD) and other higher resolution recorded media.
The MPEG video standard specifies a bitstream syntax that typically provides transformation blocks of 8×8 luminance pels (pixels) and corresponding chrominance data using Discrete Cosine Transform (DCT) coding. The DCT coding is performed on the 8×8 pel blocks and is followed by quantization, zigzag scan, and variable length coding of runs of zero quantized indices and amplitudes of the indices. Motion compensated prediction is employed. For video, MPEG contemplates Intra (I) frames, Predictive (P) frames and Bidirectionally Predictive (B) frames. I-frames are independently coded and are the least efficiently coded of the three frame types. P-frames are coded more efficiently than I-frames and are coded relative to the previously coded I- or P-frame. B-frames are coded the most efficiently of the three frame types and are coded relative to both the previous and the next I- or P-frames. The coding order of the frames in an MPEG program is not necessarily the same as their presentation order. Headers in the bitstream provide information used by decoders to properly decode the timing and sequence of the frames for the presentation of a moving picture.
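The difference between coding order and presentation order can be illustrated with a minimal reordering sketch. It assumes a simple closed GOP and picture labels whose first letter is the picture type and whose number is a hypothetical display index; `display_order` is an illustrative name, not part of any MPEG API.

```python
from typing import List, Optional

def display_order(coded: List[str]) -> List[str]:
    """Reorder pictures from coded (transmission) order to display order.

    A B-picture is shown as soon as it is decoded. An I- or P-picture is held
    back until the next reference picture arrives (or the stream ends),
    because the B-pictures that precede it in display order are sent after it.
    """
    out: List[str] = []
    held: Optional[str] = None
    for pic in coded:
        if pic[0] == "B":
            out.append(pic)
        else:
            if held is not None:
                out.append(held)
            held = pic
    if held is not None:
        out.append(held)
    return out

print(display_order(["I0", "P3", "B1", "B2", "P6", "B4", "B5"]))
# ['I0', 'B1', 'B2', 'P3', 'B4', 'B5', 'P6']
```

Holding one reference picture while B-pictures pass straight through is why a decoder needs storage for reference pictures in addition to the picture currently being output.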
The video bitstreams in MPEG systems include a Video Sequence Header, which is the primary definition of the entire video sequence. The Video Sequence Header contains picture size and aspect ratio data, bit rate limits and other global parameters. In MPEG-2, various Sequence Extensions may also be included that contain other information applicable to all pictures of the sequence, including a Progressive Sequence bit, which indicates that the sequence contains only Progressive Frame pictures, a Chrominance Format code, information indicating the frame rate at which the original picture was encoded, including the original video format (e.g., NTSC, PAL, other), and other variables. Following the Video Sequence Header and Sequence Extension are coded Groups-Of-Pictures (GOPs), which are the components of the sequence that enable random access into the video stream. Each GOP usually includes only one I-picture and a variable number of P- and B-pictures. Each GOP also includes a GOP header that contains presentation delay requirements and other data relevant to the entire GOP. Each picture in the GOP includes a Picture Header, which is the primary coding unit that contains picture type, display order and delay data, and other information relevant to the picture, including whether the picture is an I-, P- or B-picture, whether the picture is a frame or a field picture, whether a frame picture is a progressive frame or interlaced video, whether the field is to be repeated (3:2 pull-down as described below), field display order and other parameters.
Each MPEG picture is divided into a plurality of Macroblocks (MBs), not all of which need be transmitted. Each MB is made up of 16×16 luminance pels, or a 2×2 array of four 8×8 transformed blocks of pels. MBs are coded in Slices of consecutive variable length strings of MBs, running left to right across a picture. In MPEG-2, slices may begin and end at any intermediate MB position of the picture but must respectively begin or end whenever a left or right margin of the picture is encountered. Each Slice begins with a Slice Header that contains information of the vertical position of the Slice within the picture, information of the quantization scale of the Slice and other information such as that which can be used for fast-forward, fast reverse, resynchronization in the event of transmission error, or other picture presentation purposes. The Slice Header primarily facilitates resynchronization, refresh and error recovery.
The Macroblock is the basic unit used for MPEG motion compensation. Each MB contains an MB Header, which, for the first MB of a Slice, contains information of the MB's horizontal position relative to the left edge of the picture, and which, for subsequently transmitted MBs of a Slice, contains an address increment. Not all of the consecutive MBs of a Slice are transmitted with the Slice. The MB Header identifies the macroblock type, such as Intrafield predictive which is restricted to only pels from the current frame, or Interfield predictive which allows copying of pels from a previous frame. The MB header also defines Motion Vector Type, DCT_type (frame or field DCT), the motion vectors, the blocks that are encoded and macroblock parameters. The individual 8×8 pel blocks, four of which make up the macroblock, have no headers and are the basic transform and compression unit.
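How a horizontal position for the first MB and address increments for later MBs locate the transmitted macroblocks of a slice can be sketched as follows. This is a simplified illustration, assuming the slice stays within one row of macroblocks and that each increment is at least 1; the function names are hypothetical and the exact bitstream coding of the increments is not modeled.

```python
from typing import List

def macroblock_columns(first_col: int, increments: List[int], mb_width: int) -> List[int]:
    """Columns of the MBs actually transmitted in one slice.

    first_col  -- horizontal MB position carried for the first MB of the slice
    increments -- address increments carried by each later MB header
    mb_width   -- picture width in macroblocks (45 for 720 pels)
    """
    cols = [first_col]
    for inc in increments:
        cols.append(cols[-1] + inc)
    if any(c < 0 or c >= mb_width for c in cols):
        raise ValueError("slice runs past the picture margin")
    return cols

def skipped_columns(cols: List[int]) -> List[int]:
    """Columns lying between transmitted MBs, i.e. MBs that were not sent."""
    return [c for a, b in zip(cols, cols[1:]) for c in range(a + 1, b)]

cols = macroblock_columns(2, [1, 3, 1], mb_width=45)
print(cols, skipped_columns(cols))  # [2, 3, 6, 7] [4, 5]
```

An increment greater than 1 is what signals that the intervening macroblocks were not transmitted with the slice.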
The presentation of MPEG video involves the display of video frames at a rate of, for example, twenty-five or thirty frames per second (depending on the national standard used, PAL or NTSC, for example). Thirty frames per second corresponds to presentation time intervals of approximately 33 milliseconds. The capacity of MPEG signals to carry information is achieved in part by exploiting the concept that there is typically a high degree of correlation between adjacent pictures and by exploiting temporal redundancies in the coding of the signals. Where two consecutive video frames of a program are nearly identical, for example, communicating the consecutive frames requires only the transmission of one I-picture together with a P-picture containing only the information that differs from the I-picture, or Reference Picture, plus the information needed by the decoder at the receiver to reconstruct the P-picture from the previous I-picture. This means that the decoder must provide storage for the Reference Picture data.
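The presentation interval follows directly from the display rate. The small calculation below uses the nominal rates of the text; broadcast NTSC actually runs at 30000/1001 (about 29.97) frames per second, which gives about 33.37 ms per frame. `period_ms` is an illustrative helper name.

```python
def period_ms(rate_per_second: float) -> float:
    """Display interval in milliseconds for a frame or field rate."""
    return 1000.0 / rate_per_second

for label, rate in [("NTSC frame", 30), ("NTSC field", 60),
                    ("PAL frame", 25), ("PAL field", 50)]:
    print(f"{label}: {rate}/s -> {period_ms(rate):.2f} ms")
# NTSC frame: 30/s -> 33.33 ms
# NTSC field: 60/s -> 16.67 ms
# PAL frame: 25/s -> 40.00 ms
# PAL field: 50/s -> 20.00 ms
```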
Information contained in a P-picture transmission includes blocks of video data not contained in a Reference I- or P-picture, as well as the information needed to copy data into the current picture from a previously transmitted I- or P-picture. The technique used in MPEG systems to construct a P-picture from a Reference Picture is Forward Prediction, in which a Prediction in the form of a Prediction Motion Vector (MV) is transmitted in lieu of the video data of a given, or Target, MB. The MV tells the decoder where and how to extract a 16×16 block of pixel data from the I- or P-Reference Picture to be reproduced as the Target MB. If needed, a Prediction Error is transmitted in the form of an error block that contains the pixel data needed to supplement the copied, motion compensated data in order to complete the current picture.
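Forward prediction of a single Target MB amounts to a displaced block copy plus an optional error block. This minimal sketch assumes a full-pel motion vector, a luminance-only reference picture stored as a list of rows, and 8-bit samples; `predict_forward` is a hypothetical name, and a real decoder also handles half-pel vectors, chrominance and picture edges.

```python
from typing import List

Block = List[List[int]]

def predict_forward(ref: Block, mb_x: int, mb_y: int, mv_x: int, mv_y: int,
                    error: Block, size: int = 16) -> Block:
    """Rebuild one target macroblock by forward prediction.

    ref        -- reference picture as rows of 8-bit luminance samples
    mb_x, mb_y -- top-left pel of the target macroblock
    mv_x, mv_y -- full-pel motion vector; the caller keeps the displaced
                  block inside the picture (no edge handling here)
    error      -- size x size prediction error block (zeros if none was sent)
    """
    out: Block = []
    for r in range(size):
        src_row = ref[mb_y + mv_y + r]
        row = []
        for c in range(size):
            value = src_row[mb_x + mv_x + c] + error[r][c]
            row.append(max(0, min(255, value)))  # clip to the 8-bit range
        out.append(row)
    return out

ref = [[x + y for x in range(32)] for y in range(32)]
zero = [[0] * 16 for _ in range(16)]
mb = predict_forward(ref, 0, 0, 2, 3, zero)
print(mb[0][0], mb[15][15])  # 5 35
```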
With B-pictures, the Bidirectional Temporal Prediction technique called Motion Compensated Interpolation is used. Motion Compensated Interpolation is accomplished by transmitting, in lieu of all of the video data for a Target MB, an MV that specifies which 16×16 block of pixels to copy either from the previous Reference Picture or from the next future Reference Picture, or from the average of one 16×16 block of pixels from each of the previous and next future Reference Pictures. By “previous” reference picture is meant a reference I- or P-picture that has already been displayed and is used for motion compensation prediction of subsequent pictures that have yet to be displayed. By “future” reference picture is meant a picture that is to be displayed in the future, but which will have been contained in the input signal bitstream and received before the current picture to permit the copying of data from it. With the motion vector, an Error Block of only the data, if any, that cannot be supplied by copying from the reference pictures is transmitted in pixel data form.
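The averaging of a block from each of the previous and future Reference Pictures can be sketched as a rounded per-sample mean. This assumes nonnegative 8-bit samples and the round-half-up averaging that, as we understand ISO/IEC 13818-2, MPEG-2 uses when combining forward and backward predictions; `average_predictions` is a hypothetical name.

```python
from typing import List

Block = List[List[int]]

def average_predictions(fwd: Block, bwd: Block) -> Block:
    """Combine forward and backward predictions of a B-picture macroblock.

    Samples are nonnegative, so (a + b + 1) >> 1 rounds half-way cases up.
    """
    return [[(a + b + 1) >> 1 for a, b in zip(f_row, b_row)]
            for f_row, b_row in zip(fwd, bwd)]

print(average_predictions([[10, 11]], [[20, 20]]))  # [[15, 16]]
```

Any Error Block sent with the motion vectors would then be added to this averaged prediction, as in the forward case.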
Motion compensation vectors in current MPEG P- and B-pictures specify relocation of pixel data to the nearest half pel. This requires that MPEG decoders perform a half-pel interpolation of luminance and chrominance values from adjacent pixel data in a 16×16 sized block copied from the reference picture in order to arrive at the luminance and chrominance values for the pixels of the macroblock in the current picture. Typical MPEG video decoders carry out this half-pel interpolation while performing motion compensation, as the current picture is being written to the output buffer. With standard resolution systems, the output macroblocks will have the same number of pixels as the reference macroblocks, so that after the half-pel interpolation, the original copied pixel values will be discarded. The resolution of the resulting current picture typically approaches that of the reference picture, which may be a slightly degraded reproduction of the original picture. The addition of half-pel interpolation to motion compensation of video programs enhances the quality of the output when presented in the original resolution.
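Half-pel interpolation can be sketched as a rounded average of the two (horizontal or vertical) or four (diagonal) nearest full-pel samples, which is the rounding used, as we understand it, by MPEG-1 and MPEG-2 half-pel prediction. Positions are given in half-pel units, and the function names are hypothetical. Producing a full 16×16 block at a half-pel offset reads one extra row and column of the reference, so the example reference is 17×17.

```python
from typing import List

def half_pel_sample(ref: List[List[int]], x2: int, y2: int) -> int:
    """Sample a reference picture at (x2 / 2, y2 / 2), with x2, y2 >= 0."""
    x, y = x2 >> 1, y2 >> 1
    hx, hy = x2 & 1, y2 & 1
    a = ref[y][x]
    if not hx and not hy:
        return a                                    # full-pel position
    if hx and not hy:
        return (a + ref[y][x + 1] + 1) >> 1         # horizontal half pel
    if hy and not hx:
        return (a + ref[y + 1][x] + 1) >> 1         # vertical half pel
    return (a + ref[y][x + 1] + ref[y + 1][x] + ref[y + 1][x + 1] + 2) >> 2

def half_pel_block(ref: List[List[int]], x2: int, y2: int, size: int = 16) -> List[List[int]]:
    """Copy a size x size block whose top-left corner is at half-pel (x2, y2)."""
    return [[half_pel_sample(ref, x2 + 2 * c, y2 + 2 * r) for c in range(size)]
            for r in range(size)]

small = [[0, 10], [20, 30]]
print([half_pel_sample(small, x2, y2) for x2, y2 in [(0, 0), (1, 0), (0, 1), (1, 1)]])
# [0, 5, 10, 15]
ref17 = [[x for x in range(17)] for _ in range(17)]
print(half_pel_block(ref17, 1, 0)[0][:3])  # [1, 2, 3]
```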
Video presentation systems produce rectangular images by scanning horizontal lines, from top to bottom, on a screen. The images are formed of rectangular arrays of pixels, for example, 720 pixels per scan line, with 480 scan lines per picture under the NTSC standard, the current resolution standard used in the United States and Japan, and 576 scan lines per picture under the PAL standard, the current resolution standard used in Europe. Standard definition programs are thus displayed in two formats. Under the NTSC standard, images are displayed at a rate of 30 pictures per second, while under the PAL standard, images are displayed at a rate of 25 pictures per second. Under both standards, each image is displayed as two successive fields, a top field that includes the even lines of a picture and a bottom field that includes the odd lines of a picture. Under NTSC, 60 fields per second are displayed. Under PAL, 50 fields per second are displayed.
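Splitting a frame into its two fields, and weaving them back, can be sketched as taking alternate scan lines. The sketch assumes lines are numbered from 0 at the top, so that the top field holds the even lines as stated above; the function names are illustrative.

```python
from typing import List, Sequence, Tuple

def split_fields(frame: Sequence) -> Tuple[List, List]:
    """Split a frame (scan lines numbered from 0 at the top) into
    its top field (even lines) and bottom field (odd lines)."""
    return list(frame[0::2]), list(frame[1::2])

def weave(top: List, bottom: List) -> List:
    """Interleave two fields back into a full frame, top field line first."""
    frame: List = []
    for t, b in zip(top, bottom):
        frame.extend([t, b])
    return frame

ntsc_frame = [f"line {n}" for n in range(480)]
top, bottom = split_fields(ntsc_frame)
print(len(top), len(bottom), top[0], bottom[0])  # 240 240 line 0 line 1
```

Each NTSC field therefore carries 240 lines and each PAL field 288, and displaying the two fields of every picture in turn yields the 60 or 50 fields per second.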
Frequently it will be necessary to display a program that is broadcast or recorded under one standard, NTSC or PAL, on a system that is configured to display under the other standard. Such cases require conversion from one rate, 60 or 50 fields per second, to the other, that is, conversion in a ratio of 6 to 5 or of 5 to 6 (30 to 25, or 25 to 30, frames per second). The modes for such conversion are not specified by MPEG.
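Because MPEG leaves the conversion mode open, any concrete scheme is a design choice. One simple hypothetical policy shows, for each output frame, the most recent input frame, dropping or repeating whole frames so that the top/bottom alternation of the displayed fields is not disturbed. It is an illustration only, not a mode defined by MPEG or described by the invention.

```python
from typing import List

def source_frames(n_out: int, in_rate: int, out_rate: int) -> List[int]:
    """For each output frame, the index of the input frame to display.

    Nearest-preceding policy (hypothetical): whole frames are dropped
    (in_rate > out_rate) or repeated (in_rate < out_rate).
    """
    return [(k * in_rate) // out_rate for k in range(n_out)]

print(source_frames(10, 30, 25))  # [0, 1, 2, 3, 4, 6, 7, 8, 9, 10]  NTSC -> PAL
print(source_frames(6, 25, 30))   # [0, 0, 1, 2, 3, 4]  PAL -> NTSC
```

Going from NTSC to PAL this drops one frame in six; going from PAL to NTSC it repeats one frame in five.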
In addition, many programs to be displayed on systems of either the NTSC or PAL standard are broadcast or recorded from motion picture film, in full frame images at rates of 24 or 20 frames per second. In such programs, these progressive images are recorded with all of the odd and even scan lines interleaved and encoded by frame. Such programs must undergo a frame rate conversion for display at the 30 frames per second NTSC or 25 frames per second PAL frame rates. These conversions can be (1) from 24 frames per second to 30 frames (60 fields) per second, (2) from 24 frames per second to 25 frames (50 fields) per second, (3) from 20 frames per second to 30 frames (60 fields) per second or (4) from 20 frames per second to 25 frames (50 fields) per second. These produce conversion ratios of 4 to 5, 24 to 25, 2 to 3, and again 4 to 5, respectively.
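The four conversion ratios, and the average number of displayed fields each source frame must supply, follow from reducing the rate pairs to lowest terms. The helper names are illustrative.

```python
from fractions import Fraction
from math import gcd

def conversion_ratio(src_fps: int, dst_fps: int):
    """Reduce a source-to-display frame rate pair to lowest terms."""
    g = gcd(src_fps, dst_fps)
    return src_fps // g, dst_fps // g

def fields_per_source_frame(src_fps: int, dst_fps: int) -> Fraction:
    """Average displayed fields per source frame (two fields per display frame)."""
    return Fraction(2 * dst_fps, src_fps)

for src, dst in [(24, 30), (24, 25), (20, 30), (20, 25)]:
    print(src, dst, conversion_ratio(src, dst), fields_per_source_frame(src, dst))
# 24 30 (4, 5) 5/2
# 24 25 (24, 25) 25/12
# 20 30 (2, 3) 3
# 20 25 (4, 5) 5/2
```

A value of 5/2 means frames alternately supply three and two fields, as in 3-2 pull down. The 24 to 25 case needs 50 fields from 48, so only one field in every 24 must be repeated.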
Digital Video Disc (DVD) recordings include information in the bitstream Picture Headers that specifies which frames are to be repeated to convert, for example, the 24 frames per second of a motion picture recording to the 30 frames per second of NTSC video. Other programs, such as Video Compact Disc (VCD) recordings, do not specify which pictures are to be repeated in a conversion, even though, to play such recordings on a PAL or NTSC system, such conversion must be conducted by the receiver. Furthermore, when recordings are to be converted from PAL to NTSC, or NTSC to PAL, intelligence must be provided in the receiving system to define a repeat scheme that will effectively reproduce the program in the system's video output standard.
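In MPEG-2, this per-picture repeat information is carried by the `top_field_first` and `repeat_first_field` flags of the picture coding extension. The sketch below expands a list of such flag pairs into the fields to be displayed. It covers only frame pictures in an interlaced (non-progressive) sequence, where a set `repeat_first_field` means the first field is shown again after the second; the function names are hypothetical.

```python
from typing import List, Tuple

def fields_for_picture(top_field_first: bool, repeat_first_field: bool) -> List[str]:
    """Fields shown for one frame picture: "T" = top, "B" = bottom."""
    first, second = ("T", "B") if top_field_first else ("B", "T")
    fields = [first, second]
    if repeat_first_field:
        fields.append(first)  # the first field is displayed a second time
    return fields

def display_sequence(flags: List[Tuple[bool, bool]]) -> List[Tuple[int, str]]:
    """Expand (top_field_first, repeat_first_field) pairs into (picture, field) pairs."""
    return [(i, f) for i, (tff, rff) in enumerate(flags)
            for f in fields_for_picture(tff, rff)]

flags = [(True, True), (False, False), (False, True), (True, False)]
seq = display_sequence(flags)
print("".join(f for _, f in seq), len(seq))  # TBTBTBTBTB 10
```

With these four flag pairs, four coded frames yield ten strictly alternating fields, which is the 24 to 30 frame per second pull down.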
Furthermore, straightforward conversion systems have had certain minimum buffer memory requirements. In addition, the specified repeat order of DVD programs also requires a minimum amount of buffer memory to implement. The conversion of 24 frame per second progressive frame motion pictures to 30 frame per second NTSC video traditionally employs a conversion scheme referred to as 3-2 pull down, by which three fields are generated from the two fields of a frame of the original picture by displaying one of the fields twice. In the case of a progressive frame encoding of a 24 frame per second motion picture film to NTSC 30 frame per second video, such 3-2 pull down may include, for example, displaying three fields from the two fields of one received frame, then two fields from the next frame, followed by three from the next and then two from the next. The sequence under MPEG is specified to be: top-bottom-top, then bottom-top, then bottom-top-bottom and then top-bottom from four consecutive frames of the original picture to produce five frames of display, that is, a ten field sequence displayed for every four frames of original data, for a 24 to 30 frame per second conversion ratio.
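When the stream carries no repeat information, as with VCD, the receiver must generate the cadence itself. A minimal sketch, assuming a simple alternating 3-2 pattern that starts with a top field, reproduces the ten-field sequence described above; `pulldown_32` is a hypothetical name.

```python
from typing import List, Tuple

def pulldown_32(n_frames: int, start: str = "T") -> List[Tuple[int, str]]:
    """3-2 pull down: alternately 3 and 2 fields per progressive frame.

    Output field parity strictly alternates, so a frame that receives three
    fields ends on the parity it started with and the next frame begins
    with the opposite field.
    """
    out: List[Tuple[int, str]] = []
    parity = start
    for i in range(n_frames):
        for _ in range(3 if i % 2 == 0 else 2):
            out.append((i, parity))
            parity = "B" if parity == "T" else "T"
    return out

fields = pulldown_32(4)
print(["".join(p for j, p in fields if j == i) for i in range(4)])
# ['TBT', 'BT', 'BTB', 'TB']
```

Over one second of film, 24 frames produce 12 × 3 + 12 × 2 = 60 fields, the NTSC field rate.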
Frame rate conversions, particularly those requiring the repetition of frames or fields, affect the design of the receiver. To repeat a field of a picture, either the decoded field must be stored or the field, or the frame containing it, must be decoded more than once. This increases the decoder speed requirements, the required buffer memory, or both, affecting the cost and complexity of the receiver.
An additional problem presented by the variety of conversion requirements is the complexity of the video decoding needed to deal with the alternative conversion situations using methods of the prior art.
Furthermore, not all video presentation systems are to be used in applications requiring the high resolution and other capabilities of DVD or other MPEG compliant systems. Accordingly, in order to make the systems of differing performance capabilities available at optimum cost, video decoders and other system components should be capable of functioning in a variety of systems to provide a wide range of capabilities without imposing on all such systems the same memory and performance requirements. Since it is not economical to produce electronic circuits in small quantities for each application, prior art systems typically are not produced to economically serve each of the applications for which they are needed.
For all of these purposes, the need to make frame rate conversions tends to increase the complexity and cost of the video decoding system or the size and cost of the video buffer memory.
There is a need, particularly for video presentation systems with standard resolution video programs, for efficient and reliable frame rate conversion.
A primary objective of the present invention is to provide a video decoding system and method by which video programs can be efficiently and effectively converted from one frame display rate to another. It is a particular objective of the present invention to provide a video decoding system and method by which such frame display rate conversion can be made by repeating the display of fields from received pictures to display a greater number of fields in a given presentation time interval than the number of frames in the interval that are received in the original program.
A particular objective of the present invention is to provide an efficient and effective system and method for performing frame rate conversions such as, for example, 3-2 pull down conversions including pull down in VCD and DVD and NTSC-PAL or PAL-NTSC conversions. More particular objectives of the invention include providing for such 3-2 pull down while facilitating the use of commands such as pause, fast-forward, slow forward, reverse play and other such commands which are often referred to as “trick play” commands. Such objectives also include implementing frame skipping required in audio-visual synchronization.
Another objective of the present invention is to provide, in an MPEG video decoder, a single module and routine to handle frame rate conversions and other frame rate related issues, including those that depend on the amount of buffer memory available in the system in which the decoder is used. A further objective of the present invention is to provide an MPEG video decoder that performs a single decompression and transformation method regardless of whether frame rate conversion occurs, regardless of the conversion rate, and regardless of differences in the display sequences due to the frame rate conversion employed, if any, or due to buffer memory size.
A further objective of the present invention is to make efficient and effective use of buffer memory and to facilitate the use of minimally sized buffer memory to buffer decoded video picture sequences for display during regular play, where frame rate conversions are required for program viewing, and during trick play modes and transitions into and out of trick play modes, particularly while maintaining optimal display quality. An additional objective of the invention is to provide a memory management system operative to map decoded pictures to buffer memory and allocate buffer memory so as to allow for the sharing of memory locations by more than one field in a way that reduces memory requirements.
According to the preferred embodiment of the present invention, an MPEG video decoder is provided with a decompression and transformation section which decodes a full frame of video on command by a single method that applies regardless of buffer memory and frame rate conversion considerations that would otherwise call for differing display sequences of the decoded pictures. The decoder is provided with a display control module that handles all frame rate and field sequence issues in response to host configuration information, particularly buffer memory size and system type (NTSC or PAL), and to host command signals, such as trick play commands, as well as to information in the received bitstream, particularly the sequence and picture headers and extensions. The display control module handles these issues in a way that allows the other components of the decoder and of the display output logic to operate in a simple and consistent manner.
In certain preferred embodiments of the invention, pictures are decoded in the order received and as buffer memory for the decoded pictures becomes available. The decoded pictures are assigned attributes that are stored in a table, with one attribute string associated with each decoded picture. Signals are sent to a field display logic section together with the memory address of the next field to be displayed and the attributes needed to effect proper display. These attributes designate which field of a picture is to be displayed (top or bottom), whether the memory is to be freed for use by the decoder as the field is being read for transmission to the display, and whether the decoder is to be enabled to decode the next picture as the field is being displayed.
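A per-field attribute record of the kind described above might be sketched as follows. The field names, the integer buffer address, and the policy for setting the two flags (free a B-picture's memory after its last displayed field, and enable decoding of the next picture on its first field) are illustrative assumptions, not the embodiment's actual encoding.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class FieldAttributes:
    buffer_addr: int        # hypothetical base address of the decoded picture
    parity: str             # "T" (top) or "B" (bottom) field to display
    free_after_read: bool   # release the memory as this field is read out
    enable_decode: bool     # let the decoder start the next picture meanwhile

def attributes_for_picture(picture_type: str, buffer_addr: int,
                           parities: List[str]) -> List[FieldAttributes]:
    """One attribute record per displayed field of a decoded picture.

    Assumed policy: reference (I/P) pictures are kept for prediction, so only
    a B-picture's memory is freed, on its last displayed field.
    """
    last = len(parities) - 1
    return [
        FieldAttributes(
            buffer_addr=buffer_addr,
            parity=p,
            free_after_read=(picture_type == "B" and i == last),
            enable_decode=(i == 0),
        )
        for i, p in enumerate(parities)
    ]

for attrs in attributes_for_picture("B", 0x4000, ["T", "B", "T"]):
    print(attrs)
```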
In certain preferred embodiments, default attributes are predicted based on frame rate conversion considerations and then modified to take into account field display sequence information accompanying the pictures of the program. The generation of attribute tables considers buffer memory size, and field display order is modified to the extent necessary to allow the program to be reproduced at a proper display rate even when memory is small, using opposite field data where necessary. The resulting field sequence order facilitates the use of output buffers for B-frame data ranging from 0.53 to 0.67 frames in size.
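The idea of predicting defaults and then adjusting them can be illustrated with field order flags. A default is derived from the configured conversion (here a 3-2 cadence) and is replaced by whatever field display information accompanies a picture. This is a hypothetical sketch; it does not model the buffer-size adjustments or the use of opposite field data that are also described.

```python
from typing import List, Optional, Tuple

Flags = Tuple[bool, bool]  # (top_field_first, repeat_first_field)

# Hypothetical 3-2 cadence over four frames: TBT, BT, BTB, TB
CADENCE_32: List[Flags] = [(True, True), (False, False), (False, True), (True, False)]

def default_flags(index: int, pulldown: bool) -> Flags:
    """Predicted default for picture `index` under the configured conversion."""
    return CADENCE_32[index % 4] if pulldown else (True, False)

def resolve_flags(stream_flags: List[Optional[Flags]], pulldown: bool) -> List[Flags]:
    """Use flags carried with a picture when present, otherwise the default."""
    return [f if f is not None else default_flags(i, pulldown)
            for i, f in enumerate(stream_flags)]

print(resolve_flags([None, (True, False), None], pulldown=True))
# [(True, True), (True, False), (False, True)]
```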
In various embodiments of the invention, use of available buffer memory is optimized by maintaining tables of offset variables and accessing a fixed table of memory pointers as fields of data are being displayed. The offset data tables are identified to the display logic, which uses the data in the offset tables to indirectly address rows of memory in which the consecutive rows of field data for the field to be output have been stored by the decoder. The decoder loads offset values into the offset tables as pictures are being decoded and rows of blocks of the picture are stored as memory becomes free. Preferably, two full frame reference picture buffers are provided for storing two decoded reference I- or P-pictures and one 0.53 to 1.0 frame buffer is provided to buffer B-pictures, while four offset variable tables are provided, one to hold address offsets for both reference picture buffers and three to hold offsets for up to three different B-picture fields that can be each at least partially present in the output buffer at one time.
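The indirection described above can be sketched with a fixed pointer table and per-field offset tables. Slot sizes, class and function names, and the first-in-first-out free list are assumptions made for illustration only; the text itself describes four such offset tables, one for the reference buffers and three for B-picture fields.

```python
from collections import deque
from typing import Dict

class RowPool:
    """Fixed table of memory-row pointers plus a free list (hypothetical sizes)."""
    def __init__(self, n_slots: int, bytes_per_slot: int):
        self.pointers = [i * bytes_per_slot for i in range(n_slots)]  # never changes
        self.free = deque(range(n_slots))

    def alloc(self) -> int:
        return self.free.popleft()

    def release(self, slot: int) -> None:
        self.free.append(slot)

class OffsetTable:
    """Per-field table mapping a field row number to a slot in the pointer table."""
    def __init__(self):
        self.offsets: Dict[int, int] = {}

def store_row(pool: RowPool, table: OffsetTable, row: int) -> int:
    """Decoder side: place a decoded row in any free slot and record its offset."""
    slot = pool.alloc()
    table.offsets[row] = slot
    return pool.pointers[slot]

def display_row(pool: RowPool, table: OffsetTable, row: int, free_after_read: bool) -> int:
    """Display side: find a row's address indirectly, optionally freeing it."""
    slot = table.offsets[row]
    addr = pool.pointers[slot]
    if free_after_read:
        pool.release(slot)
        del table.offsets[row]
    return addr

pool = RowPool(n_slots=4, bytes_per_slot=1024)
field_a = OffsetTable()
for row in range(4):
    store_row(pool, field_a, row)
display_row(pool, field_a, 0, free_after_read=True)  # slot 0 becomes free
field_b = OffsetTable()
print(store_row(pool, field_b, 0))  # 0: field_b reuses the freed slot
```

Because the display logic never assumes rows are contiguous, a new field can start filling memory as soon as individual rows of the previous field have been read out.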
The invention provides versatility for various temporal up-sampling and down sampling schemes, particularly frame rate conversion schemes, and its operation is particularly smooth. The invention supports the use of less than three full frames of buffer memory, particularly that referred to as 2.53 mode or 2.53 frame DRAM memory configuration, as well as three frame and four or more frame video buffer memory. The invention also supports various trick play modes and their use simultaneously with 3-2 pull down. The system and method provide versatile conversion and the ability to handle conversions between PAL and NTSC, with and without 3-2 pull down, in both VCD and DVD as well as other formats.
These and other objectives and advantages of the present invention will be more readily apparent from the following detailed description of the preferred embodiments of the invention, in which: