The present invention relates to methods and apparatus for encoding, decoding and displaying images and, more particularly, to methods and apparatus for encoding, decoding and displaying images in a manner that provides relatively smooth motion.
Video sequences comprise a series of images, e.g., frames. In the case of motion pictures, where film is used, each frame corresponds to a frame of film.
During filming, frames, which are also sometimes called pictures, are captured at a pre-selected rate, e.g., 24 frames a second. The rate at which images are converted to frames, e.g., by the taking of a sequence of pictures, is referred to as the image capture rate. When using rolls of film to show an image sequence, e.g., in a movie theater, pictures are normally displayed at the same rate at which they were initially captured. However, the shutter of the projector may be operated multiple times per picture display time to provide a flicker rate which is higher than the image capture rate. This is because higher flicker rates tend to be less annoying than lower flicker rates.
Significantly, in the case of film, pictures are normally displayed at a uniform rate which is usually a function of the picture capture rate. Thus, motion in film tends to be relatively smooth since each new picture corresponds to the same amount of time as the preceding picture.
The advent of television and, more recently, computers, has greatly expanded the number of devices which use media other than film to store, transmit and display video images. Most television sets are capable of displaying images at a single rate determined by the television set's horizontal and vertical refresh rates. Computer display devices, e.g., multi-sync monitors, are frequently capable of responding to synchronization signals which may fall within a range of refresh rates supported by the monitor. As a result, multi-sync monitors are capable of supporting multiple display refresh rates while television sets usually support a single display refresh rate.
While video images are now commonly transmitted as analog signals, e.g., in the case of NTSC television, the use of digital data to represent video images is growing in popularity. For example, digital video disk players, digital satellite broadcasts to the home, and digital high definition television currently rely on the transmission of video images as digital data.
A frame is generally used to describe a complete image, or an image composed of two interleaved fields, which are to be displayed on a display device. Frames may be coded as either progressive or interlaced images. In the case of progressive images, all of the lines of a frame are coded to be displayed in sequence. Thus, progressive frames may be coded so that the lines of the frame will be displayed in sequence starting at the top of a display screen and ending at the bottom of the display screen each time the display is refreshed. Interlaced frames normally comprise two fields, a first field corresponding to the even lines of a frame and a second field corresponding to the odd lines of a frame. During display, the lines of the first field of an interlaced display are normally refreshed and then the lines of the second field are refreshed. In this manner, with each updating of an interlaced display, every other line of a frame is updated. In the case of non-film video sequences, the term picture is often used to refer to either a progressive frame or a field of an interlaced frame.
Various standards relating to the encoding and transmission of digital video signals now exist. One such standard is MPEG-2, which is described in detail in "INFORMATION TECHNOLOGY - GENERIC CODING OF MOVING PICTURES AND ASSOCIATED AUDIO", Recommendation H.262 ISO/IEC 13818-2, published by INTERNATIONAL ORGANISATION FOR STANDARDIZATION ISO/IEC JTC1/SC29 (Nov. 1993).
Frames of a video sequence are normally encoded and then transmitted to a destination device once. While encoded frames are usually transmitted only once, frames or fields of a frame may be displayed repeatedly. Since frames tend to be encoded and decoded on average at a rate corresponding to the rate at which the original images were collected, the rate at which digitally represented video frames are encoded or decoded over a period of time normally corresponds to, or matches, the image capture rate. The rate at which frames are coded is sometimes referred to as the coded frame rate.
The MPEG-2 standard provides for a frame rate indicator in the encoded bitstream. The frame rate indicator provides the "indicated frame rate", i.e., the rate at which frames are to be displayed. Significantly, MPEG-2, as well as other digital video standards, allows for the actual number of coded frames per second, which is coded and transmitted as part of a digital bitstream, to differ from the indicated frame rate specified by the frame rate indicator which is included in the bitstream. Normally, discrepancies between the coded frame rate and the indicated frame rate are resolved through the use of a field or frame repeat mechanism, e.g., one or more field or frame repeat commands included in a transmitted bitstream. MPEG-2 supports a repeat_first_field decoder instruction.
Frequently, an encoded bitstream's indicated frame display rate differs from the actual coded frame rate. This is because video images which are captured using a first medium, e.g., film, are often subsequently coded for display using a different medium, e.g., analog or digital television. The frequent discrepancy between indicated and coded frame rates is not surprising given the large number of possible frame rates. Consider, for example, the ATSC Digital Television Standard, described in "ATSC DIGITAL TELEVISION STANDARD", Doc. A/53, published by the Advanced Television Systems Committee (Sep. 16, 1995), which is based in part on the MPEG-2 standard. The ATSC standard permits a plurality of indicated frame rates to be supported including 23.976 Hz, 24.000 Hz, 29.970 Hz, 30.000 Hz, 59.940 Hz, and 60.000 Hz. The actual coded frame rates may differ from any one of these indicated frame rates, requiring the use of a repeat field or repeat frame operation to achieve the indicated frame rate at the time the video images are output for display.
One technique, often called the 3:2 pull down technique, for adapting film images recorded at 24 Hz to be displayed at 30 Hz is illustrated in FIG. 2. In the first row, identified by reference numeral 20, twelve film pictures (P0-P11), corresponding to a one half second period of time, are shown. In the second row, identified by reference numeral 22, numbers 0 through 29 are used to indicate the 30 fields which will be displayed in a half second period of time. A box is used in row 22 to group together fields which correspond to the same film picture. Note how, in row 22, three fields are displayed for every even numbered film picture while two fields are displayed for every odd numbered film picture. In the third row of FIG. 2, indicated by reference numeral 24, a capital F is used to indicate a frame and the subscript is used to indicate the number of the frame. Note how a total of 12 frames, F0-F11, are used to represent the 12 film pictures P0-P11. The fourth row of FIG. 2, indicated by reference numeral 26, shows the sequence of displayed fields. A review of row 26 shows how the third field displayed for each even numbered film picture, and thus even numbered frame, is a repeat of the first field, while the first and second fields of odd numbered film pictures are displayed only once. The field repetition rate illustrated in FIG. 2 may be achieved, e.g., by the inclusion of repeat_first_field commands in an encoded MPEG-2 bitstream.
In the FIG. 2 example, the coded frame rate is 24 frames per second while the indicated frame rate included in an encoded bitstream would be 30 frames/sec, since 60 fields/sec are displayed as the result of the illustrated field repetition.
Note in FIG. 2 how even frames are displayed for three field times while odd frames are displayed for only two field times. Thus, even frames are displayed for longer periods of time than odd numbered frames. This has the unfortunate effect of giving the impression of jerky, as opposed to smooth, motion. This is due to the apparent speeding up and then slowing down of the displayed frames. The jerky motion resulting from the use of the repeat field or repeat frame command is sometimes called judder.
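By way of illustration only, the field repetition pattern described above can be sketched in a few lines of Python. The function names and the constant FIELD_RATE are hypothetical and are not taken from any standard or decoder; the sketch assumes that even numbered frames carry the repeated first field and it ignores field parity.

```python
from fractions import Fraction

FIELD_RATE = 60  # assumed display field rate, in fields per second


def fields_per_frame(num_frames):
    """Number of field times each frame occupies: 3 for even frames, 2 for odd."""
    return [3 if n % 2 == 0 else 2 for n in range(num_frames)]


def pulldown_fields(num_frames):
    """Displayed field sequence as (frame_index, field_name) tuples."""
    fields = []
    for n, count in enumerate(fields_per_frame(num_frames)):
        fields.append((n, "first"))
        fields.append((n, "second"))
        if count == 3:
            fields.append((n, "first"))  # repeated first field
    return fields


def display_times(num_frames):
    """Display time of each frame, in seconds, at the assumed field rate."""
    return [Fraction(c, FIELD_RATE) for c in fields_per_frame(num_frames)]
```

For twelve frames, the sketch yields 30 displayed fields spanning one half second, with display times alternating between 1/20 and 1/30 of a second. This alternation is the unevenness perceived as judder.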
Judder is frequently apparent in images which are captured and/or coded at one rate and then displayed at another rate, e.g., at an indicated frame rate which differs from the captured or coded frame rate.
In known systems such as television sets, which use display devices that support only a single fixed refresh rate, video decoders normally pad the decoded video output signal by repeating fields or frames so that the video decoder output rate matches the video display device refresh rate. Thus, the actual displayed frame rate may differ from both the indicated frame display rate information included in the encoded bitstream and the image capture rate. Accordingly, in known systems with a single fixed display refresh rate, judder may result even when the indicated frame rate matches the image capture rate.
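The effect of padding at a fixed refresh rate can be illustrated with the following minimal Python sketch. The padding policy shown, in which each frame is held until the next frame is due, is an assumption made for illustration and is not asserted to be the policy of any particular decoder; the function name refreshes_per_frame is hypothetical.

```python
from fractions import Fraction
from math import floor


def refreshes_per_frame(frame_rate, refresh_rate, num_frames):
    """Count the display refreshes during which each frame is shown.

    Assumed policy: frame n is shown from refresh floor(n * ratio) up to,
    but not including, refresh floor((n + 1) * ratio), where ratio is the
    number of refreshes per frame.
    """
    ratio = Fraction(refresh_rate) / Fraction(frame_rate)
    return [floor((n + 1) * ratio) - floor(n * ratio) for n in range(num_frames)]
```

Under this assumption, 24 frames/sec material on a 60 Hz display gives refresh counts of 2, 3, 2, 3, so successive frames are shown for unequal times. A 72 Hz refresh rate gives 3, 3, 3, 3, so every frame is shown for the same time.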
FIG. 1 illustrates a known system 10, e.g., a digital television set, for decoding and displaying encoded MPEG video sequences. As illustrated, the system 10 comprises an MPEG video decoder 14, a video sync signal generator 19, a video signal generator 16 and a fixed rate display device 18.
The MPEG video decoder 14 receives the bitstream 12. The exemplary bitstream 12 includes, e.g., encoded frames, one or more repeat_first_field commands and an indicated frame rate. The MPEG video decoder 14 decodes the encoded bitstream 12 to generate therefrom, in the interlaced case, decoded frames comprising first and second fields. The video sync signal generator 19 generates vertical and/or horizontal sync signals which are used to control the refreshing of the display at the display's fixed refresh rate. The generating of the sync signals is synchronized with video decoder operation, e.g., through the use of timing information obtained from the encoded bitstream. The sync signals generated by the video sync signal generator 19, and the decoded image signals produced by the video decoder 14, are received and processed by the video signal generator 16.
As is known in the art, the video signal generator 16 converts digital signals input thereto into an analog signal format which is suitable for use by the display device 18. In cases where the video sync signals are incorporated into the video color image signals, it is the responsibility of the video signal generator 16 to combine the video color signals and sync signals prior to supplying them to the display device 18. In cases where the sync and image signals are supplied to the display device 18 on separate lines, it is the responsibility of the video signal generator 16 to ensure that both the synchronization signals and image signals are transmitted to the display device via an interface included in the video signal generator.
As discussed above, due to differences in the captured and/or coded frame rate and the displayed frame rate, judder may result in the known system. This can be annoying and, in many cases, noticeably degrades the quality of a displayed image sequence.
In view of the above discussion, it is apparent that there is a need for methods and apparatus for reducing and/or eliminating judder in displayed video sequences. For hardware compatibility reasons, it is desirable that any new methods not prevent the display of an encoded bitstream on an existing system which does not implement the methods and apparatus of the present invention.
The present invention relates to methods and apparatus for reducing and/or eliminating judder in displayed images. In accordance with various features of the present invention, a multi-sync display device is used and the refresh rate of the display device is controlled to minimize or avoid judder. Control of the display device refresh rate is performed, in various embodiments, as a function of frame display, frame coding, field coding and/or image capture rate information included in an encoded bitstream. Alternatively, the refresh rate of a display is controlled as a function of decoding rate information or other information available from a decoder.
In accordance with one exemplary embodiment of the present invention, frames are displayed and the refresh rate of a display device is controlled to be an integral multiple of the indicated frame display rate included in an encoded bitstream.
In accordance with another feature of the present invention, information is included in the encoded bitstream about the rate at which frames or fields are encoded and/or images are captured for encoding. This information may be expressed in terms of a frame coding rate, a field coding rate, and/or an image capture rate which is included in an encoded bitstream.
By adjusting the refresh rate of a display to be an integer multiple of one of the above discussed rates, the display time for a series of frames can be adjusted so that sequential frames are displayed for the same amount of time thereby reducing or eliminating the above discussed problem of judder.
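The selection of such a refresh rate may be sketched as follows. This is an illustrative example only: the function name pick_refresh_rate, the parameters min_hz and max_hz describing the range supported by a multi-sync display, and the choice of the lowest qualifying multiple are all assumptions and are not features recited in the text.

```python
from fractions import Fraction


def pick_refresh_rate(frame_rate, min_hz, max_hz):
    """Return the lowest integer multiple of frame_rate within [min_hz, max_hz].

    Returns None when no integer multiple falls within the supported range.
    Rates are handled as exact fractions so that rates such as 24000/1001
    are not rounded.
    """
    frame_rate = Fraction(frame_rate)
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    multiple = 1
    while frame_rate * multiple <= max_hz:
        if frame_rate * multiple >= min_hz:
            return frame_rate * multiple
        multiple += 1
    return None
```

For example, with an assumed supported range of 50 Hz to 85 Hz, a 24 frames/sec rate yields a 72 Hz refresh rate, i.e., three refreshes for every frame.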
In one particular embodiment, information about the actual rate at which frames are being coded is added to the generated encoded bitstream. This data, referred to herein as "coded frame rate information", may be in addition to the conventional frame display rate information required, e.g., by various standards such as MPEG-2. The coded frame rate information may be included as user data in an MPEG-2 bitstream or as metadata in systems which support the use of such data. The term metadata is used here to describe data which is not necessary, but may be useful to, the decoding and/or display of encoded image data. Because metadata is not necessary to decode an image, it is sometimes called enhancement data.
In some embodiments, where coded frame rate information is not included in a bitstream, an estimate of the coded frame rate is determined via information, e.g., decoding frame rate information, obtained from the video decoder. In cases where indicated frame rate information is included in the encoded video bitstream, the indicated frame rate information may be used in combination with information obtained from the video decoder regarding field and/or frame repetition rates to determine the coded frame rate.
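One way such a determination might be made is sketched below. The sketch is hypothetical: it assumes interlaced output with two fields per indicated frame, and it assumes that the decoder can report how many field times each recently decoded frame occupied (2 where no field is repeated, 3 where the first field is repeated). The function name is illustrative only.

```python
from fractions import Fraction


def estimate_coded_frame_rate(indicated_frame_rate, fields_shown_per_frame):
    """Estimate the coded frame rate from the indicated rate and repeat history.

    indicated_frame_rate: frames per second signalled in the bitstream.
    fields_shown_per_frame: field counts observed for recently decoded frames.
    """
    if not fields_shown_per_frame:
        raise ValueError("at least one observed frame is required")
    # Assumption: two fields per indicated frame, so the field rate is doubled.
    field_rate = Fraction(indicated_frame_rate) * 2
    average_fields = Fraction(sum(fields_shown_per_frame), len(fields_shown_per_frame))
    return field_rate / average_fields
```

With an indicated frame rate of 30 frames/sec and an observed pattern alternating between three and two fields per frame, the estimate is 24 frames/sec, which agrees with the 3:2 pull down case discussed above.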
By controlling the refresh rate of a display device as a function of frame rate or image capture rate information and/or decoding frame rate information, the methods and apparatus of the present invention reduce and/or eliminate judder which would otherwise appear in many displayed video sequences.
Numerous additional features, embodiments and advantages of the methods and apparatus of the present invention are discussed below in the detailed description which follows.