The Moving Picture Experts Group (MPEG) standard is a representative standard for compressing digital video signals for transmission or storage. The standard was discussed by ISO-IEC/JTC1/SC2/WG11 and has been proposed as a draft standard. The standard stipulates a hybrid compression method, combining motion compensated prediction coding with discrete cosine transform (DCT) coding.
The first compression technique, motion compensated prediction coding, takes advantage of the correlation of video signals in the time domain. According to this method, the video signal representing the current picture (a frame or a field) is predicted from the decoded and reproduced (reconstituted) video signal representing a reference picture, which is a picture that is earlier or later than the current picture. Only the motion prediction errors between the video signal representing the current picture and the reconstituted video signal representing the reference picture are transmitted or stored. This significantly reduces the amount of digital video signal required to represent the current picture.
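The idea of transmitting only motion prediction errors can be illustrated with a minimal sketch. It assumes tiny 6x6 pictures held as lists of lists, a single block, and a motion vector that is already known; the names predict_block, residual and sad are hypothetical and are not taken from the MPEG standard.

```python
def predict_block(reference, top, left, size, motion):
    """Copy a size x size block from the reference picture, displaced by motion = (dy, dx)."""
    dy, dx = motion
    return [row[left + dx: left + dx + size]
            for row in reference[top + dy: top + dy + size]]


def residual(current_block, predicted_block):
    """Motion prediction errors: current block minus predicted block, sample by sample."""
    return [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current_block, predicted_block)]


def sad(block):
    """Sum of absolute values, used here as a rough measure of how much is left to code."""
    return sum(abs(value) for row in block for value in row)


# Reference picture: a bright 2x2 object at rows 1-2, columns 1-2.
reference = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (1, 2):
        reference[y][x] = 100

# Current picture: the same object moved two samples to the right.
current = [[0] * 6 for _ in range(6)]
for y in (1, 2):
    for x in (3, 4):
        current[y][x] = 100

# The 4x4 block of the current picture at rows 0-3, columns 2-5.
current_block = [row[2:6] for row in current[0:4]]

# Prediction without motion compensation, then with the (assumed known) vector (0, -2).
no_mc = residual(current_block, predict_block(reference, 0, 2, 4, (0, 0)))
with_mc = residual(current_block, predict_block(reference, 0, 2, 4, (0, -2)))

print(sad(no_mc), sad(with_mc))
```

In this contrived case the motion compensated residual is all zeros, whereas the uncompensated difference is not. A real coder must also search for the motion vector and transmit it along with the prediction errors.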
The second compression technique, DCT coding, takes advantage of the intra-picture, two-dimensional correlation of a video signal. According to this technique, when a block of the current picture, or a block of motion prediction errors, is orthogonally transformed, signal power is concentrated in specific frequency components. Consequently, quantizing bits need only be allocated to the DCT coefficients in the region in which the signal power is concentrated. This further reduces the quantity of digital video signal required to represent the picture. For example, in a region in which the image has little detail, and in which the video signal is thus highly correlated, the DCT coefficients are concentrated at low frequencies. In that case, only the DCT coefficients in the low-frequency region of the distribution pattern are quantized to reduce the quantity of the digital video signal.
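The concentration of signal power at low frequencies can be checked numerically. The following is a rough sketch, assuming an orthonormal 8x8 DCT-II written out directly, a smooth test block with a gentle gradient, and a single uniform quantizing step of 16. The step size, the test block and all function names are illustrative assumptions; the standard itself uses quantization matrices rather than one uniform step.

```python
import math


def dct_1d(samples):
    """Orthonormal one-dimensional DCT-II."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i in range(n))
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable two-dimensional DCT: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # indexed [vertical][horizontal]


# A block with little detail: a gentle gradient in both directions.
smooth = [[100 + 2 * x + 2 * y for x in range(8)] for y in range(8)]
coeffs = dct_2d(smooth)

# Fraction of the signal power held by the 2x2 lowest-frequency coefficients.
total_power = sum(c * c for row in coeffs for c in row)
low_power = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
low_fraction = low_power / total_power

# Uniform quantization (assumed step); most high-frequency coefficients become zero.
STEP = 16
quantized = [[round(c / STEP) for c in row] for row in coeffs]
nonzero = sum(1 for row in quantized for q in row if q != 0)

print(round(low_fraction, 4), nonzero)
```

For this block nearly all of the power sits in the four lowest-frequency coefficients, and only a handful of the 64 quantized coefficients are non-zero, which is the property the DCT coding technique exploits.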
Because the coding techniques of the MPEG standard are basically intended for use with interlaced video signals, problems arise when they are applied without modification to non-interlaced video signals. In particular, the compression ratio can be impaired when the MPEG techniques are applied to non-interlaced video signals.
A motion picture consists of a sequence of still pictures reproduced in succession, normally 24 pictures per second. A motion picture film source, e.g., a motion picture film or a 24-frame video signal, represents each picture of the motion picture as a full frame with a frame rate of 24 Hz, whereas an interlaced video signal represents each picture of the motion picture as two consecutive fields, each field representing half of the picture and being displaced from the other by one line. An NTSC interlaced video signal has a field rate of 60 Hz. Consequently, deriving an interlaced video signal with a field rate of 60 Hz from a motion picture film source with a frame rate of 24 Hz, such as is done using a telecine machine, requires a conversion between the number of frames per second of the film source and the number of fields per second in the video signal.
A motion picture film source with a 24 Hz frame rate is commonly converted to an interlaced video signal with a 60 Hz field rate, such as an NTSC video signal, by a technique known as 2-3 pull-down. FIG. 1 illustrates how 2-3 pull-down works.
The 2-3 pull-down process involves a repetitive sequence of deriving two fields of the video signal from the first of every two consecutive frames of the motion picture film source, and deriving three fields of the video signal from the second of the two consecutive frames of the film source. In FIG. 1, frames 800 and 801 are consecutive frames of a motion picture film source with a frame rate of 24 Hz. In the figure, each film source frame is divided into an odd field, indicated by a solid line, and an even field, indicated by a broken line.
First, two fields of the video signal are derived from the first film source frame 800. The video field 802, an odd field, is first derived from the first film source frame 800, followed by the second video field 803, an even field. Then, three fields of the video signal are derived from the second film source frame 801. The video field 804, an odd field, is first derived, followed by the video field 805, an even field, followed by the video field 806, another odd field. The two odd fields 804 and 806 are identical to one another. This process is repeated for the other two film source frames 808 and 809 from which the video fields 810 through 814 are derived. Note that an even field 810 is derived first from the film source frame 808, and that two even fields 812 and 814 are derived from the film source frame 809. With the arrangement shown, a sequence of ten fields of the video signal is derived from a sequence of four frames of the motion picture film source, after which the sequence is repeated.
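The field sequence described above can be reproduced with a short sketch. It assumes that field parity simply alternates from one output field to the next, and it labels each field with the reference numeral of the film source frame it comes from. The function name pulldown_2_3 is hypothetical.

```python
def pulldown_2_3(frames):
    """Return (source_frame, parity) for each output field.

    Two fields are taken from the first frame of each pair and three from
    the second, while the parity of the output fields keeps alternating.
    """
    fields = []
    parity = "odd"
    for index, frame in enumerate(frames):
        count = 2 if index % 2 == 0 else 3
        for _ in range(count):
            fields.append((frame, parity))
            parity = "even" if parity == "odd" else "odd"
    return fields


# The four film source frames of FIG. 1.
fields = pulldown_2_3([800, 801, 808, 809])
for source, parity in fields:
    print(source, parity)
```

Four film source frames yield ten fields, so 24 frames per second yield 60 fields per second. Because frame 801 contributes three fields, the first field taken from frame 808 is an even field, as noted above.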
FIG. 2 shows the result of combining into frames consecutive pairs of fields of the interlaced video signal derived by the process shown in FIG. 1. The video fields 900 and 901 are derived from the same film source frame. Video fields 902 and 903 are also derived from the same film source frame. Hence, the video frame 907, produced by combining the video fields 900 and 901, and the video frame 908, produced by combining the video fields 902 and 903, are each derived from a single film source frame. On the other hand, the video frame 909, produced by combining the consecutive video fields 904 and 905, is derived from two different film source frames.
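The pairing of consecutive fields into frames can be sketched as follows. The sketch assumes the ten-field 2-3 pull-down sequence and uses the hypothetical labels A to D for the four film source frames; it only tracks which film source frame each field came from.

```python
# Film source frame of each of the ten fields produced by 2-3 pull-down
# (2 fields from A, 3 from B, 2 from C, 3 from D); labels are illustrative.
field_sources = ["A", "A", "B", "B", "B", "C", "C", "D", "D", "D"]

# Combine consecutive pairs of fields into video frames.
video_frames = [(field_sources[i], field_sources[i + 1])
                for i in range(0, len(field_sources), 2)]

# A video frame is "mixed" when its two fields come from different film frames.
mixed = [first != second for first, second in video_frames]

for pair, is_mixed in zip(video_frames, mixed):
    print(pair, "mixed" if is_mixed else "single source")
```

Under these assumptions the first two video frames are each built from a single film source frame, while the third combines fields from two different film source frames, matching the frames 907, 908 and 909 discussed above.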
When MPEG coding is applied to the frames of a non-interlaced video signal derived from an interlaced video signal, which, in turn, is derived from a motion picture film source using 2-3 pull-down, coding the frames 907 and 908 in the above example presents no problems because these frames are each derived from a single film source frame, and are thus internally correlated. However, difficulties can be encountered when coding the video frame 909 because it is derived from two different frames of the film source, and, hence, it is not necessarily internally correlated.
If the motion picture is fast-moving, or if a scene change occurs within the frame, a video frame derived from two different frames of the film source has low vertical correlation, which reduces the efficiency of DCT-based signal compression. Moreover, motion compensated prediction can also be impaired because of the reduced correlation of the video signal.
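The loss of vertical correlation can be made concrete with a small sketch. It assumes two synthetic 8x8 pictures containing a vertical edge at different horizontal positions, standing in for two film source frames of a fast-moving scene, and it measures the mean absolute difference between vertically adjacent lines. The helper names picture, weave and mean_adjacent_line_diff are hypothetical.

```python
def picture(edge, width=8, height=8):
    """Synthetic picture: 0 to the left of a vertical edge, 100 from the edge onward."""
    return [[100 if x >= edge else 0 for x in range(width)] for _ in range(height)]


def weave(first_source, second_source):
    """Build a frame whose lines 0, 2, 4, ... come from one picture and 1, 3, 5, ... from another."""
    return [(first_source if y % 2 == 0 else second_source)[y]
            for y in range(len(first_source))]


def mean_adjacent_line_diff(frame):
    """Mean absolute difference between vertically adjacent samples (lower means more correlated)."""
    total = 0
    count = 0
    for upper, lower in zip(frame, frame[1:]):
        total += sum(abs(a - b) for a, b in zip(upper, lower))
        count += len(upper)
    return total / count


film_a = picture(edge=2)   # edge position in one film source frame
film_b = picture(edge=6)   # the edge has moved in the next film source frame

single = weave(film_a, film_a)   # both fields from the same film source frame
mixed = weave(film_a, film_b)    # fields from two different film source frames

print(mean_adjacent_line_diff(single), mean_adjacent_line_diff(mixed))
```

In the single-source frame adjacent lines are identical, whereas in the mixed frame every pair of adjacent lines differs across the region the edge has moved through. That line-to-line alternation puts power into high vertical frequencies, which is what degrades DCT-based compression.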