Encoding methods such as the well-known MPEG-1 and MPEG-2 standards have been widely used for efficient transmission and storage of video. An MPEG encoder compresses an input video signal picture by picture to produce an output signal or bitstream compliant with the relevant MPEG standard. Pre-processing techniques can be applied to the input video signal before encoding, for example, to remove noise and to re-format the signal (e.g., 4:2:2 to 4:2:0 conversion, image size conversion, etc.).
The input video signal is typically in an interlaced format, for example the 525/60 or 625/50 (lines/frequency) format, with each video frame consisting of two fields (a top field and a bottom field). However, the source material of the video signal may have been originally produced on film and converted to the video signal via a telecine process. This process converts a progressive source into an interlaced format and, where necessary, also performs frame rate conversion, for example using a 3:2 or 2:2 pulldown technique. In the case of 24 Hz film to 525/60 video conversion, each progressive film picture is converted to two interlaced video fields and, in addition, the 3:2 pulldown pattern introduces 12 repeated fields in every second of the converted video (24 film pictures yield 48 fields, and 12 repeated fields bring the total to 60 fields per second). An improvement in coding efficiency can be obtained if video originating from film is identified and the repeated (or redundant) fields are detected and removed before coding. Pre-processing techniques applied before encoding can also benefit from the results of film picture detection.
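The field arithmetic of 3:2 pulldown described above can be checked with a short sketch. The function name `pulldown_32`, the tuple layout, and the choice of top-field-first ordering are assumptions made purely for illustration; they are not taken from any cited method.

```python
def pulldown_32(num_film_frames):
    """Expand progressive film frames into interlaced fields using 3:2 pulldown.

    Returns a list of (film_frame_index, parity, is_repeat) tuples.
    Assumption: the first output field is a top field.
    """
    fields = []
    parity = "top"
    for frame_index in range(num_film_frames):
        # Film frames alternately contribute three fields and two fields.
        field_count = 3 if frame_index % 2 == 0 else 2
        for k in range(field_count):
            # The third field of a three-field frame repeats the first one
            # (same film frame, same parity), so it is redundant.
            is_repeat = (k == 2)
            fields.append((frame_index, parity, is_repeat))
            parity = "bottom" if parity == "top" else "top"
    return fields


# One second of 24 Hz film.
one_second = pulldown_32(24)
total_fields = len(one_second)
repeated_fields = sum(1 for _, _, is_repeat in one_second if is_repeat)
print(total_fields, repeated_fields)
```

For 24 film frames the sketch yields 60 fields, 12 of which are repeats, matching the figures stated above. Each repeated field has the same parity as the field two positions earlier, which is why same-parity comparisons are a natural way to find them.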
Methods for detecting progressive or interlaced frames and/or repeated fields are known in the prior art. For example, U.S. Pat. Nos. 5,317,398, 5,398,071, 5,491,516, and 5,757,435 disclose methods generally involving a calculation of differences between two adjacent fields of the same parity (top or bottom field) and a comparison of the results of the calculation with thresholds for repeated field detection and/or with patterns indicative of the 3:2 pulldown process. Methods involving comparison of three adjacent fields of different parity are disclosed in U.S. Pat. Nos. 5,365,273, 5,565,998, and 5,689,301. These methods compare three temporally adjacent fields and make decisions based on detection of 3:2 (and 2:2) pulldown patterns. In U.S. Pat. No. 5,452,011, a method utilizing four successive fields is disclosed. This method provides progressive/interlace detection as well as redundant field detection without assuming a fixed 3:2 or 2:2 pulldown pattern. It utilizes both intra-field and inter-field (same and opposite parity) differences.
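The general idea behind the same-parity difference methods can be sketched as follows. This is only a minimal illustration of the principle, not the procedure of any of the cited patents: the mean absolute difference metric, the threshold value, and the names `mean_abs_diff` and `detect_repeated_fields` are all assumptions.

```python
def mean_abs_diff(field_a, field_b):
    """Mean absolute pixel difference between two equally sized fields."""
    total = 0
    count = 0
    for row_a, row_b in zip(field_a, field_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count


def detect_repeated_fields(fields, threshold=2.0):
    """Flag field n as repeated if it nearly equals field n-2 (same parity).

    The threshold is a hypothetical value chosen for this toy example.
    """
    flags = []
    for n, field in enumerate(fields):
        if n < 2:
            flags.append(False)  # no earlier same-parity field to compare with
        else:
            flags.append(mean_abs_diff(field, fields[n - 2]) < threshold)
    return flags


def toy_field(film_frame, parity):
    """Hypothetical 2x4 field whose pixel level depends on frame and parity."""
    level = 20 * film_frame + (0 if parity == "top" else 3)
    return [[level] * 4 for _ in range(2)]


# Ten fields of a 3:2 sequence: A A A B B C C C D D (film frames 0..3).
sequence = [
    toy_field(0, "top"), toy_field(0, "bottom"), toy_field(0, "top"),
    toy_field(1, "bottom"), toy_field(1, "top"),
    toy_field(2, "bottom"), toy_field(2, "top"), toy_field(2, "bottom"),
    toy_field(3, "top"), toy_field(3, "bottom"),
]
flags = detect_repeated_fields(sequence)
repeat_positions = [n for n, is_repeat in enumerate(flags) if is_repeat]
print(repeat_positions)
```

In this toy sequence the flagged positions are five fields apart, which is the periodic signature that pattern-based 3:2 detectors look for.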
Although the telecine process of film-to-video conversion is well understood and its results are very predictable, the repeated field patterns and the number of repeated fields in the converted video may subsequently be changed by post-production processes such as scene cuts and overlaying. For example, a subtitle overlay may begin on a repeated field in the converted video, and this will not only break the 3:2 pattern but also create two interlaced frames in a progressive/interlace detection process.
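The effect of an overlay that begins on a repeated field can be illustrated with a toy example. The one-dimensional "fields", the sum-of-absolute-differences metric, the threshold, and the helper names `sad` and `apply_overlay` are hypothetical and serve only to show why a simple same-parity comparison then misses the repeat.

```python
def sad(field_a, field_b):
    """Sum of absolute differences between two fields (flat pixel lists)."""
    return sum(abs(a - b) for a, b in zip(field_a, field_b))


def apply_overlay(field, start=2, width=3, level=255):
    """Return a copy of the field with a bright subtitle-like patch drawn in."""
    out = list(field)
    for i in range(start, min(start + width, len(out))):
        out[i] = level
    return out


THRESHOLD = 8  # hypothetical repeated-field threshold

film_top = [16, 18, 20, 22, 24, 26, 28, 30]
film_bottom = [17] * 8

# Top field, bottom field, then the repeated top field from the same film frame.
clean = [film_top, film_bottom, list(film_top)]
# Same three fields, but the overlay starts exactly on the repeated field.
overlaid = [film_top, film_bottom, apply_overlay(film_top)]

clean_is_repeat = sad(clean[2], clean[0]) < THRESHOLD
overlaid_is_repeat = sad(overlaid[2], overlaid[0]) < THRESHOLD
print(clean_is_repeat, overlaid_is_repeat)
```

Without the overlay the third field is identical to the first and is flagged as a repeat; with the overlay the difference is large, so the field is no longer recognized as redundant and the expected 3:2 cadence is interrupted at that point.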
Although generally low in complexity, methods utilizing differences between two adjacent fields of the same parity to detect repeated fields lack the ability to detect individual progressive or interlaced frames, and also lack the ability to detect 2:2 pulldown sequences. Furthermore, these methods do not perform well for 3:2 pulldown sequences in which the 3:2 pattern is frequently broken, and the efficiency of subsequent pre-processing and encoding is affected as a result. Methods involving comparison of three or more adjacent fields of different parity provide the ability to detect both 3:2 and 2:2 pulldown sequences. However, because fields of different parity are compared, these methods are sensitive to vertical detail and vertical motion within the sequences.
Existing methods also suffer from detection latency. The decision as to whether or not two fields belong to the same progressive film frame is not made immediately after the second of the two fields is received from the video source. Instead, existing methods make the decision only after one or more subsequent fields have been received. This increases the total number of field buffers needed in systems such as those combining MPEG encoding with the necessary pre-processing, and it reduces the adaptability of the methods to sudden changes in the film/video characteristics.