Films are conventionally shot at 24 frames per second. To play a film smoothly on a television set, the film has to be converted to either a National Television System Committee (NTSC) video format (i.e., interlaced, 60 fields per second) or a Phase Alternating Line (PAL) video format (i.e., interlaced, 50 fields per second) in a process called telecine. In the telecine process, each frame of the film is decomposed into two fields of video, a top field and a bottom field. When converting to NTSC, some of the video fields are repeated. To encode such a video sequence efficiently, it is desirable to detect the repeated fields before actual encoding starts. The process of detecting the repeated fields in a video sequence generated by the telecine process is called inverse telecine. The detection problem is more complicated than it may appear for several reasons, such as noise introduced in the video processing chain, scene changes and post-editing.
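The field repetition in the NTSC case can be illustrated with a short sketch. The text above does not name the cadence; the sketch assumes the common 3:2 pulldown, in which film frames alternately contribute three and two fields, and it assumes a top-field-first sequence. The function name `pulldown_3_2` and the tuple layout are hypothetical, chosen only for illustration.

```python
def pulldown_3_2(num_frames):
    """Sketch of 3:2 pulldown (assumed cadence, top field first).

    Returns a list of (frame_index, parity, is_repeat) tuples, one per
    output video field.
    """
    fields = []
    parity = "top"  # assumption: the sequence starts with a top field
    for frame_index in range(num_frames):
        # Film frames alternately contribute 3 fields and 2 fields.
        count = 3 if frame_index % 2 == 0 else 2
        for k in range(count):
            # The third field of a 3-field frame repeats its first field,
            # which has the same parity and lies two fields earlier.
            fields.append((frame_index, parity, k == 2))
            parity = "bottom" if parity == "top" else "top"
    return fields


if __name__ == "__main__":
    for position, field in enumerate(pulldown_3_2(4)):
        print(position, field)
```

Under these assumptions, four film frames yield ten fields, so 24 frames per second become 60 fields per second, and one field in every five is a repeat.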
A conventional method to detect repeated fields in a telecined field sequence is to compute a difference between a current field and the previous same-parity field and then compare the difference with a predetermined threshold. If the difference is less than the threshold, the current field is declared a repeated field. The difference can be measured as a sum of absolute differences (SAD) or a sum of squared differences (SSD) between the two fields. In some real-time systems, the difference is available only at a field level or a strip level (a strip being a number of horizontal lines of a field), but not at a macroblock level or a pixel level.
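A minimal sketch of this threshold test follows. It assumes each field is a list of rows of integer luma samples and that fields alternate parity, so the previous same-parity field is the one two positions earlier. The names `sad`, `detect_repeats_by_threshold` and `threshold` are illustrative and do not come from any particular system.

```python
def sad(field_a, field_b):
    """Sum of absolute differences between two equally sized fields."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(field_a, field_b)
        for a, b in zip(row_a, row_b)
    )


def detect_repeats_by_threshold(fields, threshold):
    """Flag field n as repeated when SAD(field n, field n-2) < threshold.

    Assumes alternating parity, so field n-2 is the previous
    same-parity field. The first two fields have no such reference.
    """
    flags = []
    for n, field in enumerate(fields):
        if n < 2:
            flags.append(False)
        else:
            flags.append(sad(field, fields[n - 2]) < threshold)
    return flags


if __name__ == "__main__":
    # Tiny 1x2 "fields": field 2 nearly repeats field 0 (noise of 1).
    fields = [[[10, 10]], [[50, 50]], [[10, 11]], [[90, 90]]]
    print(detect_repeats_by_threshold(fields, threshold=4))
```

The example makes the dependence on the noise level visible: with a threshold of 1 the noisy repeat at position 2 would be missed, while a much larger threshold would start flagging fields that merely changed little.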
The conventional method is very simple to implement in both hardware and software, but is unreliable for the following reasons. First, the conventional method relies on knowledge of the noise level in the telecined sequence to determine the threshold. Such knowledge is rarely available because different telecine machines generate different noise levels. Furthermore, in transcoding applications the noise levels vary with the initial encoding process. Second, some fields may be incorrectly detected as repeated fields (i.e., false positive detections) in scenes with slow or no motion because the field differences are very small.
To improve the reliability of inverse telecine, field-to-field motion vectors are conventionally used to detect the repeated fields. In one conventional method, motion estimation is performed between two consecutive same-parity fields. If a field repeats the previous same-parity field, the resulting motion vectors are mainly of zero length. The zero-length motion vectors are used to detect repeated fields. In another conventional method, motion estimation is performed between two fields of the same parity, but the two fields are not required to be neighbors. Depending on the picture coding structure, the motion vectors of a repeated field are either very small compared with those of the previous same-parity field, or almost identical to those of the previous same-parity field. The very small or almost identical motion vectors are used to detect repeated fields.
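The first motion-vector method can be sketched as a test on the share of zero-length vectors. The sketch assumes motion estimation against the previous same-parity field has already produced one `(dx, dy)` vector per block. The text says only that the vectors are "mainly" of zero length, so the `zero_ratio` cutoff of 0.9 is a hypothetical value, as is the function name.

```python
def is_repeated_by_motion_vectors(motion_vectors, zero_ratio=0.9):
    """Declare a field repeated when most of its motion vectors are zero.

    motion_vectors: list of (dx, dy) tuples, one per block, estimated
    against the previous same-parity field.
    zero_ratio: hypothetical cutoff for "mainly zero length".
    """
    if not motion_vectors:
        return False  # no evidence either way; do not flag the field
    zero_count = sum(1 for dx, dy in motion_vectors if dx == 0 and dy == 0)
    return zero_count / len(motion_vectors) >= zero_ratio


if __name__ == "__main__":
    mostly_static = [(0, 0)] * 19 + [(1, 0)]
    moving = [(0, 0)] * 5 + [(2, 1)] * 5
    print(is_repeated_by_motion_vectors(mostly_static))
    print(is_repeated_by_motion_vectors(moving))
```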
The motion-vector-based methods are not suitable for video coding systems in which field-to-field motion estimation is not available. For example, an efficient way to encode a video sequence converted from film material is to detect the repeated fields in the video sequence, combine the two fields from the same film frame into a frame and then perform frame coding. In such situations, field-to-field motion estimation cannot be directly shared by the inverse telecine module and the actual encoding module, resulting in extra cost.
Another type of method to improve the reliability of inverse telecine is to explicitly utilize known telecine patterns. Ten consecutive fields can be examined to match the telecine patterns. If a match is found, the field at a certain position is declared a repeated field. If a video sequence generated by the telecine process contains mainly regular patterns, such conventional methods work fairly well. However, the resulting patterns may be quite irregular due to post-editing, scene changes, speed changes and the insertion of video effects such as fades.
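A pattern match over a ten-field window might look like the following sketch. It assumes a regular 3:2 cadence, in which a repeated field occurs once every five fields, and it takes as input per-field boolean flags produced by some earlier test such as a field-difference check. The function name `match_pulldown_phase` and the notion of a "phase" are illustrative assumptions, not taken from the text.

```python
def match_pulldown_phase(repeat_flags):
    """Match ten per-field repeat flags against a 3:2 cadence.

    Returns the phase (0..4) such that exactly the fields at positions
    phase and phase + 5 are flagged, or None when no regular pattern
    fits the window.
    """
    if len(repeat_flags) != 10:
        raise ValueError("expected a window of exactly ten fields")
    for phase in range(5):
        if all(flag == (i % 5 == phase) for i, flag in enumerate(repeat_flags)):
            return phase
    return None


if __name__ == "__main__":
    regular = [i in (2, 7) for i in range(10)]
    edited = [i in (2, 6) for i in range(10)]  # cadence broken, e.g. by a cut
    print(match_pulldown_phase(regular))
    print(match_pulldown_phase(edited))
```

The second call shows the weakness noted above: a single edit that shifts the cadence makes the strict template fail, even though each flagged field may be a genuine repeat.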