Motion picture film is photographed at a rate of 24 frames per second. Each frame is a complete picture, also known as a “progressive frame.” This means that all of the fields, top and bottom, correspond to the same instant of time.
Video signals, on the other hand, may have a progressive frame structure or an interlaced structure. An interlaced frame is divided into top and bottom fields, and scanning of one field does not start until the other one is finished. Moreover, video signals have a different frame rate than film. The National Television System Committee (NTSC) standard (used primarily in North America) uses a frame rate of approximately 30 frames per second for interlaced video. The Phase Alternating Line (PAL) standard (used in most of the rest of the world) uses a frame rate of 25 frames per second. Progressive video uses a frame rate of 60 frames per second.
The different frame rates used by film and video complicate the conversion between the two formats. For example, when film is converted to be shown on progressive television, the video requires more frames per second than the film supplies. To solve this problem, a telecine process converts two frames of film into five frames of video, so that 24 film frames per second become 60 video frames per second. One method of performing this process involves converting a first frame of film into three frames of video and a second frame of film into two frames of video. That is, the first frame of film is repeated twice and the second frame of film is repeated once. Because of the 3-2 pattern, the process is often called 3-2 pulldown.
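The 3-2 cadence described above can be illustrated with a minimal sketch. The function name `pulldown_3_2` and the use of strings to stand in for frame images are assumptions made only for illustration; they do not come from any particular telecine implementation.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def pulldown_3_2(film_frames: Sequence[T]) -> List[T]:
    """Expand film frames into progressive video frames with a 3-2 cadence.

    Hypothetical helper: frames at even positions are emitted three times
    (repeated twice), frames at odd positions twice (repeated once).
    """
    video_frames: List[T] = []
    for index, frame in enumerate(film_frames):
        copies = 3 if index % 2 == 0 else 2
        video_frames.extend([frame] * copies)
    return video_frames


print(pulldown_3_2(["A", "B", "C", "D"]))
# ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C', 'D', 'D']
```

Under this sketch, one second of film (24 frames) expands to 60 video frames, which matches the progressive video rate given above.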
The repeated or duplicate frames in the telecine process enable the viewing of film materials in the video format. However, in some applications, it is desirable to remove the duplicate frames. For example, the repeated frames do not contain new information and should be removed before encoding (compression). An inverse telecine process, also referred to as a detelecine process, converts a video signal back to a film format. The inverse telecine process takes incoming video, which is presumed to have been generated from film source material, and outputs the original frame images so that they can be encoded. By removing repeated frames from the video material, the encoding process can be made more efficient, and ultimately the amount of the resulting data can be greatly reduced.
In the inverse telecine process, video is known to be input to an inverse telecine detector, which detects and drops any repeat frames. The inverse telecine detector also inserts flags into the video stream to indicate to an associated decoder that certain frames should be repeated at the appropriate time. The frames associated with repeat frames may be referred to as “film frames.” The frames that are not associated with any repeat frames may be referred to as “video frames.” A film frame has a time duration equal to the sum of the time durations of either three video frames or two video frames, depending on how many times the film frame should be repeated. The frames are then stored in an encoder pipeline delay buffer before they are sequentially delivered to the input of a video encoding engine, which performs the actual encoding. The time interval from the time a frame is captured in the pipeline delay buffer to the time that the frame is encoded by the encoding engine may be referred to as the “pipeline delay.” After the pipeline delay, frames that entered the pipeline delay buffer are delivered to the encoding engine. The encoding engine conveys the current time, using a program clock reference (PCR), and the decoding time of each frame, using a decode time stamp (DTS), to an associated decoder. The time interval from the time a frame is delivered to the encoding engine to the time that the frame is decoded by the associated decoder may be referred to as the “system delay.” If the delivery of a frame to the encoding engine is delayed, the system delay associated with that frame is reduced. The buffer and encoding engine operate under the control of a processor, which establishes the video pipeline that delivers the frames from the buffer to the encoding engine.
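The detection and flagging step can be sketched as follows. This is a simplified illustration, not the detector of any particular encoder: it assumes repeat frames are exactly equal to the frame they duplicate, whereas a practical detector would have to compare noisy pixel data against a threshold. The names `KeptFrame` and `inverse_telecine` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class KeptFrame:
    """A frame that survives inverse telecine (hypothetical structure)."""
    image: Any
    repeat_count: int = 0  # flag: how many times the decoder should repeat it

    @property
    def duration_in_video_frames(self) -> int:
        # One display of the frame itself plus one per dropped repeat.
        return 1 + self.repeat_count

    @property
    def is_film_frame(self) -> bool:
        # "Film frames" are those associated with at least one repeat frame.
        return self.repeat_count > 0


def inverse_telecine(video_frames: Iterable[Any]) -> List[KeptFrame]:
    """Drop repeat frames and record a repeat count on the frame kept."""
    kept: List[KeptFrame] = []
    for image in video_frames:
        if kept and kept[-1].image == image:
            kept[-1].repeat_count += 1  # drop the repeat, remember it as a flag
        else:
            kept.append(KeptFrame(image))
    return kept


kept = inverse_telecine("AAABBCCCDD")
print([(f.image, f.duration_in_video_frames) for f in kept])
# [('A', 3), ('B', 2), ('C', 3), ('D', 2)]
```

Because the sketch relies on exact equality, two genuinely distinct film frames that happen to be identical (a static scene, for instance) would be merged into one; handling that case is outside the scope of this illustration.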
The encoder is configured to select an initial system delay. For example, the encoder may select an initial system delay of 1 second. At the encoding time of the first frame, the encoder initializes the PCR to be the DTS of the first frame minus the initial system delay. The value of the PCR is thereafter incremented and synchronized to the input video clock. The encoder is also configured to synchronize its system time clock (STC) to the input video clock. The encoder initializes the DTS of the first frame to be the STC at the encoding time of the first frame. The encoder thereafter calculates the DTS of a current frame by adding the duration of the previous frame to the DTS of the previous frame in the encoding order. For example, the DTS of a second frame is equal to the DTS of the first frame plus the duration of the first frame. The duration of each frame is measured by the input video clock.
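The timestamp rules above can be written out as a short sketch. The 90 kHz tick unit and the exact 60 frames per second video rate are assumptions chosen to keep the arithmetic in integers; the function name `assign_timestamps` and the sample STC value are likewise hypothetical.

```python
from typing import List, Sequence, Tuple

TICKS_PER_SECOND = 90_000                   # assumed timestamp resolution
VIDEO_FRAME_TICKS = TICKS_PER_SECOND // 60  # 1500 ticks per video frame at 60 fps


def assign_timestamps(
    stc_at_first_encode: int,
    frame_durations: Sequence[int],
    initial_system_delay: int,
) -> Tuple[int, List[int]]:
    """Return (initial PCR, DTS per frame), all in ticks.

    DTS[0] is the STC at the encoding time of the first frame.
    DTS[n] is DTS[n-1] plus the duration of frame n-1 (encoding order).
    The initial PCR is DTS[0] minus the initial system delay.
    """
    if not frame_durations:
        raise ValueError("at least one frame is required")
    dts: List[int] = []
    current = stc_at_first_encode
    for duration in frame_durations:
        dts.append(current)
        current += duration
    initial_pcr = dts[0] - initial_system_delay
    return initial_pcr, dts


# Four film frames from a 3-2 cadence: durations of 3, 2, 3, 2 video frames.
durations = [n * VIDEO_FRAME_TICKS for n in (3, 2, 3, 2)]
pcr, dts = assign_timestamps(1_000_000, durations, TICKS_PER_SECOND)
print(pcr, dts)
# 910000 [1000000, 1004500, 1007500, 1012000]
```

Note that the DTS values advance by the film frame durations, not by a fixed video frame time, so they stay aligned with the input video clock even though repeat frames have been dropped.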
For progressive video, there are two ways in which the pipeline may be filled: one is video frame based, and the other is film frame based.
For the video frame based pipeline, the repeat frames enter and propagate down the pipe as if they had not been dropped. When a repeat frame enters a processing stage, the process of that stage is not performed. In this way, each of the frames in the pipe continues to move incrementally along the pipeline as if no frames had been dropped, and thus the pipeline delay is constant. At any instant during the processing of the video, some processing stages may skip the process for two video frame times, some may skip for one video frame time, and others may not skip at all. The management of the video frame based pipeline is relatively more involved, but the system delay is constant.
For the film frame based pipeline, the repeat frames do not enter the pipe. Instead, when a repeat frame is detected, the entire pipeline stalls for one video frame time. In this way, the frames in the pipe stop moving along the pipeline for one video frame time. In the case of 3-2 pulldown, three of every five video frames are dropped (i.e., only two frames are encoded). If the pipeline has a duration of one second, by the time the last frame in the pipeline begins to be encoded, the pipeline will have been delayed by ⅗ of a second, or 600 ms. Since the delivery of frames to the encoding engine is delayed, the system delay is reduced. As a result of this reduction, the encoder may not have enough time to transmit the encoded bits of the entire frame to the decoder prior to the decoding time. The management of the film frame based pipeline is much simpler. However, the system delay under the film frame based pipeline varies from frame to frame. This variation is known to increase the difficulty of maintaining a required bit rate to prevent decoder buffer underflow.
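The 600 ms figure for the film frame based pipeline follows from simple arithmetic, sketched below. The function name `accumulated_stall_seconds` is hypothetical, and subtracting the stall from a 1 second initial system delay is an illustrative assumption about how the reduction plays out in the worst case, not a statement about any specific encoder.

```python
from fractions import Fraction


def accumulated_stall_seconds(pipeline_seconds, dropped_per_group=3, group_size=5):
    """Stall accrued while one pipeline's worth of video frame times elapses.

    With 3-2 pulldown, 3 of every 5 video frames are repeats, and each
    repeat stalls the film frame based pipeline for one video frame time.
    """
    return Fraction(pipeline_seconds) * Fraction(dropped_per_group, group_size)


stall = accumulated_stall_seconds(1)     # one-second pipeline
initial_system_delay = Fraction(1)       # assumed 1 second initial system delay
remaining = initial_system_delay - stall

print(f"stall = {int(stall * 1000)} ms, remaining system delay = {int(remaining * 1000)} ms")
# stall = 600 ms, remaining system delay = 400 ms
```

Under these assumptions the encoder would have well under half of the intended system delay left in which to transmit the bits of a frame, which illustrates why a varying system delay makes the bit rate harder to maintain.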