Motion pictures (movies) are generally provided at 24 frames per second. Television pictures (e.g., NTSC video), on the other hand, are generally broadcast at approximately 30 frames per second using two interlaced fields per frame (i.e., at a field rate of approximately 60 fields per second).
In order to convert a motion picture image to a television image, a technique known as TELECINE is used. TELECINE is a process that converts 24 fps (frames per second) source video (the speed at which a movie is usually shot) to approximately 30 fps, or approximately 30×2 interlaced fields per second. As the ratio of television frames to motion picture frames is 30:24 (or 5:4), one way to correct for the discrepancy between the two formats is to repeat every 4th frame of the motion picture image, so that every four motion picture frames yield five television frames. However, TELECINE uses a slightly more complex technique to achieve this result, in order to reduce the jerkiness associated with repeating every 4th frame.
Since a television image is interlaced, it is possible to repeat only a single field rather than a whole frame, pairing a field of one frame with a field of the next frame. The averaging effect of old phosphor-screen CRTs would then reduce or eliminate any artifacts. This solution was good for the era of CRT television displays but may not be suitable in the modern era of non-interlaced monitors and the like.
As an example, suppose each frame contains two interlaced fields, which will be referred to as TOP (T) and BOTTOM (B). The lines of the two fields are interleaved; one field is not positioned above or below the other. The TOP and BOTTOM nomenclature is used purely as a matter of reference. For a series of five frames, the TELECINE scheme may be represented as follows:
FRAME:          1    2    3    4    5
TOP FIELD:      T1   T1   T2   T3   T4
BOTTOM FIELD:   B1   B2   B3   B3   B4
Thus, a first frame of the resultant video image may comprise the top and bottom fields of the first frame of source material. The second frame comprises the top field of the first frame of the source material and the bottom field of the second frame of source material, and so on. In this scheme, no whole frame is repeated. Rather, composite frames are made from fields of adjacent source frames. For example, the second frame is made up of top field 1 and bottom field 2. In this manner, the image is better averaged over time, resulting in a reduction of jerkiness or the like. This scheme is well known in the art and worked well for older and smaller TV sets. When used with a CRT display, the time-averaging effects of a phosphor screen work to the advantage of such a scheme. However, with more modern non-interlaced displays as well as flat panel screens, such a technique may not be optimal.
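The field pairing in the table above can be sketched in a few lines of Python. This is only an illustration of the pattern as tabulated here, not an implementation of any particular TELECINE hardware; the function name `telecine_pattern` and the restriction to multiples of four source frames are assumptions made for the example.

```python
def telecine_pattern(num_source_frames):
    """Return (top, bottom) field labels for each output frame.

    Follows the table above: every 4 source frames yield 5 output frames.
    Assumption for this sketch: the input length is a multiple of 4.
    """
    if num_source_frames % 4 != 0:
        raise ValueError("this sketch expects a multiple of 4 source frames")

    # Offsets within each group of 4 source frames, read off the table:
    # top fields T1 T1 T2 T3 T4, bottom fields B1 B2 B3 B3 B4.
    top_offsets = (0, 0, 1, 2, 3)
    bottom_offsets = (0, 1, 2, 2, 3)

    output = []
    for group_start in range(1, num_source_frames + 1, 4):
        for top, bottom in zip(top_offsets, bottom_offsets):
            output.append((f"T{group_start + top}", f"B{group_start + bottom}"))
    return output


# Print the five output frames produced from four source frames.
for number, (top, bottom) in enumerate(telecine_pattern(4), start=1):
    print(number, top, bottom)
```

With eight source frames the same offsets repeat, so the sixth output frame pairs T5 with B5.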
One problem with the above described TELECINE method is that many DVDs on the market today have TELECINE encoded data (e.g., data converted from 24 FPS cinema to NTSC video or the like) rather than the original 24 FPS cinema source material. These TELECINE encoded discs are generally created by recording a DVD from an NTSC analog signal generated from an original 24 FPS motion picture (cinema) source.
For example, a 30 FPS TELECINE encoded DVD may be created by recording a DVD from an original 24 FPS DVD as follows:
  DVD             NTSC             DVD
 24 FPS    →       TV      →      30 FPS
  PLAY           SIGNAL          RECORD
Thus, a movie recorded to a recordable DVD off the air might be in such a format, as it was previously converted from 24 FPS motion picture video to NTSC television using TELECINE and then recorded on a DVD as 30 FPS NTSC video. Commercially produced DVDs and the like may also be recorded using such a technique. The same situation may arise in other areas of video data storage, such as hard drive data storage and the like.
However, a problem is created when using modern television displays, such as progressive scan monitors. Since the interlaced lines may be displayed progressively, a moving object may appear distorted. Specifically, since adjacent lines may be from different time periods (1/60th of a second apart), a “comb” effect may occur if the object moves between the two fields.
[Illustration: ORIGINAL IMAGE compared with REPRODUCED IMAGE, showing the “comb” effect]
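The comb effect can be reproduced with a toy example. The sketch below is purely illustrative: it draws a bar as text characters, samples the top field at one instant and the bottom field one field period later, and weaves the two into a single progressive frame. The helper names `render_bar` and `weave`, and the bar positions, are assumptions chosen for the example.

```python
def render_bar(x, width=12, bar=3):
    """Return one scan line with a bar of '#' starting at column x."""
    return "".join("#" if x <= col < x + bar else "." for col in range(width))


def weave(top_rows, bottom_rows):
    """Interleave top-field and bottom-field lines into one frame."""
    frame = []
    for top_line, bottom_line in zip(top_rows, bottom_rows):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame


# The top field sees the bar at column 2; by the time the bottom field
# is sampled (about 1/60th of a second later) the bar has moved to column 5.
top_field = [render_bar(2)] * 3
bottom_field = [render_bar(5)] * 3

for line in weave(top_field, bottom_field):
    print(line)
```

Alternate lines of the woven frame show the bar at two different positions, which is the comb pattern a progressive display would reveal.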
One way to avoid this problem is to use only one field of each frame and expand the field using known filtering techniques to create a full frame. However, this solution tends to reduce vertical resolution, which is unacceptable in the high-resolution television market. This technique may also introduce other artifacts. For stationary objects (e.g., a landscape scene) where there is no movement, filtering techniques may create “jitter” between adjacent lines, particularly where straight horizontal edges occur. A number of solutions for reducing jitter in computer and television displays exist, but even using such techniques, jitter can still be a problem, particularly for straight edges and thin lines that may not average well with adjacent pixels.
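As a rough illustration of expanding a single field into a full frame, the sketch below fills each missing line with the average of the field lines above and below it. Linear interpolation is only one of many possible filters and is chosen here as an assumption; the function name `expand_field` and the list-of-lists luminance representation are likewise hypothetical.

```python
def expand_field(field):
    """Expand one field (a list of luminance rows) into a full frame.

    Each field line is kept, and the missing line below it is estimated
    as the average of that line and the next field line. The last
    missing line simply repeats the last field line.
    """
    frame = []
    for index, row in enumerate(field):
        next_row = field[index + 1] if index + 1 < len(field) else row
        frame.append(list(row))
        frame.append([(a + b) / 2 for a, b in zip(row, next_row)])
    return frame


# A sharp horizontal edge between two field lines becomes a soft ramp,
# which is one way the loss of vertical resolution shows up.
print(expand_field([[0, 0], [10, 10]]))
```

Because half of the lines are estimated rather than measured, fine vertical detail such as thin horizontal lines cannot be recovered by this approach.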
One Prior Art solution to this jitter problem is to try to recreate the original 24 FPS image from the 30 FPS TELECINE image using temporal filtering techniques. Referring back to the diagram above, we see that the fields are presented in a known pattern in a TELECINE conversion:
FRAME:          1    2    3    4    5    6    . . . etc.
TOP FIELD:      T1   T1   T2   T3   T4   T5   . . . etc.
BOTTOM FIELD:   B1   B2   B3   B3   B4   B5   . . . etc.
The difference between adjacent fields may be used to indicate whether an image is TELECINE encoded or not. For example, in the first two frames 1 and 2, both top fields are T1. In actuality, they are not completely identical, as the image has been converted from digital to analog and back to digital again. Thus, the two T1 fields may be slightly different. For the sake of illustration, they will be referred to as T1A and T1B.
However, fields T1 and T2 will differ much more if there is any motion in the scene whatsoever. If the difference (e.g., luminance difference) is sampled between adjacent fields and if a certain pattern or “cadence” is detected, then a determination can be made as to whether the signal is TELECINE encoded. Once this information is known, the proper fields can be extracted and the original 24 FPS cinema image can be recreated and displayed on a progressive scan monitor (repeating one frame out of four) without the motion artifacts or the jitter artifacts.
FIG. 1 is a graph illustrating luminance differences in the samples compared with a threshold value for detecting TELECINE encoding. In the graph of FIG. 1, the Y-axis represents a relative luminance difference value between two fields. The S values on the X-axis represent different frame points. The difference values for the example illustrated above may be calculated as follows:

S1 = T1B − T1A
S2 = T2 − T1B
S3 = T3 − T2
S4 = T4 − T3
S5 = T5A − T4
S6 = T5B − T5A
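The difference values S1 to S6 can be sketched as follows. The text does not specify how the luminance difference between two fields is reduced to a single number, so the mean absolute difference used here is an assumption, as are the function names and the tiny two-pixel fields used as sample data.

```python
def field_difference(field_a, field_b):
    """Mean absolute luminance difference between two equally sized fields."""
    total = 0
    count = 0
    for row_a, row_b in zip(field_a, field_b):
        for a, b in zip(row_a, row_b):
            total += abs(a - b)
            count += 1
    return total / count


def difference_profile(fields):
    """Return [S1, S2, ...]: differences between each field and the next."""
    return [field_difference(fields[i], fields[i + 1])
            for i in range(len(fields) - 1)]


# Illustrative top fields T1A, T1B, T2, T3, T4, T5A, T5B. The repeated
# fields differ only by a little conversion noise; the others differ
# because of motion in the scene.
top_fields = [
    [[10, 10]],    # T1A
    [[11, 10]],    # T1B (T1 again, with slight noise)
    [[40, 40]],    # T2
    [[70, 70]],    # T3
    [[100, 100]],  # T4
    [[130, 130]],  # T5A
    [[130, 131]],  # T5B (T5 again, with slight noise)
]

print(difference_profile(top_fields))
```

In this made-up data S1 and S6 are small while S2 through S5 are large, which is the shape of profile the text describes.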
As illustrated in FIG. 1, these differences create an “error profile” which is easily identifiable. A threshold value may be calculated to determine whether adjacent fields are considered identical or not. If this pattern or cadence is detected, the fields can be stored in a field store and then reconstituted into 24 FPS cinema as follows:
FRAME:          1    2    3    4    5
TOP FIELD:      T1   T2   T3   T4   T5
BOTTOM FIELD:   B1   B2   B3   B4   B5
The 24 FPS data stream can be reconstructed by storing the individual fields and then reconstructing them back to their original (pre-TELECINE) order. This 24 FPS data can then be displayed on a progressive scan monitor without the introduction of artifacts or the like.
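A minimal sketch of the two steps described above, threshold-based cadence detection and field reordering, is given below. It assumes the field pattern tabulated earlier (top fields T1 T1 T2 T3 T4, bottom fields B1 B2 B3 B3 B4), a difference profile that starts on a repeated field, and a fixed threshold; the function names, the threshold value and the sample profiles are all hypothetical.

```python
def find_cadence(profile, threshold, period=5):
    """Return the phase at which near-identical fields recur, or None.

    A difference below the threshold means two adjacent fields are
    treated as identical. The cadence is accepted only if such low
    values occur at exactly one position in every group of `period`.
    """
    low = [value < threshold for value in profile]
    if not any(low):
        return None
    for phase in range(period):
        if all(is_low == (i % period == phase) for i, is_low in enumerate(low)):
            return phase
    return None


def inverse_telecine(frames):
    """Rebuild source frames from (top, bottom) pairs in TELECINE order.

    Assumes each group of 5 frames starts at the repeated top field.
    The duplicate top field (index 1) and the duplicate bottom field
    (index 3) of each group are dropped.
    """
    source = []
    for start in range(0, len(frames) - 4, 5):
        group = frames[start:start + 5]
        tops = [group[i][0] for i in (0, 2, 3, 4)]
        bottoms = [group[i][1] for i in (0, 1, 2, 4)]
        source.extend(zip(tops, bottoms))
    return source


moving = [0.5, 29.5, 30.0, 30.0, 30.0, 0.5]
still = [0.4, 0.5, 0.3, 0.5, 0.4, 0.5]
print(find_cadence(moving, threshold=5.0))  # cadence found at phase 0
print(find_cadence(still, threshold=5.0))   # no cadence visible

telecined = [("T1", "B1"), ("T1", "B2"), ("T2", "B3"), ("T3", "B3"), ("T4", "B4")]
print(inverse_telecine(telecined))
```

The second call returns no cadence because every difference falls below the threshold, so a detector of this simple kind cannot tell a still scene from one without TELECINE encoding.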
Techniques of detecting TELECINE encoding in an NTSC or other video signal are known in the art. While others have reconstituted 24 FPS cinema from 30 FPS NTSC TELECINE video, there have been some problems. For example, a long scene where the camera does not move and there is no movement (e.g., landscape or black screen or still image) will not show the cadence illustrated above. Moreover, when a scene change or commercial break occurs, the cadence may be interrupted, making conversion from source video back to 24 FPS more difficult and/or throwing off the conversion process. A better filtering technique is thus needed to detect this cadence for still images and also for scene changes.