When film is converted to video tape, the conversion is done by creating a sequence of video images from the sequence of film images. The video images, or frames, are electronic representations of the film. The illusion of motion is created in film and video by rapidly showing multiple images on a screen.
In film, the images are single exposed photographic images. However, on video tape, the images are represented as multiple lines. Typically, video television signals comprise multiple horizontal lines, where each line contains brightness information, or luminance, and color information, or chrominance, for a specific vertical location on a television screen. Since most spatial interpolators manipulate the line information as digital data, the operation of the spatial interpolator does not depend on the method used to encode the luminance and chrominance information onto a line. The encoding method used is usually only relevant to the converter which does the initial conversion of the electronic signal into digital luminance and chrominance data.
In both film and video, the images are designed to be shown at a fixed rate, so that a viewer sees the film or video at roughly the speed at which the images were captured. Moving picture film is taken and shown at a characteristic frequency, expressed in frames/second ("fr/s"). Most U.S. moving picture film is shown at 24 fr/s. Video tape is also shown at a characteristic frequency, and in addition has a characteristic number of lines per image.
Due to the history of television, different parts of the world have adopted different standards for broadcast television and video tape. In North America, the standard is 30 fr/s (or images/second) and 525 lines/image; this standard is sometimes called NTSC (after the National Television System Committee). In the United Kingdom and parts of Europe, the standard is 25 fr/s (or images/second) and 625 lines/image; this standard is sometimes called PAL (Phase Alternate Line). Film in North America is usually taken at 24 fr/s and is sometimes shown in other parts of the world as a moving picture at 25 fr/s, which only slightly increases the speed of the moving picture relative to the speed at which the images were recorded. Although the terms "NTSC" and "PAL" refer to color standards as well as field frequencies and line densities, the terms are used herein to generally describe standards with differing field frequencies and lines per image without regard to any differences in color encoding schemes.
Film is inconvenient to edit, since editing requires that film be physically cut. The editing process is tedious and expensive, and in the end results in the loss of frames. Video editing can be done electronically by copying the original, allowing many attempts at editing to be made without loss of the original material. Video editing can also be more expedient, since it avoids the delays of film developing, splicing and printing.
Standards converters exist to take video, edited or otherwise, from one standard and create video in another standard. The conversion from one standard to another consists of creating an output sequence of images (or frames) from the input sequence of images, adjusting the number of images per unit time and the number of lines per image from the input standard to the output standard.
Both NTSC and PAL standards require that each video image or frame be divided into two video fields. Each video field contains half the lines of an image or frame. In NTSC, a field consists of 262 1/2 lines; in PAL, a field consists of 312 1/2 lines. To display a full frame on a television screen in either standard, the television receives the first field and fills the entire screen with the 262 1/2 or 312 1/2 lines of the first field, then receives the second field and intersperses the lines of the second field between each of the lines of the first field.
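The division of a frame into two fields, and the interspersing of their lines on display, can be illustrated with a minimal Python sketch. The function names `split_fields` and `weave` are hypothetical, and the sketch assumes a frame is simply a list of whole lines; it does not model the half line that each field carries in NTSC or PAL.

```python
def split_fields(frame):
    """Split a frame (a list of lines) into two fields of alternating lines."""
    first_field = frame[0::2]   # lines 0, 2, 4, ...
    second_field = frame[1::2]  # lines 1, 3, 5, ...
    return first_field, second_field


def weave(first_field, second_field):
    """Rebuild a frame by interspersing second-field lines between first-field lines."""
    frame = []
    for i in range(max(len(first_field), len(second_field))):
        if i < len(first_field):
            frame.append(first_field[i])
        if i < len(second_field):
            frame.append(second_field[i])
    return frame
```

Splitting a frame and weaving the two fields back together returns the original line order, which is what the display achieves through persistence.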
Since persistence of a television screen is created by a combination of television tube phosphor and the behavior of the human eye, the lines of the first field still persist on the screen when the lines of the second field are shown, and the lines of the second field persist on the screen when the lines of the first field of the next frame are shown. Since two fields are always shown on a screen sequentially, fields are stored on tape sequentially. As a result of editing the video, the two fields that make up a frame may each come from a different image.
Another problem with video fields occurs in NTSC format, which uses 5 video fields for every 2 film frames. Film is generally created by photographing a scene 24 times per second, making 24 frames/second, whereas the NTSC video standard has 60 fields/sec. To get 60 fields from 24 film frames per second, additional fields must be created. The standard method of creating extra fields is known in the art as "adding a 3-2 cadence." Editing video generally destroys the 3-2 pattern.
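The arithmetic of the 3-2 cadence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `add_3_2_cadence` is hypothetical, the choice that even-numbered film frames receive three fields and odd-numbered frames receive two is an assumption, and the sketch ignores which lines (first-field or second-field) each video field carries.

```python
def add_3_2_cadence(film_frames):
    """Emit video fields from film frames, alternating 3 fields and 2 fields per frame."""
    fields = []
    for index, frame in enumerate(film_frames):
        # Assumption: even-indexed frames get 3 fields, odd-indexed frames get 2.
        repeats = 3 if index % 2 == 0 else 2
        fields.extend([frame] * repeats)
    return fields
```

Applied to four film frames A, B, C, D, the sketch yields the ten fields A A A B B C C C D D, that is, 5 fields for every 2 film frames, so 24 film frames produce 60 fields. Removing or reordering fields in an edit breaks this repeating pattern.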
The PAL standard requires 50 fields/sec. Since film frames are shot at 24 frames/sec, resulting in only 48 fields/sec, some manipulation must be done to add two fields/sec. Several solutions have been used: playing the program back at a slightly higher speed; filming at 25 frames/sec; or duplicating a field every 1/2 second. Due to the various world standards for both film and video, processes known as standards conversion have been created. The typical standards converter employs a blend of frames to minimize picture disturbances caused by the various picture creation techniques.
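Two of the PAL solutions listed above reduce to simple arithmetic, sketched here in Python. The function names are hypothetical, and the choice of which field to repeat (the last field of each half second) is an assumption made only for illustration.

```python
from fractions import Fraction

FILM_FRAMES_PER_SECOND = 24
PAL_FRAMES_PER_SECOND = 25


def speedup_running_time(minutes):
    """Running time, in minutes, of 24 fr/s film when played back at 25 fr/s."""
    return Fraction(minutes) * FILM_FRAMES_PER_SECOND / PAL_FRAMES_PER_SECOND


def duplicate_field_every_half_second(fields, period=24):
    """Repeat one field after every `period` source fields (24 fields = 1/2 s at 48 fields/s)."""
    output = []
    for count, field in enumerate(fields, start=1):
        output.append(field)
        if count % period == 0:
            # Assumption: the field that closes each half second is the one repeated.
            output.append(field)
    return output
```

Under these assumptions, one second of film (48 fields) becomes 50 fields with the duplication approach, while the speed-up approach shortens a 90 minute film to 86.4 minutes.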
In converting from one standard to another, one must take into consideration the different frequencies, the differing number of lines, and the different formats of encoding an electronic signal. The conversion of formats is quite straightforward, given that the formats are designed such that television circuitry can easily convert a signal into data made up of lines which represent the varying color and intensity of the video signal on a horizontal line of the screen. Accounting for the number of lines and the number of fields/second is more difficult.
In moving from one standard to another, frames need to be created at points in time where no frame existed in the original standard, and lines are created where no lines existed in the original standard. Lines in the new standard are created by an interpolator which interpolates lines of the old standard that are close spatially to the new line, with lines that are closer given greater weight. Likewise, if fields must be created in the new standard where none existed in the old standard, the new fields are created by temporally interpolating the fields that are closest in time to the new field, and the closest fields are given greater weight. Thus, if the number of fields/sec and the number of lines/field changes from one standard to another, each line in the new standard is a weighted average of lines in the old standard which are nearby in space and time. Because of this averaging process, the edges of moving objects in a scene may not be as sharp as in the original standard.
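The weighted averaging described above can be sketched in Python as follows. This is a minimal illustration, not the filter of any particular converter: it assumes simple two-point linear weights, represents each line as a list of numeric samples, and uses hypothetical function names. Practical interpolators may weight more than two neighboring lines or fields.

```python
def blend(line_a, line_b, weight_b):
    """Weighted average of two lines; weight_b in [0, 1] is the weight given to line_b."""
    return [(1.0 - weight_b) * a + weight_b * b for a, b in zip(line_a, line_b)]


def resample_lines(lines, out_count):
    """Spatial interpolation: create out_count lines from the input lines of one field."""
    n = len(lines)
    if n == 1 or out_count == 1:
        return [list(lines[0]) for _ in range(out_count)]
    output = []
    for j in range(out_count):
        # Position of the new line measured in units of input line spacing.
        position = j * (n - 1) / (out_count - 1)
        lower = min(int(position), n - 2)
        fraction = position - lower  # the nearer input line receives the larger weight
        output.append(blend(lines[lower], lines[lower + 1], fraction))
    return output


def temporal_blend(field_before, field_after, t):
    """Temporal interpolation: t in [0, 1] is the new field's position between two fields."""
    return [blend(a, b, t) for a, b in zip(field_before, field_after)]
```

For example, `resample_lines([[0.0], [10.0]], 3)` returns `[[0.0], [5.0], [10.0]]`. If an object edge sits at different positions in `field_before` and `field_after`, `temporal_blend` averages the two positions together, which is the source of the softened edges noted above.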
In doing field by field interpolation, more than one field has been used to increase spatial resolution, but only one field should be used when an object on the screen is in motion, to prevent the blurring of edges.
The process of detecting motion and adjusting the interpolation process based on the amount of motion is known as motion adaptive interpolation. Simple algorithms for motion adaptive interpolation are known in the art. For example, U.S. Pat. No. 4,766,484 discloses an apparatus which takes as its input video fields in one standard and outputs video fields in another standard, using motion adaptive interpolation and field interpolation. However, if the original image was created on film and subsequently transferred to video, the apparatus described combines the input fields without regard to the original film frame from which the input fields were derived.
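The general idea of motion adaptive interpolation can be illustrated with a short Python sketch. It is not the method of U.S. Pat. No. 4,766,484 or of any other specific apparatus. The function names, the motion measure (mean absolute difference between a line and the same line two fields earlier), and the threshold value are all assumptions chosen for illustration.

```python
def line_motion(line_now, line_two_fields_ago):
    """Crude motion measure: mean absolute difference between same-position lines."""
    total = sum(abs(a - b) for a, b in zip(line_now, line_two_fields_ago))
    return total / len(line_now)


def fill_missing_line(above, below, other_field_line, motion, threshold=8.0):
    """Create a line that is absent from the current field.

    With motion, only the current field is used (average of the lines above
    and below). Without motion, the line from the adjacent field is used,
    which preserves vertical resolution.
    """
    if motion > threshold:
        return [(a + b) / 2 for a, b in zip(above, below)]
    return list(other_field_line)
```

With `motion` below the threshold the sketch draws on more than one field, and with `motion` above it the sketch restricts itself to a single field so that moving edges are not blurred.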
A description of the details of various television standards in use around the world, and of shortcomings in current methods of video tape editing can be found in Handbook of Recommended Standards & Procedures, International Teleproduction Society (1987). A discussion of spatial interpolation and temporal interpolation can be found in I.B.A. Technical Review Number 8, Digital Video Processing-DICE (1976). The methods discussed in I.B.A. Technical Review show how to convert from one standard to another, but the conversion methods discussed therein have proven to yield less than desirable results in terms of image sharpness.