In the cinema industry, images are recorded on film. Each image is a single photographic exposure so that all parts of the image correspond to roughly the same point in time. Typically, but not always, images are captured on film at an exposure rate of 24 frames per second.
In the television industry, images can be stored not only on film but also directly as an electronic video signal. It is not uncommon for television programmes to have some of the images recorded on film and some recorded as a video signal.
Images when represented as a video signal comprise a number of orthogonal scanning lines formed by scanning the image from left to right and top to bottom as the scene is viewed. Consequently, the lower right hand parts of any image are scanned at a later point in time than the upper left hand parts. Each of the scanning lines contains luminance (brightness) or chrominance (colour) information for the image.
For each image or frame, the video electronic signal scans twice: a single scan is called a field. Thus, each image or frame comprises two video fields and each video field contains half the lines of an image or frame. The lines of the first field overlap the lines of the second field such that the lines are vertically off-set by half a line height so that each line of the second field lies between the lines of the first field. This technique is known as interlacing.
Again, the images are captured at a particular frame rate. For historical reasons, different parts of the world have adopted different frame rates. The two most common are 25 frames per second (50 video fields per second) with 625 lines per image, which is most prevalent in the UK and parts of Europe, and substantially 30 frames per second (approximately 60 video fields per second) with 525 lines per image, which is most prevalent in North America. When fully encoded with colour information, the former is known as PAL (Phase Alternate Line) and the latter is known as NTSC (National Television System Committee). Hereinafter, a reference to the PAL standard is a reference to its frame rate and number of lines, and similarly for the NTSC standard.
In spite of the images being stored on different media and at different frame rates, it is highly desirable to convert those images from film to video and vice versa, and from one TV standard to another. Various techniques are used to accommodate the different frame rates, and the most common types of conversion are listed below.
1. Images recorded on film are stored at 24 frames per second. Such images, if replayed at a slightly higher rate, namely 25 frames per second, can readily be used for the PAL standard, which uses 25 frames per second or 50 fields per second. Each film frame is repeated to form two video fields. This is shown diagrammatically in FIG. 1. In this case there is a small problem with the accompanying audio because the frame rate is increased by approximately 4%. Furthermore, there is a fundamental difference between film and video in that the whole of a film frame relates to the same instant in time, whilst the two video fields making up a video frame represent different points in time. This means that the temporal sampling differs: 24 samples/sec (or possibly 25) for film against 50 samples/sec for video. This difference, while not necessarily unacceptable in itself, since both the cinema and television industries have been quite happy for many years, becomes noticeable when a programme is created from both film and video source material.
2. The conversion process is more complicated when the images are initially stored on film at a frame rate of 24 frames per second and it is desired to convert those images to the NTSC standard at 30 frames per second or 60 fields per second. This is because of the non-integral relationship of 60 to 24. In general, five video fields are made from two film frames by taking three consecutive fields from one frame and two consecutive fields from the next frame. This is usually known as 3:2 pulldown conversion or 3:2 ratio conversion. When three fields are taken from one frame there will always be a repeated field, but this repeat may be of either the first or the second field. Since five is an odd number, the full cycle is in fact ten fields, or four film frames, before the full phase is restored. This type of conversion is shown diagrammatically in FIG. 2.
3. The above problems are further compounded when the images, or some of them, are initially stored on film at a frame rate of 24 frames per second, then converted to the NTSC standard at 30 frames per second and then converted to the PAL standard at 25 frames per second. Alternatively, some of the images may be stored on film which is then converted to the PAL standard and then to the NTSC standard.
An example of the above three types of conversion can be found in U.S. Pat. No. 4,998,167 by Jaqua.
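The field sequences described in conversion types 1 and 2 can be illustrated with a minimal Python sketch. The function names, the (film frame, field parity) representation and the choice to start the cycle with a three-field frame are assumptions made for illustration only; they are not taken from the cited patent or from any particular telecine.

```python
from fractions import Fraction


def pulldown_2_2(num_film_frames):
    """Type 1: each film frame supplies two video fields (first, then second)."""
    return [(frame, parity)
            for frame in range(num_film_frames)
            for parity in (1, 2)]


def pulldown_3_2(num_film_frames):
    """Type 2: film frames alternately supply three fields and two fields.

    Each output field is (film_frame_index, parity), where parity 1 is a
    first field and parity 2 is a second field. Parity simply alternates on
    the output, so the repeated field is a first field in one three-field
    frame and a second field in the next three-field frame.
    """
    fields = []
    parity = 1
    for frame in range(num_film_frames):
        count = 3 if frame % 2 == 0 else 2  # assumed: cycle starts with 3 fields
        for _ in range(count):
            fields.append((frame, parity))
            parity = 2 if parity == 1 else 1
    return fields


# Fractional speed increase when 24 frames/sec film is replayed at 25 frames/sec.
speed_up = Fraction(25, 24) - 1

if __name__ == "__main__":
    for field in pulldown_3_2(4):
        print(field)
    print(float(speed_up))
```

Running the sketch on four film frames yields ten fields, after which both the three/two pattern and the field parity return to their starting state, in line with the ten-field cycle noted above.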
The conversion process between TV standards, that is to say converting images stored in NTSC at 30 frames per second to images stored in PAL at 25 frames per second and vice versa, is relatively straightforward. Here one must take into consideration the different frequencies, the different number of lines and the different formats of encoding the electronic signal. This conversion process is known in the art as Standards Conversion. Standards Conversion per se is not the subject of the present invention, but some understanding of it is required to appreciate the present invention.
An early Standards Converter was known as the ACE Standards Converter, which used a 4 field, 4 line aperture. It entered service in the early 1980s and is still in use today. Subsequent improvements over the intervening ten years have centred around size, power consumption, stability, reliability and decoder performance.
Television is a complex sampling process. That is to say, the image is sampled temporally at the PAL or NTSC standard so that each point or location in the image is regularly sampled at the relevant frequency. Each image is then sampled vertically by the scan lines progressing from the top to the bottom of the screen using the line structure. If the signal is to be processed digitally it will be further sampled horizontally on a pixel basis resulting in a three dimensionally sampled signal. Standards conversion is thus the process of transferring the signal from one or more of these sample rates to another.
Creating one sequence of regular samples from another is known as interpolating, and is a quite well understood form of digital filtering. For further reference information on interpolating, one can find relevant details in BBC Research Department Report No 1984/20 or UDC 621.397.65.
In essence, the value of each sample at the new sample points is calculated by summing weighted contributions from the nearest input samples. How many input samples need to be used, and the relative weightings to be applied to them, are decisions made by the designer, and govern the compromise between cost, complexity and performance. The overall family of weighting factors is known as the "aperture" of the filter since it represents the window of input samples which are used to create each output sample.
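As a minimal illustration of summing weighted contributions from the nearest input samples, the following Python sketch resamples a regularly spaced sequence using the simplest possible aperture, namely two samples with linear weights. The two-sample linear aperture and the function name are assumptions chosen for brevity; a practical converter would use a wider aperture with designed coefficients.

```python
import math


def resample_linear(samples, out_count):
    """Resample a regular sequence onto out_count regularly spaced points.

    Each output value is the weighted sum of the two nearest input samples,
    with weights (1 - frac) and frac, where frac is the fractional position
    of the output point between them.
    """
    if len(samples) < 2 or out_count < 2:
        raise ValueError("need at least two input and two output samples")
    step = (len(samples) - 1) / (out_count - 1)
    out = []
    for n in range(out_count):
        pos = n * step
        left = min(int(math.floor(pos)), len(samples) - 2)
        frac = pos - left
        out.append((1.0 - frac) * samples[left] + frac * samples[left + 1])
    return out


if __name__ == "__main__":
    print(resample_linear([0, 10, 20], 5))
```

Widening the aperture means letting more than two input samples contribute to each output, at the cost of more multiplications per output sample.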
Television standards conversion is not simply the application of a temporal aperture to convert the field frequency and a vertical aperture to convert the line frequency. Field interlace means that each field sample is displaced vertically from its predecessor and successor by half a line. Vertical and temporal resampling are therefore interrelated and must be performed together, which can be achieved by a two dimensional non-separable interpolator.
It is generally agreed that for high quality processing the aperture should have a minimum width of four field lines and four fields. This means that every output line is made up from weighted contributions from the four nearest lines on the four nearest fields, making 16 in all. The relevant weights, or filter coefficients, depend on the relative position of the output line with respect to the input lines and field timing. This is shown diagrammatically in FIG. 3.
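The 16-contribution structure of such an aperture can be sketched in Python as follows. The data layout (four fields, each holding the four nearest lines as lists of pixel values) and the function name are hypothetical, and the coefficients are supplied by the caller: real coefficients depend on the position of the output line relative to the input lines and field timing, and are not reproduced here.

```python
def interpolate_line(fields, coeffs):
    """Form one output line from a 4 field by 4 line aperture.

    fields: four input fields, each a list of the four nearest lines,
            each line a list of pixel values of equal length.
    coeffs: 4x4 weights, coeffs[f][l] applying to line l of field f.
            They are assumed to sum to 1 so that flat areas are preserved.
    """
    if len(fields) != 4 or any(len(field) != 4 for field in fields):
        raise ValueError("aperture must be 4 fields by 4 lines")
    width = len(fields[0][0])
    out = [0.0] * width
    for f in range(4):
        for l in range(4):
            weight = coeffs[f][l]
            line = fields[f][l]
            for x in range(width):
                out[x] += weight * line[x]
    return out


if __name__ == "__main__":
    # Placeholder data: a flat grey picture and equal weights of 1/16.
    flat = [[[8.0] * 3 for _ in range(4)] for _ in range(4)]
    equal = [[1.0 / 16.0] * 4 for _ in range(4)]
    print(interpolate_line(flat, equal))
```

Because the weights are a single 4x4 table rather than the product of a vertical filter and a temporal filter, the sketch also accommodates non-separable apertures.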
In all cases, however, whether converting from film to TV or between TV standards, the different methods of storing the images and the different frame rates mean that some distortions of the image are introduced when converting from one type to another. Some of these distortions are concerned with grey scale and colorimetric differences; ways of minimizing them are well known and do not form the subject of the present invention. However, there is one type of distortion which is particularly apparent when converting moving objects in images. A moving object is one which is in a different location within the image on successive frames or successive fields.
In the aforementioned U.S. patent by Jaqua, a motion detector is used to determine if there have been any editing cuts so as to disturb the field sequence. The motion detector does not however provide information on any moving objects so as to improve the resolution of the conversion.
In contrast, some allowance has been made for object movement when converting from one TV standard to another.
A two dimensional interpolator, for example, gives acceptable resolution with stationary images. There is some loss of vertical resolution, but this is inherent to an interpolating filter. Thus, as conversion of stationary images is a spatial conversion, relatively high resolution can be obtained. However, the same cannot be said for moving images, as the conversion is then not just spatial. Any motion in the scene will appear as multiple images on the output, since four input fields contribute to each output. The quality of this motion portrayal is a compromise between blurring and irregular motion known as judder (a form of aliassing), and is controlled by the selected aperture coefficients.
Aliassing is a type of distortion and juddering is the visual effect of aliassing. Aliassing is caused by a sampled signal containing frequencies above one half of the sampling frequency. This results in erroneous frequencies appearing in the signal which are indistinguishable from the same frequencies had they been in the original: hence the term aliassing. In a temporally sampled signal, such as television, the erroneous frequencies result in the irregular motion of objects, which, unless indistinct due to blurring, will appear to judder.
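The frequency at which an over-fast component reappears can be worked out by folding it about half the sampling frequency. The Python sketch below does this for a single sinusoidal component; the function name is an assumption, and treating motion as a single sinusoid is a simplification for illustration.

```python
def apparent_frequency(signal_hz, sample_rate_hz):
    """Frequency at which a sinusoid of signal_hz appears after sampling.

    Components at or below half the sampling frequency are unchanged;
    anything higher folds back into that range.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    folded = signal_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    # A 30 Hz component sampled at 50 fields per second.
    print(apparent_frequency(30, 50))
    # A genuine 20 Hz component sampled at the same rate.
    print(apparent_frequency(20, 50))
```

Both calls give the same result, which is the sense in which the erroneous frequency is indistinguishable from a genuine one.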
Hitherto, some standards converters have detected the presence of motion and modified the apertures accordingly, so that an aperture giving high vertical resolution is used for stationary images, whilst a different aperture, giving lower resolution, is used for moving images. This technique is known as motion adaptive interpolation. Full details of motion adaptive interpolation can be found in International Broadcast Engineer, March 1989, pp. 40-43 inclusive. Among the problems associated with adaptive interpolation in the standards converter is that the result is often an obvious change in resolution as soon as any movement occurs in the picture.
Recent developments in digital signal processing have enabled real time analysis of video signals so as to provide allowances for motion when converting from one standard to another. In essence, the conversion process utilizes an analysis of the incoming video to generate motion vectors describing movement within the scene and uses them to allow for the deficiencies of the conversion process.
A motion vector describes the motion of all or part of an image. It represents both the direction and scale of the motion.
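One possible way to represent such a vector is sketched below in Python. The class name, the choice of pixels and lines per field as units, and the sign conventions are all assumptions made for illustration; the source does not specify a representation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionVector:
    dx: float  # assumed unit: pixels per field, positive to the right
    dy: float  # assumed unit: lines per field, positive downwards

    @property
    def magnitude(self):
        """Scale of the motion."""
        return math.hypot(self.dx, self.dy)

    @property
    def direction_degrees(self):
        """Direction of the motion, measured from the positive x axis."""
        return math.degrees(math.atan2(self.dy, self.dx))

    def displace(self, x, y, fields=1.0):
        """Position of a point after the given number of field periods."""
        return (x + self.dx * fields, y + self.dy * fields)


if __name__ == "__main__":
    v = MotionVector(3.0, 4.0)
    print(v.magnitude, v.direction_degrees)
    print(v.displace(10.0, 20.0, fields=0.5))
```

The `displace` method indicates how a vector could be used to predict where part of the image lies at an instant between two input fields.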
Thus another way of allowing for movement in standards conversion is motion compensation, which uses these motion vectors. A basic diagram of such a motion compensator as applied to the input video signal can be found in FIG. 4. For further information regarding motion compensation, please refer to Shimano et al., "Movement Compensated TV Standards Converter using motion vectors", SMPTE Proceedings 1989. The resolution from such motion compensation is, however, highly unsatisfactory.