This invention relates to a method and system for combining the information from multiple video fields into a single, high quality still image.
Individual fields from video sources generally exhibit the following shortcomings:
sensor, tape and transmission noise;
luminance aliasing due to insufficiently dense spatial sampling of the optical scene;
chrominance aliasing due to insufficiently dense spatial sampling of particular color components in the optical scene (this often occurs with single-CCD video cameras, which can sense only one color component at each pixel position);
relatively poor resolution.
However, video sources have the advantage that many pictures of the same scene are available, usually with relatively small displacements of the scene elements between consecutive fields. After suitable compensation for motion, these multiple pictures can be combined to produce a still image with less noise. Perhaps more importantly, however, the existence of motion allows for effectively having a denser sampling of the optical scene than is available from any single field. This opens up the possibility for aliasing removal as well as resolution enhancement.
Although the discussion here focuses on analog video, many of the following observations also apply to a variety of digital video sources. One observation is that the resolution of the chrominance components is significantly lower than that of the luminance component. Specifically, the horizontal chrominance resolution of an NTSC (National Television System Committee) broadcast video source is about 1/7 that of the luminance. Also, although the NTSC standard does not limit the vertical resolution of the chrominance components below that of the luminance component, most popular video cameras inherently halve the vertical chrominance resolution because of their single-CCD design. Since the chrominance components carry very little spatial information in comparison to the luminance component, a process might focus resolution enhancement efforts on the luminance channel alone. Moreover, the computational demand of the multi-field enhancement system can be reduced by working with a coarser set of chrominance samples than that used for the luminance component.
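The text does not say how the coarser set of chrominance samples is formed. As a hypothetical illustration only, a 2x2 box average produces one chroma sample for every four input samples, so any later per-sample processing of that plane costs roughly a quarter as much. The function name `downsample_chroma` and the list-of-rows image layout are assumptions of this sketch.

```python
def downsample_chroma(plane):
    """Average non-overlapping 2x2 blocks of a chroma plane (a list of rows).

    Hypothetical helper: assumes even height and width. Each output sample
    replaces four input samples.
    """
    h, w = len(plane), len(plane[0])
    if h % 2 or w % 2:
        raise ValueError("plane dimensions must be even")
    return [
        [
            (plane[y][x] + plane[y][x + 1] + plane[y + 1][x] + plane[y + 1][x + 1]) / 4.0
            for x in range(0, w, 2)
        ]
        for y in range(0, h, 2)
    ]
```

For example, `downsample_chroma([[1, 3, 5, 7], [1, 3, 5, 7]])` returns `[[2.0, 6.0]]`.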
A second observation concerning analog video is that the luminance component is often heavily aliased in the vertical direction, but much less so in the horizontal direction. This is to be expected, since the optical bandwidth is roughly the same in the horizontal and vertical directions, but the vertical sampling density is less than half the horizontal sampling density. Moreover, newer video cameras employ CCD sensors with an increasing number of sensors per row, whereas the number of sensor rows is fixed by the NTSC standard. Experiments confirm the expectation that high horizontal frequencies experience negligible aliasing, whereas high vertical frequencies are subject to considerable aliasing. Hence, multi-field processing is unlikely to increase the horizontal resolution of the final still image; however, it should be possible to “unwrap” aliasing components to enhance the vertical resolution and to remove the annoying aliasing artifacts (“jaggies”) around non-vertical edges.
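Why additional fields make “unwrapping” possible can be seen with a small numerical check. This is an illustrative sketch, not part of the described system: it assumes frequencies measured in cycles per field line and, for simplicity, a second field displaced by exactly half a field line (as for the opposite-parity field of an interlaced frame). A vertical frequency of 0.7 produces exactly the same samples as 0.3 within one field, but the two candidates disagree at the second field's sample positions.

```python
import math


def aliased_frequency(f, fs=1.0):
    """Apparent frequency of a sinusoid of frequency f after sampling at rate fs.

    Anything above fs/2 folds back into the range [0, fs/2].
    """
    r = f % fs
    return min(r, fs - r)


def sample_cos(f, positions):
    """Sample cos(2*pi*f*y) at the given (possibly fractional) row positions."""
    return [math.cos(2 * math.pi * f * y) for y in positions]


# Field A samples rows 0, 1, 2, 3; field B is assumed offset by half a field line.
field_a = range(4)
field_b = [n + 0.5 for n in range(4)]

# Within field A alone, frequencies 0.7 and 0.3 are indistinguishable ...
same_in_a = all(
    math.isclose(p, q, abs_tol=1e-9)
    for p, q in zip(sample_cos(0.7, field_a), sample_cos(0.3, field_a))
)
# ... but field B's samples tell them apart, so the aliased component can be unwrapped.
differ_in_b = any(
    not math.isclose(p, q, abs_tol=1e-9)
    for p, q in zip(sample_cos(0.7, field_b), sample_cos(0.3, field_b))
)
```

Here `same_in_a` and `differ_in_b` are both `True`, and `aliased_frequency(0.7)` is 0.3 up to floating-point rounding.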
Hence, what is needed is a method and system for combining the information from multiple video fields into a single, high quality still image.
A system combines information from multiple video fields into a single high quality still image. One of the fields is selected to be the reference and the remaining fields are identified as auxiliary fields. The system reduces the noise, as well as the luminance and color aliasing artifacts associated with the reference field, while enhancing its resolution, by utilizing information from the auxiliary fields.
An orientation map is constructed for the reference field and is used to directionally interpolate this field up to four times the vertical field resolution.
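The exact form of the orientation map is not given here. The following minimal sketch assumes a hypothetical representation: `slopes[r][x]` is the horizontal displacement, in pixels, of a local edge per field row. Each new row at fractional position `t` between field rows `r` and `r + 1` is then interpolated along the line through the output pixel with that slope, rather than straight down the column. An H-row field yields 4(H-1)+1 rows; the last field row is kept but nothing is extrapolated below it.

```python
def sample_row(row, x):
    """Linearly interpolate a row at fractional column x, clamping at the borders."""
    x = min(max(x, 0.0), len(row) - 1.0)
    x0 = int(x)
    x1 = min(x0 + 1, len(row) - 1)
    f = x - x0
    return (1.0 - f) * row[x0] + f * row[x1]


def directional_upsample(field, slopes, factor=4):
    """Upsample `field` vertically by `factor`, interpolating along edge directions.

    slopes[r][x] (hypothetical orientation map): horizontal shift of an edge per
    field row, estimated between field rows r and r + 1 at column x.
    """
    rows, cols = len(field), len(field[0])
    out = []
    for r in range(rows - 1):
        out.append([float(v) for v in field[r]])
        for k in range(1, factor):
            t = k / factor  # fractional distance below field row r
            new_row = []
            for x in range(cols):
                s = slopes[r][x]
                # The oriented line through (r + t, x) meets row r at x - t*s
                # and row r + 1 at x + (1 - t)*s.
                above = sample_row(field[r], x - t * s)
                below = sample_row(field[r + 1], x + (1.0 - t) * s)
                new_row.append((1.0 - t) * above + t * below)
            out.append(new_row)
    out.append([float(v) for v in field[-1]])
    return out
```

With a diagonal edge that moves two pixels per field row, the half-way row keeps a sharp step: `directional_upsample([[0, 0, 0, 10, 10, 10, 10, 10], [0, 0, 0, 0, 0, 10, 10, 10]], [[2.0] * 8] * 2)[2]` is `[0, 0, 0, 0, 10, 10, 10, 10]`, whereas a plain vertical average of the two rows would give 5 at columns 3 and 4 and blur the edge.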
Motion maps are constructed to model the local displacement between features in the reference field and corresponding features in each of the auxiliary fields. Motion is computed to quarter-pixel accuracy in the vertical direction and half-pixel accuracy in the horizontal direction, using the directionally interpolated reference field to accomplish the sub-pixel search. The motion maps are used first to infer an orientation map for each auxiliary field directly from the reference field's orientation map (orientation maps could instead be computed for each field separately, if the computational demand were not considered excessive), and later to guide the incorporation of information from the auxiliary fields into the reference field.
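One way to see how an interpolated reference turns an integer search into a sub-pixel one is exhaustive block matching on the finer grid. This sketch assumes the reference has been upsampled 4x vertically and, as an additional assumption, 2x horizontally, so one step on that grid is a quarter field line vertically and half a pixel horizontally. The sum-of-absolute-differences criterion, the square block, and the function name `best_displacement` are illustrative choices, not details taken from the text.

```python
def best_displacement(ref_up, aux, top, left, size, search):
    """Exhaustive sub-pixel block matching of an auxiliary-field block.

    ref_up: reference upsampled 4x vertically and 2x horizontally (assumed),
            with ref_up[4*y][2*x] aligned to field pixel (y, x).
    aux:    auxiliary field at field resolution.
    The block aux[top:top+size][left:left+size] is compared, by sum of absolute
    differences, with ref_up at every offset within +/- `search` field pixels.
    Returns (dy, dx) in field pixels: aux (y, x) best matches reference (y+dy, x+dx).
    """
    best = None
    for oy in range(-4 * search, 4 * search + 1):      # quarter-line steps
        for ox in range(-2 * search, 2 * search + 1):  # half-pixel steps
            sad = 0.0
            inside = True
            for i in range(size):
                ry = 4 * (top + i) + oy
                if not 0 <= ry < len(ref_up):
                    inside = False
                    break
                for j in range(size):
                    rx = 2 * (left + j) + ox
                    if not 0 <= rx < len(ref_up[0]):
                        inside = False
                        break
                    sad += abs(ref_up[ry][rx] - aux[top + i][left + j])
                if not inside:
                    break
            if inside and (best is None or sad < best[0]):
                best = (sad, oy / 4, ox / 2)
    if best is None:
        raise ValueError("no candidate offset lies inside the reference")
    return best[1], best[2]
```

Evaluating this for a block around every pixel (or on a coarser block grid) would give a motion map in the sense described above.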
The auxiliary fields are then directionally interpolated to the same resolution as the interpolated reference field, using their inferred orientation maps.
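The inferred orientation maps used here could, for example, be obtained by looking up the reference field's orientation at the position each auxiliary pixel maps to. This is a minimal sketch under two stated assumptions: motion is locally translational, so an edge keeps its orientation and merely moves, and the nearest reference sample is good enough. The name `infer_orientation` and the `(dy, dx)` tuple layout of the motion map are hypothetical.

```python
import math


def infer_orientation(ref_orient, motion):
    """Infer an auxiliary field's orientation map from the reference field's map.

    ref_orient[y][x]: reference orientation value (any representation).
    motion[y][x]:     (dy, dx) such that auxiliary pixel (y, x) corresponds to
                      reference position (y + dy, x + dx).
    Assumes locally translational motion; uses the nearest reference sample,
    clamped at the borders.
    """
    h, w = len(ref_orient), len(ref_orient[0])
    out = []
    for y, row in enumerate(motion):
        out_row = []
        for x, (dy, dx) in enumerate(row):
            ry = min(max(math.floor(y + dy + 0.5), 0), h - 1)
            rx = min(max(math.floor(x + dx + 0.5), 0), w - 1)
            out_row.append(ref_orient[ry][rx])
        out.append(out_row)
    return out
```

A field displaced one line downward relative to the reference, `motion = [[(1.0, 0.0)] * 3] * 3`, simply picks up the orientation values one row further down, with the bottom row clamped.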
A merge mask is determined for each auxiliary field to mask off pixels that should not be used in the final enhanced still image. The masked-off pixels generally correspond to regions where the motion maps fail to model the relationship between the reference and auxiliary fields correctly; such regions might involve uncovered background, for example.
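The criterion for masking off pixels is not specified here. A simple hypothetical rule is to reject a motion-compensated auxiliary pixel when the mean absolute difference from the reference over a small neighbourhood exceeds a threshold, which catches regions, such as uncovered background, where the motion model clearly fails. The function name, the square window, and the `threshold` parameter are all assumptions of this sketch.

```python
def merge_mask(ref, aux_mc, threshold, radius=1):
    """Return a 0/1 mask: 1 where a motion-compensated auxiliary pixel may be merged.

    ref, aux_mc: reference field and auxiliary field already motion-compensated
                 onto the reference grid (lists of rows of equal size).
    A pixel is rejected (0) when the mean absolute difference over a
    (2*radius+1) x (2*radius+1) window, clipped at the borders, exceeds threshold.
    """
    h, w = len(ref), len(ref[0])
    mask = []
    for y in range(h):
        mask_row = []
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += abs(ref[yy][xx] - aux_mc[yy][xx])
                    count += 1
            mask_row.append(1 if total / count <= threshold else 0)
        mask.append(mask_row)
    return mask
```

Using a neighbourhood rather than single pixels makes the decision less sensitive to noise, at the cost of rejecting a small margin around each failed region.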
A weighted average is formed from the reference field pixels and the motion-compensated auxiliary field pixels which have not been masked off. The weights associated with this weighted averaging operation are spatially varying and depend upon both the merge masks and the displacements recorded in the motion maps. Unlike conventional field averaging techniques, this approach does not destroy available picture resolution in the process of removing aliasing artifacts.
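The exact dependence of the weights on the displacements is not stated in this summary. The sketch below therefore uses a hypothetical weight that decays with displacement magnitude, multiplied by the merge mask so that masked-off pixels contribute nothing. It also illustrates that the reference is returned unchanged wherever every auxiliary pixel is rejected, or when no auxiliary fields are supplied at all.

```python
def displacement_weight(dy, dx, scale=1.0):
    """Hypothetical weight: 1 for zero motion, decaying as the displacement grows."""
    return 1.0 / (1.0 + scale * (dy * dy + dx * dx) ** 0.5)


def merge_fields(ref, aux_list, masks, motions, ref_weight=1.0):
    """Per-pixel weighted average of the reference and motion-compensated aux fields.

    aux_list[k][y][x]: auxiliary field k, already motion-compensated onto ref's grid.
    masks[k][y][x]:    0/1 merge mask for field k.
    motions[k][y][x]:  (dy, dx) taken from field k's motion map.
    """
    h, w = len(ref), len(ref[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            num = ref_weight * ref[y][x]
            den = ref_weight
            for aux, mask, motion in zip(aux_list, masks, motions):
                wgt = mask[y][x] * displacement_weight(*motion[y][x])
                num += wgt * aux[y][x]
                den += wgt
            out_row.append(num / den)
        out.append(out_row)
    return out
```

For a reference row `[[10.0, 10.0]]`, one auxiliary row `[[20.0, 20.0]]` with zero motion and mask `[[1, 0]]`, the result is `[[15.0, 10.0]]`: the unmasked pixel is averaged and the masked one falls back to the reference.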
The final still image is obtained after horizontal interpolation by an additional factor of two (to obtain the correct aspect ratio after the fourfold vertical interpolation described above) and an optional post-processing operation which sharpens the image formed from the weighted averaging process described above. The above processing steps are modified somewhat for the chrominance components to reflect the fact that these components have much less spatial frequency content than the luminance component.
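The two final steps can be sketched as follows. This assumes plain linear interpolation for the horizontal factor of two and a standard unsharp mask (3x3 box blur, `amount` chosen arbitrarily) for the optional sharpening. Neither choice is confirmed by the text; both function names are hypothetical.

```python
def upsample_horizontal_2x(image):
    """Double the width by linear interpolation; the last column is replicated."""
    out = []
    for row in image:
        new_row = []
        for x, v in enumerate(row):
            nxt = row[x + 1] if x + 1 < len(row) else v
            new_row.extend([float(v), (v + nxt) / 2.0])
        out.append(new_row)
    return out


def unsharp_mask(image, amount=0.5):
    """Sharpen by adding back amount * (image - 3x3 box blur), borders clamped."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            total = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            blur = total / 9.0
            out_row.append(image[y][x] + amount * (image[y][x] - blur))
        out.append(out_row)
    return out
```

`upsample_horizontal_2x([[0, 10, 20]])` gives `[[0.0, 5.0, 10.0, 15.0, 20.0, 20.0]]`. A flat region passes through `unsharp_mask` unchanged, while a step edge gains a slight undershoot and overshoot, which is what makes it look sharper.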
An important property of this image enhancement system is that it can work with any number of video fields. If only one field is supplied, the system employs the sophisticated directional interpolation technique mentioned above. If additional fields are available, they are directionally interpolated and merged into the interpolated reference field so as to progressively enhance the spatial frequency content while reducing noise and other artifacts. In the special case where two fields are available, the system may also be understood as a “de-interlacing” tool.
Other advantages of this invention will become apparent from the following description taken in conjunction with the accompanying drawings which set forth, by way of illustration and example, certain embodiments of this invention. The drawings constitute a part of this specification and include exemplary embodiments, objects and features of the present invention.