Video spatial up-conversion (V-SUC), also known as video resolution enhancement, is used to enhance the spatial resolution of an arbitrary video sequence through both horizontal and vertical spatial interpolation. Video spatial up-conversion is one aspect of video format conversion (VFC), in which video signals are converted from one format to another. Two other typical aspects of VFC are video deinterlacing, also known as video scan rate up-conversion, and video picture rate up-conversion. Deinterlacing enhances the spatial resolution of a video signal through interpolation in the vertical direction. Video picture rate up-conversion enhances the picture rate (also known as frame rate) of a video signal through temporal interpolation.
Video spatial up-conversion is required, for example, for TV-out of videos captured by mobile phones. Typical spatial resolutions of standard-definition TV (e.g., NTSC and PAL) are 640×480 or 800×576. In contrast, videos captured by conventional mobile telephones typically have a spatial resolution of SIF (320×240), CIF (352×288), or QCIF (176×144). Therefore, the spatial resolution must be enhanced before such videos are displayed on a regular TV set. Another example of video spatial up-conversion is the display of standard-definition TV (SDTV) signals on a high-definition TV (HDTV) device.
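The resolutions listed above imply up-conversion factors that are often neither integers nor equal in the two directions. The following minimal sketch simply computes the exact horizontal and vertical factors for each source/target pair; the dictionary names and the helper `scale_factors` are hypothetical and used only for illustration.

```python
from fractions import Fraction

# Source and target sizes (width, height) as listed in the text.
SOURCES = {"SIF": (320, 240), "CIF": (352, 288), "QCIF": (176, 144)}
TARGETS = {"640x480": (640, 480), "800x576": (800, 576)}


def scale_factors(src, dst):
    """Return exact (horizontal, vertical) up-conversion factors as fractions."""
    return Fraction(dst[0], src[0]), Fraction(dst[1], src[1])


for s_name, s in SOURCES.items():
    for t_name, t in TARGETS.items():
        fx, fy = scale_factors(s, t)
        print(f"{s_name} -> {t_name}: horizontal {fx} ({float(fx):.3f}), "
              f"vertical {fy} ({float(fy):.3f})")
```

For instance, SIF to 640×480 is exactly 2× in both directions, whereas QCIF to 640×480 needs 40/11 horizontally and 10/3 vertically, which is why interpolators supporting arbitrary (and anisotropic) factors are useful in practice.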
Video spatial up-conversion mainly needs to fulfill two tasks in the process of spatial resolution enhancement: anti-aliasing, and generation of high spatial frequencies to overcome over-smoothing artifacts.
A digital video signal is obtained through three-dimensional (3D) sampling of the original continuous video signal. For example, let Δx, Δy, and T denote the sampling distances in the horizontal, vertical, and temporal directions, respectively, which together specify a 3D sampling grid. The Fourier spectrum of the digital video signal is then the ensemble of multiple replications of the Fourier spectrum of the continuous video signal, placed along the 3D grid specified by the sampling frequencies fsx, fsy, and fst, where fsx=1/Δx, fsy=1/Δy, and fst=1/T. The replication centered at the coordinates (0,0,0) is referred to as the baseband spectrum. If the original continuous signal is band-limited and its maximum frequencies in the respective directions, denoted fmaxx, fmaxy, and fmaxt, satisfy the constraints fmaxx≤fsx/2=1/(2Δx), fmaxy≤fsy/2=1/(2Δy), and fmaxt≤fst/2=1/(2T), then the continuous signal can be completely recovered from its 3D samples. Ideal interpolation filtering then corresponds to passing the baseband spectrum unchanged (all-pass) while zeroing out all other replications. If the above constraints are violated, adjacent spectral replications overlap with each other, resulting in aliasing.
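The sampling constraints above can be checked mechanically, and the folding of a too-high frequency into the baseband can be computed directly. This is a minimal sketch; the function names are hypothetical, and the example grid (1-pixel spacing, 30 frames per second) is an assumption, not something the text specifies.

```python
def sampling_frequencies(dx, dy, T):
    """fsx = 1/dx, fsy = 1/dy, fst = 1/T for a 3D sampling grid."""
    return 1.0 / dx, 1.0 / dy, 1.0 / T


def is_recoverable(fmax, fs):
    """True if fmax <= fs/2 holds in every direction (x, y, t)."""
    return all(fm <= f / 2 for fm, f in zip(fmax, fs))


def alias_frequency(f, fs):
    """Frequency at which a component f appears after sampling at rate fs,
    folded into the baseband interval [-fs/2, fs/2)."""
    return (f + fs / 2) % fs - fs / 2


# Assumed grid: 1-pixel spacing in x and y, 30 frames per second.
fs = sampling_frequencies(1.0, 1.0, 1 / 30)
print(is_recoverable((0.4, 0.4, 12.0), fs))  # all constraints met
print(is_recoverable((0.4, 0.6, 12.0), fs))  # vertical constraint violated
print(alias_frequency(0.7, 1.0))             # 0.7 cycles/line folds to about -0.3
```

A vertical component at 0.7 cycles/line is indistinguishable, after sampling, from one at -0.3 cycles/line; this overlap of adjacent replications is the aliasing described above.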
Ideally, when a continuous video signal is sampled, anti-aliasing filtering is applied first so that all frequencies above half of the respective sampling frequency are removed, avoiding aliasing. However, this is not the case for progressively scanned videos captured by cameras. Sampling in both the vertical and temporal directions is part of the scanning format integrated into the camera, so the desired anti-aliasing filtering would have to be performed in the optical path of the camera, which is extremely difficult and expensive to realize. Therefore, aliasing is usually present in the fy-ft frequency space, as shown in FIG. 1. In the fy-ft frequency space, the extent of the spectrum support is determined by the vertical details of the scene, while the orientation of the spectrum is determined by the vertical motion.
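The dependence of spectral orientation on vertical motion can be verified numerically on a y-t slice. For a vertical sinusoid of frequency fy0 moving v lines per frame, the signal is cos(2π·fy0·(y − v·t)), so its temporal frequency is ft = −v·fy and the spectrum lies on the line ft = −v·fy. When |v·fy| exceeds fst/2 (0.5 cycles/frame here), the peak is folded by one full fst, which is the temporal aliasing discussed above. This minimal sketch uses a hypothetical helper `dominant_frequency` and an assumed 64×64 slice.

```python
import numpy as np


def dominant_frequency(v, fy0=4 / 64, n=64):
    """Return (fy, ft) of the strongest spectral peak of a y-t slice containing
    a vertical sinusoid of frequency fy0 (cycles/line) moving v lines/frame."""
    y = np.arange(n)[:, None]  # vertical position, in lines
    t = np.arange(n)[None, :]  # time, in frames
    slice_yt = np.cos(2 * np.pi * fy0 * (y - v * t))
    spectrum = np.abs(np.fft.fft2(slice_yt))
    iy, it = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    freqs = np.fft.fftfreq(n)  # cycles per sample, in [-0.5, 0.5)
    return freqs[iy], freqs[it]


for v in (0, 2, 10):
    fy, ft = dominant_frequency(v)
    print(f"v={v:2d}: fy={fy:+.4f}, ft={ft:+.4f}, ft + v*fy = {ft + v * fy:+.4f}")
```

For v = 0 and v = 2 the peak satisfies ft + v·fy = 0 (the spectrum is tilted by the motion); for v = 10 the true temporal frequency of −0.625 cycles/frame appears at ±0.375, i.e. ft + v·fy = ±1.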
When a digital video signal is upsampled, an ideal interpolation filter should pass the alias-free part of the baseband spectrum unchanged while suppressing the aliased portion as much as possible. As shown in FIG. 1(b), if vertical motion is present, an ideal low-pass filter for interpolation should be motion-compensated in order to extract the baseband spectrum effectively without aliasing.
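A toy one-dimensional case illustrates why motion-compensated filtering can recover vertical detail that intra-frame interpolation loses. It assumes a single image column whose content moves down by exactly half a line per frame, with the motion known in advance, and a scene containing detail at 0.6 cycles/line, above the per-frame Nyquist limit of 0.5 cycles/line (hence aliased in every individual frame). Under that assumption, line k of the previous frame lands exactly on the missing line between lines k and k+1 of the current frame. The functions `scene`, `sample_frame`, and `mc_vertical_2x` are hypothetical illustrations, not the method of this document; real video requires estimated, generally non-ideal motion.

```python
import numpy as np


def scene(y):
    """Hypothetical continuous vertical profile; the 0.6 cycles/line term is
    above the per-frame Nyquist limit and therefore aliased in each frame."""
    return np.cos(2 * np.pi * 0.6 * y) + 0.5 * np.sin(2 * np.pi * 0.2 * y)


def sample_frame(t, v, n):
    """Lines 0..n-1 of frame t when the scene moves down by v lines per frame."""
    return scene(np.arange(n) - v * t)


def mc_vertical_2x(prev_frame, cur_frame):
    """2x vertical up-conversion of cur_frame, assuming a known motion of
    exactly +0.5 line/frame: line k of the previous frame lands at position
    k + 0.5 of the current frame, i.e. on the missing output line 2k + 1."""
    out = np.empty(2 * len(cur_frame))
    out[0::2] = cur_frame
    out[1::2] = prev_frame
    return out


n, v = 32, 0.5
prev_frame, cur_frame = sample_frame(0, v, n), sample_frame(1, v, n)
out_pos = np.arange(2 * n) / 2                         # output lines at half-line spacing
truth = scene(out_pos - v * 1)                         # ideal up-converted frame 1
mc = mc_vertical_2x(prev_frame, cur_frame)
linear = np.interp(out_pos, np.arange(n), cur_frame)  # intra-frame baseline
print("max error, motion-compensated:", np.max(np.abs(mc - truth)))
print("max error, intra-frame linear:", np.max(np.abs(linear - truth)))
```

The motion-compensated result matches the ideal frame, while intra-frame linear interpolation cannot separate the aliased component from the baseband.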
In contrast, horizontal sampling is performed after the image acquisition process, so anti-aliasing filtering can be applied in the horizontal direction before sampling. This implies that, for video spatial up-conversion, interpolation in the horizontal direction and in the vertical direction should be treated separately. Because the high-frequency component is either filtered out during sampling or suppressed along with the aliasing during upsampling, the video signal after spatial up-conversion lacks high-frequency content, resulting in blurring or over-smoothing artifacts. Many spatial filters have therefore been designed to boost the high-frequency component during spatial interpolation.
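One generic way to boost high frequencies is unsharp masking, sketched below in one dimension. It is named here only as a familiar example of the idea; the text does not specify which boosting filter is used. The 3-tap kernel equals identity + α·(identity − blur) with blur = [1, 2, 1]/4, giving a gain of 1 at DC and 1 + α at Nyquist. The function names and α = 0.5 are assumptions.

```python
import numpy as np


def unsharp_kernel(alpha):
    """3-tap unsharp-mask kernel: identity + alpha * (identity - [1, 2, 1]/4)."""
    identity = np.array([0.0, 1.0, 0.0])
    blur = np.array([0.25, 0.5, 0.25])
    return identity + alpha * (identity - blur)


def gain(kernel, f):
    """Magnitude response of a centered odd-length kernel at f cycles/sample."""
    n = np.arange(len(kernel)) - len(kernel) // 2
    return abs(np.sum(kernel * np.exp(-2j * np.pi * f * n)))


def boost(signal, alpha=0.5):
    """Apply the high-frequency boost, replicating edge samples at the borders."""
    padded = np.pad(signal, 1, mode="edge")
    return np.convolve(padded, unsharp_kernel(alpha), mode="valid")


k = unsharp_kernel(0.5)
print("gain at DC:", gain(k, 0.0), "gain at Nyquist:", gain(k, 0.5))
soft_edge = np.array([0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0])  # e.g. a linearly interpolated edge
print(boost(soft_edge))
```

The boosted edge is steeper but overshoots slightly below 0 and above 1; larger α sharpens more at the cost of stronger ringing and noise amplification.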
Conventional techniques for video spatial up-conversion have primarily relied on spatial interpolation performed on a frame-by-frame basis. Spatial interpolation techniques for 2D still images have thus been applied directly to video signals, completely ignoring the correlation across different frames of a digital video.
Spatial interpolation using finite impulse response (FIR) filtering is the most commonly used technique, in which image-independent FIR filters are applied in both the horizontal and vertical directions of a still image. Various interpolation FIR filters have been designed, typical examples being the bilinear, bicubic, bicubic spline, Gaussian, and Lanczos filters. These FIR filters differ from each other mainly in their passband and stopband frequencies and in the length of their kernels. Their design mainly aims to all-pass the alias-free baseband spectrum, suppress the aliased spectral component, and boost high frequencies to preserve image details such as edges. As mentioned above, proper anti-aliasing is usually applied prior to horizontal sampling but not prior to vertical sampling; it is therefore suggested that different filters be used for horizontal and vertical interpolation.
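Separable FIR interpolation with a different kernel per direction can be sketched as follows. The choice of a Lanczos-3 kernel horizontally and a Keys bicubic kernel (a = −0.5) vertically is purely an illustrative assumption, as are the helper names, the pixel-center coordinate mapping, and border replication; the text only suggests that the two directions use different filters.

```python
import numpy as np


def lanczos(x, a=3):
    """Lanczos-a kernel: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < a, np.sinc(x) * np.sinc(x / a), 0.0)


def keys_cubic(x, a=-0.5):
    """Keys cubic convolution kernel with parameter a (support |x| < 2)."""
    x = np.abs(np.asarray(x, dtype=float))
    out = np.zeros_like(x)
    m1 = x <= 1
    m2 = (x > 1) & (x < 2)
    out[m1] = (a + 2) * x[m1] ** 3 - (a + 3) * x[m1] ** 2 + 1
    out[m2] = a * x[m2] ** 3 - 5 * a * x[m2] ** 2 + 8 * a * x[m2] - 4 * a
    return out


def resample_1d(signal, factor, kernel, support):
    """Resample a 1D signal by `factor` with a kernel of half-width `support`."""
    n = len(signal)
    m = int(round(n * factor))
    pos = (np.arange(m) + 0.5) / factor - 0.5  # output sample -> input coordinate
    base = np.floor(pos).astype(int)
    out = np.zeros(m)
    wsum = np.zeros(m)
    for off in range(-support + 1, support + 1):
        idx = base + off
        w = kernel(pos - idx)
        out += w * signal[np.clip(idx, 0, n - 1)]  # replicate border pixels
        wsum += w
    return out / wsum  # normalize so flat regions stay flat


def upconvert(frame, fx, fy):
    """Separable up-conversion: Lanczos-3 along rows, Keys cubic along columns."""
    rows = np.apply_along_axis(resample_1d, 1, frame, fx, lanczos, 3)
    return np.apply_along_axis(resample_1d, 0, rows, fy, keys_cubic, 2)


qcif = np.random.default_rng(0).random((144, 176))
print(upconvert(qcif, 2, 2).shape)  # QCIF -> CIF-sized frame
```

Because the kernel is passed as a parameter, swapping filters per direction (or per application) requires no change to the resampling loop.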
Image content-dependent filters have also been developed for image spatial interpolation. One such filter is the Wiener filter, a linear filter designed to minimize the mean square error (MSE). The coefficients of such filters are derived from the local image content, so the filter adapts to local image characteristics. Other image spatial interpolation techniques are also conventionally known. These include New Edge-Directed Interpolation (NEDI), which exploits the geometric duality of image content across different resolutions, and Adaptive Quadratic (AQua) image interpolation, which is based on optimal recovery theory and permits images to be interpolated by arbitrary factors. It has been shown that longer FIR filter kernels or image-dependent filters are often preferred.
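The minimum-MSE criterion behind Wiener-style interpolation can be illustrated with a least-squares fit of 4-tap coefficients that predict each missing (odd) high-resolution sample from its four nearest low-resolution neighbors. For illustration only, the sketch trains on a known high-resolution signal; a real content-adaptive interpolator does not have that signal and must estimate the required statistics from the low-resolution data itself, typically per local window. The signal, noise level, and helper names are assumptions.

```python
import numpy as np


def windows(lr, taps):
    """Rows of `taps` low-res samples centered on each midpoint k + 0.5."""
    half = taps // 2
    ks = np.arange(half - 1, len(lr) - half)
    return ks, np.array([lr[k - half + 1 : k + half + 1] for k in ks])


def fit_wiener_interp(hr, taps=4):
    """Least-squares (minimum-MSE) coefficients predicting each odd high-res
    sample from the `taps` nearest low-res samples, where lr = hr[0::2]."""
    lr = hr[0::2]
    ks, A = windows(lr, taps)
    b = hr[2 * ks + 1]
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs


rng = np.random.default_rng(0)
n = np.arange(256)
hr = (np.cos(2 * np.pi * 0.18 * n) + 0.5 * np.cos(2 * np.pi * 0.31 * n + 0.4)
      + 0.05 * rng.standard_normal(256))
coeffs = fit_wiener_interp(hr)
ks, A = windows(hr[0::2], 4)
target = hr[2 * ks + 1]
mse_wiener = np.mean((A @ coeffs - target) ** 2)
mse_linear = np.mean((A @ np.array([0.0, 0.5, 0.5, 0.0]) - target) ** 2)
print("coefficients:", np.round(coeffs, 3))
print("MSE least-squares:", mse_wiener, "MSE linear:", mse_linear)
```

Since linear interpolation ([0, 0.5, 0.5, 0]) is one admissible coefficient set, the least-squares fit can never do worse on the data it was fitted to; solving such a system per local window is what makes the filter content-adaptive, at the cost of one small solve per region.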
Nevertheless, techniques that use frame-by-frame spatial interpolation for video spatial up-conversion have largely ignored the correlation along the motion trajectory in the temporal direction. NEDI has been extended to video spatial up-conversion by taking motion compensation into account; however, this use of motion compensation is confined to that specific interpolation framework. Motion compensation has also been considered for “superresolution,” a more recently emerged application that also aims to enhance the spatial resolution of an arbitrary video signal. However, superresolution differs considerably from video spatial up-conversion in that superresolution aims to generate one image, or a limited set of images, with enhanced spatial resolution from a given video sequence. In contrast, video spatial up-conversion aims to enhance the spatial resolution of every picture in the video sequence. An effective video spatial up-conversion technique may use only a limited number of adjacent frames to enhance the resolution of the current frame, and its computational complexity must be kept reasonably low. Therefore, motion-compensated video spatial up-conversion has not been examined extensively.
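Any motion-compensated approach needs motion vectors between the current frame and an adjacent frame. Full-search block matching with a sum-of-absolute-differences (SAD) criterion is one common, low-complexity way to obtain them; the text does not prescribe it, and the function name, block size, and search range below are assumptions. This sketch finds integer-pel vectors only, whereas motion-compensated up-conversion generally needs sub-pixel accuracy.

```python
import numpy as np


def block_match(prev, cur, block=8, search=4):
    """Full-search SAD block matching. For each block of `cur`, return the
    displacement (dy, dx) of its best match in `prev`; the content motion
    from `prev` to `cur` is the negative of this displacement."""
    h, w = cur.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            target = cur[y0:y0 + block, x0:x0 + block]
            best, best_sad = (0, 0), np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y1, x1 = y0 + dy, x0 + dx
                    if y1 < 0 or x1 < 0 or y1 + block > h or x1 + block > w:
                        continue  # candidate block falls outside the frame
                    sad = np.abs(target - prev[y1:y1 + block, x1:x1 + block]).sum()
                    if sad < best_sad:
                        best_sad, best = sad, (dy, dx)
            vectors[by, bx] = best
    return vectors


rng = np.random.default_rng(0)
prev = rng.random((32, 32))
cur = np.roll(prev, shift=(2, -1), axis=(0, 1))  # content moves 2 down, 1 left
print(block_match(prev, cur)[1:, :3])  # interior blocks: (dy, dx) = (-2, 1)
```

Restricting the search to a small window around each block keeps the cost proportional to the number of candidate displacements, which matters when every frame of the sequence must be processed.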