We have previously developed an example-based super resolution method and apparatus for video data pruning (hereinafter referred to as the “example-based super resolution method”). Example-based super resolution is a super resolution technique that converts a low-resolution picture into a high-resolution picture by finding high-resolution patches in a patch library, using the low-resolution patches of the input picture as query keywords, and replacing the low-resolution patches in the input picture with the retrieved high-resolution patches.
In further detail, in this example-based super resolution method, high-resolution video frames at the encoder side are divided into image patches or blocks (for example, in one implementation of the example-based super resolution method, we use 16×16 pixel blocks). The image patches are then grouped into a number of clusters. The representative patches of the clusters are sent to the decoder side along with downsized frames. At the decoder side, the representative patches are extracted, and the patches in the low-resolution videos are replaced by the high-resolution representative patches to create a recovered high-resolution video.
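By way of illustration only (and not as part of the described apparatus), the patch extraction and clustering described above may be sketched as follows, assuming 16×16 luma blocks and a basic k-means clusterer. The function names (`extract_patches`, `cluster_patches`) and the choice of k-means are assumptions for exposition, not the described implementation.

```python
import numpy as np

PATCH = 16  # 16x16 pixel blocks, as in one implementation described above

def extract_patches(frame):
    """Split a frame (H, W) into non-overlapping PATCH x PATCH blocks,
    each flattened to a vector."""
    h, w = frame.shape
    patches = []
    for y in range(0, h - PATCH + 1, PATCH):
        for x in range(0, w - PATCH + 1, PATCH):
            patches.append(frame[y:y + PATCH, x:x + PATCH].ravel())
    return np.array(patches, dtype=np.float64)

def cluster_patches(patches, k, iters=20, seed=0):
    """Basic k-means; returns representative (centroid) patches and the
    cluster label of each input patch."""
    rng = np.random.default_rng(seed)
    centroids = patches[rng.choice(len(patches), size=k, replace=False)]
    for _ in range(iters):
        # Assign each patch to its nearest representative patch.
        d = ((patches[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute each representative as the mean of its cluster.
        for c in range(k):
            members = patches[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids, labels

# Toy usage: a 64x64 frame yields 16 patches of 256 samples each.
frame = np.tile(np.arange(64, dtype=np.float64), (64, 1))
patches = extract_patches(frame)
reps, labels = cluster_patches(patches, k=4)
```

The representative patches (`reps`) would then stand in for their clusters and be transmitted, while `labels` records which cluster each original patch belongs to.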
Turning to FIG. 1, a high level block diagram of an example-based super resolution system/method is indicated generally by the reference numeral 100. High resolution (HR) frames are input and subjected to encoder side pre-processing at step 110 (by an encoder side pre-processor 151) in order to obtain down-sized frames and patch frames. The down-sized frames and patch frames are encoded at step 115 (by an encoder 152). The encoded down-sized frames and patch frames are decoded at step 120 (by a decoder 153). The decoded down-sized frames and patch frames are subjected to super resolution post-processing at step 125 (by a super resolution post-processor 154) in order to provide high resolution output frames.
Turning to FIG. 2, a high level block diagram of the encoder side pre-processing corresponding to the example-based super resolution system/method of FIG. 1 is indicated generally by the reference numeral 200. Input video is subjected to patch extraction and clustering at step 210 (by a patch extractor and clusterer 251) to obtain clustered patches. The input video is also subjected to downsizing at step 215 (by a downsizer 252) to output downsized frames therefrom. The clustered patches are packed into patch frames at step 220 (by a patch packer 253) to output the (packed) patch frames therefrom.
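For exposition only, the patch packing step of FIG. 2 may be sketched as follows, assuming the representative patches are tiled row-major into square frames of 4×4 patch slots so that they can be encoded like ordinary video. The function name `pack_patch_frames` and the particular layout are hypothetical assumptions, not the described implementation.

```python
import numpy as np

def pack_patch_frames(reps, patch=16, cols=4):
    """Tile representative patches row-major into one or more frames of
    cols x cols patch slots; any unused slots are zero-padded."""
    per_frame = cols * cols
    frames = []
    for start in range(0, len(reps), per_frame):
        frame = np.zeros((cols * patch, cols * patch), dtype=reps.dtype)
        for i, p in enumerate(reps[start:start + per_frame]):
            r, c = divmod(i, cols)
            frame[r * patch:(r + 1) * patch,
                  c * patch:(c + 1) * patch] = p.reshape(patch, patch)
        frames.append(frame)
    return frames

# Toy usage: 20 representative patches fill one 64x64 frame (16 slots)
# plus a second, partially filled frame.
reps = np.arange(20 * 256, dtype=np.float64).reshape(20, 256)
patch_frames = pack_patch_frames(reps)
```

Because each slot position is fixed, the decoder can recover patch i of frame f by reading the corresponding block back out of the decoded patch frame.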
Turning to FIG. 3, a high level block diagram of the decoder side post-processing corresponding to the example-based super resolution system/method of FIG. 1 is indicated generally by the reference numeral 300. Decoded patch frames are subjected to patch extraction and processing at step 310 (by a patch extractor and processor 351) to obtain processed patches. The processed patches are stored at step 315 (in a patch library 352). Decoded down-sized frames are subjected to upsizing at step 320 (by an upsizer 353) to obtain upsized frames. The upsized frames are subjected to patch searching and replacement at step 325 (by a patch searcher and replacer 354) to obtain replacement patches. The replacement patches are subjected to post-processing at step 330 (by a post-processor 355) to obtain high resolution frames.
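The patch searching and replacement step of FIG. 3 may be sketched, for illustration only, as a brute-force nearest-neighbour search against the patch library: each block of the upsized frame is replaced by the library patch closest to it in mean-squared error. The helper name `replace_patches` is a hypothetical assumption; a practical system would use a faster search structure.

```python
import numpy as np

def replace_patches(upsized, library, patch=16):
    """For each non-overlapping block of the upsized frame, find the
    closest library patch (squared-error distance) and substitute it."""
    out = upsized.copy()
    h, w = upsized.shape
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            block = upsized[y:y + patch, x:x + patch].ravel()
            d = ((library - block) ** 2).sum(axis=1)
            best = library[d.argmin()]
            out[y:y + patch, x:x + patch] = best.reshape(patch, patch)
    return out

# Toy usage: a library of two patches (all 0s, all 10s); a blurry upsized
# frame whose left half is near 0 and right half near 10 snaps to them.
library = np.vstack([np.zeros(256), np.full(256, 10.0)])
upsized = np.hstack([np.full((16, 16), 0.4), np.full((16, 16), 9.6)])
recovered = replace_patches(upsized, library)
```

This snapping to the nearest library entry is exactly where the vector-quantization behaviour discussed below arises.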
The key components of the example-based super resolution system/method relating to FIGS. 1-3 are patch clustering and patch replacement. The process has some commonalities with vector quantization based compression. When the system is applied to videos with static scenes, the videos can be very well recovered. However, if the input videos have motion, then jittering artifacts can be observed in the recovered videos. The artifacts are caused by the patch clustering and patch replacement processes. Turning to FIG. 4, quantization error caused by motion is indicated generally by the reference numeral 400. The quantization error involves an object (in motion) captured in six frames (designated as Frame 1 through Frame 6). The object (in motion) is indicated by the curved line in FIG. 4. The quantization error 400 is shown with respect to an upper portion, a middle portion, and a lower portion of FIG. 4. At the upper portion, co-located input patches 410 from consecutive frames of an input video sequence are shown. At the middle portion, representative patches 420 corresponding to clusters are shown. In particular, the middle portion shows a representative patch 421 of cluster 1 and a representative patch 422 of cluster 2. At the lower portion, patches 430 in the recovered video sequence are shown. The object motion in a video sequence results in a sequence of patches with shifted object edges. Since the patches in a sequence of consecutive frames look very similar, they are grouped into one cluster (or some other low number of clusters) and represented as a single representative patch (or some other low number of representative patches). We use the term “low” here since the number of clusters should clearly be less than the number of consecutive frames in the video sequence to be processed. During the recovery process, the corresponding low-resolution patches are replaced with the representative patches associated with the cluster.
Since the patches with different spatial shifts are replaced with the same patch, the edges of the objects in the recovered video jump across frames, resulting in jittering artifacts.
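The jittering described above can be illustrated numerically (this example is assumed for exposition and is not part of the described system): a step edge that shifts one pixel per frame across six co-located patches is represented by a single cluster centroid, so the recovered edge position is the same in every frame and the motion is lost.

```python
import numpy as np

def edge_patch(pos, size=16):
    """A vertical step edge at column `pos` (0 left of the edge, 255 at
    and right of it), repeated over all rows of a size x size patch."""
    row = np.where(np.arange(size) < pos, 0.0, 255.0)
    return np.tile(row, (size, 1))

# Co-located patches from six consecutive frames: the edge shifts one
# pixel per frame, from column 6 to column 11.
frames = [edge_patch(pos) for pos in range(6, 12)]

# Clustering groups these very similar patches into one cluster and
# represents them by a single centroid; every recovered frame then
# receives that same representative patch.
centroid = np.mean(frames, axis=0)
recovered = [centroid] * len(frames)

# The true edge moves frame to frame; the recovered edge does not.
# This per-frame position error is the jittering artifact.
true_positions = [int((f[0] > 128).argmax()) for f in frames]
rec_positions = [int((r[0] > 128).argmax()) for r in recovered]
```

Here `true_positions` advances one column per frame while `rec_positions` is constant, so in the recovered video the edge appears frozen, then jumps when the sequence crosses into a different cluster.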
We note that, in addition to our aforementioned example-based super resolution method and apparatus, other example-based super resolution approaches also exist. Regarding these other approaches, we note that the artifact problem of the patch-replacement process has not been addressed. One reason could be that the example-based super resolution algorithm in accordance with a first prior art approach was developed for images rather than videos. Furthermore, since the system corresponding to the aforementioned first prior art approach, as well as similar systems, was developed for super resolution rather than compression, such systems do not have the clustering component; therefore, the artifact problem of those systems may not be as serious as that of the example-based super resolution method for video data pruning described above with respect to FIGS. 1-3.
In sum, example-based super resolution for data pruning sends high-resolution (also referred to herein as “high-res”) example patches and low-resolution (also referred to herein as “low-res”) frames to the decoder. The decoder recovers the high-resolution frames by replacing the low-resolution patches with the example high-resolution patches (see FIG. 3). However, for videos with motion, the patch replacement process often results in jittering artifacts due to vector quantization (VQ) errors.