In “Space-Time Super-Resolution from a Single Video” by O. Shahar, A. Faktor and M. Irani (IEEE Conf. on Computer Vision and Pattern Recognition, 2011) [1], a space-time pyramid of the input video sequence is created, containing several versions of the input video sequence at different spatial and temporal scales. Then, for each spatio-temporal video patch (with spatial dimensions of 5×5 pixels and a temporal dimension of 3 frames), a set of best matches is searched for across the pyramid. This operation is sped up by means of a randomized search, but it remains highly costly. Then, classical reconstruction-based super-resolution (SR) techniques are used to generate the super-resolved video patches, which, once put together, result in the super-resolved video sequence. Even though the method works impressively in the provided results, it is not clear that it would work properly on general sequences, in which motions of a different nature do not recur at different spatial and temporal scales in the input video sequence. Furthermore, the spatio-temporal search, even if not exhaustive, is a costly procedure that renders the approach unusable for real-time applications.
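The pyramid construction and patch search described above can be illustrated with a minimal sketch. It is not the implementation of [1]: the block-averaging downscaler, the factor-of-two scales, and the function names `downscale`, `build_pyramid` and `best_match` are assumptions made for illustration, and the search shown is exhaustive rather than randomized, which makes its cost easy to see.

```python
import numpy as np

def downscale(video, st, ss):
    """Average non-overlapping blocks of st frames and ss x ss pixels.

    Block averaging is an assumed stand-in for whatever blur and
    subsampling a real system would use.
    """
    t, h, w = video.shape
    t2, h2, w2 = t // st, h // ss, w // ss
    v = video[:t2 * st, :h2 * ss, :w2 * ss]
    return v.reshape(t2, st, h2, ss, w2, ss).mean(axis=(1, 3, 5))

def build_pyramid(video, levels=2):
    """Return the input plus coarser versions in space, in time, and in both."""
    pyramid = [video]
    for k in range(1, levels + 1):
        f = 2 ** k
        pyramid.append(downscale(video, 1, f))  # spatially coarser
        pyramid.append(downscale(video, f, 1))  # temporally coarser
        pyramid.append(downscale(video, f, f))  # coarser in both
    return pyramid

def best_match(query, volume):
    """Exhaustively find the patch in `volume` closest to `query` (sum of squared differences)."""
    windows = np.lib.stride_tricks.sliding_window_view(volume, query.shape)
    ssd = ((windows - query) ** 2).sum(axis=(-3, -2, -1))
    idx = np.unravel_index(np.argmin(ssd), ssd.shape)
    return idx, float(ssd[idx])
```

For a video stored as an array of shape (frames, height, width), a 3-frame, 5×5-pixel query would be compared against every candidate position in every pyramid level, which is the cost that a randomized search reduces but does not remove.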
In “On improving space-time super resolution using a small set of video inputs” by U. Mudenagudi, S. Banerjee and P. Kalra (Indian Conf. on Computer Vision, Graphics and Image Processing, 2008) [2], a method is presented for generating a super-resolved version of a sequence for which several versions exist at various spatio-temporal shifts. The method uses graph-cuts to solve an MRF-MAP (Markov Random Field-Maximum A Posteriori) model of the classical reconstruction-based super-resolution equation. It requires the existence of several versions of the same video sequence at different spatio-temporal shifts, which does not occur in most of the available recorded material.
In “Spatio-temporal resolution enhancement of video sequence based on super-resolution reconstruction” by M. Haseyama, D. Izumi and M. Takizawa (ICASSP 2010) [3], a method for joint frame-rate up-conversion and up-scaling is presented, which is based on the classical reconstruction-based super-resolution model. Although the authors claim that the proposed method is capable of obtaining temporal super-resolution, the equation describing this behavior indicates that what is obtained is a smooth linear interpolation of the closest spatially super-resolved frames, which under general motion will produce incorrectly interpolated frames.
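The weakness of linear interpolation under motion can be seen in a toy one-dimensional example. This is only an illustrative sketch and does not reproduce the equation of [3]; the function name `linear_blend` and the two synthetic frames are assumptions. A single bright sample moves from position 2 to position 6, so a correct intermediate frame would show it near position 4.

```python
import numpy as np

def linear_blend(frame_a, frame_b, alpha):
    """Pixel-wise linear interpolation between two frames (no motion compensation)."""
    return (1.0 - alpha) * frame_a + alpha * frame_b

# Two synthetic 1-D "frames": one bright sample that moves from index 2 to index 6.
frame_a = np.zeros(9)
frame_a[2] = 1.0
frame_b = np.zeros(9)
frame_b[6] = 1.0

# Halfway frame: the blend leaves two half-intensity copies at the old and new
# positions and nothing at index 4, where the moving sample would actually be.
mid = linear_blend(frame_a, frame_b, 0.5)
```

The blended frame contains ghost copies at both end positions instead of a single displaced object, which is the kind of incorrectly interpolated frame referred to above.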
In “Super-Resolution Without Explicit Subpixel Motion Estimation” [4] by H. Takeda, P. Milanfar, M. Protter and M. Elad (IEEE Trans. on Image Processing, vol. 18, no. 9, 2009), spatio-temporal super-resolution of video sequences is achieved by using space-time steering filters in local regions, after having aligned matching patches by means of block matching. This approach has two problems. First, the space-time steering filter produces a non-linear over-smoothing of the region to be super-resolved, which requires a costly non-linear post-correction and entails a loss of detail. Second, the approach is only capable of producing correctly interpolated frames under a limited subset of motion ranges, due to the mechanism used for motion compensation (block matching).
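The dependence of block matching on the motion range can be illustrated with a minimal sketch. It is not the alignment procedure of [4]: the function name `block_match`, the block size, the search radius and the sum-of-absolute-differences cost are all assumptions chosen for illustration.

```python
import numpy as np

def block_match(ref, tgt, top, left, size=8, radius=4):
    """Find the displacement (dy, dx) within +/- radius that best aligns a block.

    The block ref[top:top+size, left:left+size] is compared with every
    candidate position in tgt using the sum of absolute differences (SAD).
    """
    block = ref[top:top + size, left:left + size]
    best, best_cost = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the target frame.
            if y < 0 or x < 0 or y + size > tgt.shape[0] or x + size > tgt.shape[1]:
                continue
            cost = np.abs(tgt[y:y + size, x:x + size] - block).sum()
            if cost < best_cost:
                best, best_cost = (dy, dx), cost
    return best, float(best_cost)
```

A displacement inside the search window is recovered exactly, whereas a displacement larger than `radius` cannot be found at all, so the returned match is wrong; enlarging the window addresses this only at a quadratic increase in the number of candidates.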