Image downscaling is a fundamental operation performed constantly in digital imaging. The abundance of high-resolution capture devices and the variety of displays with different resolutions make it an essential component of virtually any application involving images or video. However, downscaling has so far received substantially less attention than other sampling alterations.
Classical downscaling algorithms aim at minimizing aliasing artifacts by linearly filtering the image via convolution with a kernel before subsampling and subsequent reconstruction, following the sampling theorem [Shannon 1998]. However, along with aliasing, these strategies also smooth out some of the perceptually important details and features since the kernels used are agnostic to the image content.
A solution to this problem is adapting the kernel shapes to local image patches [Kopf et al. 2013] in the spirit of bilateral filtering [Tomasi and Manduchi 1998], so that they are better aligned with the local image features to be preserved. This strategy can significantly increase the crispness of the features while avoiding the ringing artifacts typical of post-sharpening filters. However, it still cannot capture all perceptually relevant details and, as a result, might distort some of the perceptually important features and the overall look of the input image, or lead to artifacts such as jagged edges [Kopf et al. 2013].
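To make the idea of a content-adaptive kernel concrete, the following is a minimal 1D sketch in Python. It is not the algorithm of Kopf et al. [2013]; it only illustrates bilateral-style range weighting, in which input samples that differ strongly from a reference intensity contribute less to the output. The function names, the choice of the block median as the reference, and the parameter `sigma_r` are assumptions made for illustration.

```python
import math
import statistics

def box_downscale(signal, factor):
    """Content-agnostic baseline: average each block of `factor` samples."""
    return [sum(signal[j * factor:(j + 1) * factor]) / factor
            for j in range(len(signal) // factor)]

def adaptive_downscale(signal, factor, sigma_r=0.1):
    """Bilateral-style sketch: weight each sample by its similarity to a
    reference intensity, so samples across an edge contribute little.
    The reference (low median of the block) is a hypothetical choice."""
    out = []
    for j in range(len(signal) // factor):
        block = signal[j * factor:(j + 1) * factor]
        ref = statistics.median_low(block)  # always an element of the block
        weights = [math.exp(-((v - ref) ** 2) / (2.0 * sigma_r ** 2))
                   for v in block]
        # The reference sample has weight 1, so the sum is never zero.
        out.append(sum(w * v for w, v in zip(weights, block)) / sum(weights))
    return out

# A step edge that falls inside the second block of four samples.
edge = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0]
blurred = box_downscale(edge, 4)
crisp = adaptive_downscale(edge, 4)
```

On this input the box filter yields `[0.0, 0.25, 1.0]`, blending the edge into an intermediate gray, whereas the range-weighted version keeps the middle sample very close to 0. The price of this kind of hard selection is that the edge position is snapped to the output grid.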
The loss of some perceptually important features and details stems from a shortcoming common to these methods: they rely on simple error metrics that are known to correlate poorly with human perception [Wang and Bovik 2009]. Significant improvements have been obtained for many problems in image processing by replacing these classical metrics with perceptually based image quality metrics [Zhang et al. 2012; He et al. 2014].
The standard approach to image downscaling involves limiting the spectral bandwidth of the input high-resolution image by applying a low-pass filter, subsampling, and reconstructing the result. As is well known in signal processing, this avoids aliasing in the frequency domain and can be considered optimal if only smooth image features are desired. Approximations of the theoretically optimal sinc filter, such as the Lanczos filter, or filters that avoid ringing artifacts, such as the bicubic filter, are typically used in practice [Mitchell and Netravali 1988]. However, these filters often result in oversmoothed images, as the filtering kernels do not adapt to the image content. The same is true for more recent image interpolation techniques [Thevenaz et al. 2000; Nehab and Hoppe 2011].
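The filter-then-subsample pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the downscaling factor is assumed to be an integer, borders are handled by clamping, and the Lanczos kernel with `a = 3` lobes is one common choice among several.

```python
import math

def lanczos(x, a=3):
    """Lanczos-windowed sinc kernel with support (-a, a)."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)

def downscale_1d(signal, factor, a=3):
    """Low-pass filter and subsample a 1D signal by an integer factor."""
    n = len(signal)
    out = []
    for j in range(n // factor):
        # Position of output sample j in input coordinates (pixel centers).
        center = (j + 0.5) * factor - 0.5
        lo = int(math.floor(center - a * factor))
        hi = int(math.ceil(center + a * factor))
        acc = 0.0
        wsum = 0.0
        for i in range(lo, hi + 1):
            # Stretching the kernel by `factor` lowers its cutoff to the
            # Nyquist frequency of the output grid.
            w = lanczos((i - center) / factor, a)
            k = min(max(i, 0), n - 1)  # clamp indices at the borders
            acc += w * signal[k]
            wsum += w
        out.append(acc / wsum)  # normalize so constant signals are preserved
    return out

def downscale_2d(image, factor, a=3):
    """Separable filtering: rows first, then columns."""
    rows = [downscale_1d(row, factor, a) for row in image]
    cols = [downscale_1d(list(col), factor, a) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]

# The finest representable pattern: alternating 0 and 1.
stripes = [0.0, 1.0] * 16
naive = stripes[::2]                  # subsampling without filtering
filtered = downscale_1d(stripes, 2)   # filtering before subsampling
```

Subsampling alone turns the alternating pattern into a constant 0, which is an aliasing artifact. With the low-pass filter, samples away from the borders come out at about 0.5, the local average. This also shows the oversmoothing noted above: the pattern is removed entirely, regardless of whether it was perceptually important.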
Recently, Kopf et al. [2013] showed that significantly better downscaling results with crisper details can be obtained by adapting the shapes of these kernels to the local input image content. Since the kernels better align with the features in the input image, they capture small-scale details when present. However, the method does not take the perceptual importance of the features into account, resulting in a loss of apparent details and hence a rather abstract view of the input image. Indeed, the method is shown to provide excellent results for generating pixel-art images [Kopf et al. 2013].
Improvements in image downscaling are therefore desirable, since they can reduce the computing effort needed to obtain pleasing downscaled images.