Because of limited network bandwidth and limited server storage space, much of the image/video available on the World Wide Web and other networks exists in low-quality versions degraded from the original image/video. The most common degradations are down-sampling and compression. Down-sampling reduces spatial resolution, for example, by converting 640×480 images to 320×240 images through the elimination of pixel values. Compression generally removes less important details selectively and represents redundant parts of the image/video more efficiently. Down-sampling and compression can greatly lower the bandwidth and storage space required for image/video, making access to the image/video practical and convenient. These benefits, however, come at the expense of the user's perceptual experience, because degradation typically produces noticeable quality loss in the form of defects in the resulting image/video, such as blurring, blocking and ringing.
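The decimation described above can be sketched in a few lines of Python. This is a minimal illustration only, assuming a grayscale image stored as a list of rows and a reduction factor of two in each dimension; the function name is hypothetical. Practical down-samplers usually low-pass filter the image before discarding samples to limit aliasing, a step this sketch deliberately omits.

```python
def downsample_by_two(image):
    """Keep every second row and every second column (plain decimation).

    `image` is a list of rows, each row a list of pixel values.
    A 640x480 input (480 rows of 640 values) yields a 320x240 output.
    """
    return [row[::2] for row in image[::2]]


# Hypothetical 640x480 grayscale frame filled with a simple gradient.
frame = [[(x + y) % 256 for x in range(640)] for y in range(480)]
small = downsample_by_two(frame)
print(len(small[0]), "x", len(small))  # prints: 320 x 240
```

Three out of every four pixel values are simply discarded, which is why detail lost at this stage cannot be recovered exactly by later up-sampling.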
High-resolution displays have become increasingly common for playback of digital image/video content on personal computers, televisions, and even mobile computing devices. On a high-resolution display, down-sampled, compressed image/video content can be reconstructed and shown in a small area that matches the down-sampled resolution. Often, however, the reconstructed image/video is up-sampled for presentation in a larger area of the display. Unfortunately, when the image/video is enlarged for presentation on a high-resolution display, defects introduced by down-sampling and compression may become more noticeable.
To understand why this is the case, it helps to understand how video and image information is represented and processed in a computer. A computer processes media information as a series of numbers representing that information. For example, a single number may represent the intensity of brightness, or the intensity of a color component such as red, green or blue, for each small elemental region of a picture, so that the digital representation of the picture consists of one or more arrays of such numbers. Each such number may be referred to as a sample or pixel value. For a color image, it is conventional to use more than one sample, typically three, to represent the color of each elemental region. The set of samples for an elemental region may be referred to as a pixel, where the word “pixel” is a contraction of “picture element.” For example, one pixel may consist of three pixel values that represent the intensity of red, green and blue light needed to represent the elemental region.
When image/video is up-sampled, new pixel values are created by interpolating between existing pixel values. There are various linear up-sampling techniques for image and video interpolation, such as nearest-neighbor, bi-linear, bi-cubic, Lanczos and sinc interpolation. Bi-linear interpolation, which essentially computes a new, intermediate pixel value as a weighted average of neighboring pixel values, is computationally simple but tends to cause blurring. Other forms of interpolation, such as bi-cubic interpolation, are more computationally complex and preserve more edge detail, but tend to introduce visible artifacts. In part, this is because bi-linear and bi-cubic interpolation schemes typically make the simplifying assumption that image/video content is spatially invariant (i.e., that its nature does not change within the image/video content), so the same pre-defined weights are applied in every area of an image. Such interpolation fails to account for the rapidly changing, local characteristics of textures, edges and object contours within image/video content, which often results in interpolated, higher-resolution images with blurred or thickened textures, edges and object contours.
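The fixed-weight behavior of bi-linear interpolation can be seen in a minimal up-sampler. The sketch below assumes a grayscale image stored as a list of rows and maps each output pixel back to input coordinates by dividing by the scale factor, which is only one of several possible coordinate conventions; the function names are illustrative. The weights depend solely on the fractional position (fx, fy) of the output pixel, never on the image content.

```python
import math


def bilinear_sample(image, x, y):
    """Return the bi-linearly interpolated value of `image` at (x, y).

    The four neighbors are weighted by fixed functions of the fractional
    offsets fx and fy; the weights do not depend on the pixel values.
    """
    h, w = len(image), len(image[0])
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)  # clamp at the border
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def upsample_bilinear(image, factor=2):
    """Enlarge `image` by an integer `factor` in both dimensions."""
    h, w = len(image), len(image[0])
    return [
        [
            bilinear_sample(image, min(c / factor, w - 1), min(r / factor, h - 1))
            for c in range(w * factor)
        ]
        for r in range(h * factor)
    ]


edge = [[0, 0, 255, 255]]  # one row containing a sharp step edge
print(upsample_bilinear(edge)[0])
# prints: [0.0, 0.0, 0.0, 127.5, 255.0, 255.0, 255.0, 255.0]
```

The intermediate value 127.5 inserted at the step is the averaging that softens edges: because the same weights are used everywhere, a sharp transition is treated exactly like a smooth gradient.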
Other forms of interpolation adapt the weights applied during interpolation to the local content of the image/video being filtered. Previous forms of adaptive interpolation, however, fail to yield reliable weights for certain types of image/video content, which can result in visible distortion in areas of texture, edges and object contours.
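The general idea of content-adaptive weighting can be illustrated with a simple, hypothetical edge-directed rule. This sketch is not the technique of any particular prior method, nor of the approach described in this document; the inverse-gradient weighting, the function name and the `eps` parameter are assumptions chosen for illustration. A new pixel is estimated from its four neighbors, and each direction's average is weighted inversely to the intensity change along that direction.

```python
def adaptive_midpoint(up, down, left, right, eps=1e-6):
    """Estimate a new pixel from four neighbors with content-adaptive weights.

    Illustrative assumption: the direction with the smaller intensity change
    is treated as running along an edge and receives the larger weight.
    `eps` avoids division by zero in flat areas.
    """
    grad_v = abs(up - down)       # change along the vertical direction
    grad_h = abs(left - right)    # change along the horizontal direction
    w_v = 1.0 / (grad_v + eps)
    w_h = 1.0 / (grad_h + eps)
    avg_v = (up + down) / 2.0
    avg_h = (left + right) / 2.0
    return (w_v * avg_v + w_h * avg_h) / (w_v + w_h)


# A pixel on a vertical edge: the column above and below is uniform (200),
# while the left and right neighbors lie on opposite sides of the edge.
adaptive = adaptive_midpoint(up=200, down=200, left=0, right=255)
fixed = (200 + 200 + 0 + 255) / 4  # content-independent equal weights
print(round(adaptive, 3), fixed)   # adaptive stays near 200; fixed is 163.75
```

Rules of this kind depend on a local gradient estimate, which may be unreliable in noisy or finely textured regions; in such areas the chosen weights can favor the wrong direction and introduce distortion.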