Because of the massive amounts of data inherent in digital video, the transmission of full-motion, high-definition digital video signals is a significant problem in the development of high-definition television. More particularly, each digital image frame is a still image formed from an array of pixels according to the display resolution of a particular system. As a result, the amounts of raw digital information included in high-resolution video sequences are massive. In order to reduce the amount of data that must be sent, compression schemes are used to compress the data. Various video compression standards or processes have been established, including MPEG-2, MPEG-4, and H.264.
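By way of illustration only, the scale of the raw data can be estimated with simple arithmetic. The figures used below (a 1920x1080 pixel array, 24 bits per pixel, 30 frames per second) are assumptions chosen for the example and are not taken from any particular system described herein.

```python
# Illustrative assumptions, not values taken from the text.
WIDTH = 1920          # pixels per line
HEIGHT = 1080         # lines per frame
BITS_PER_PIXEL = 24   # e.g. 8 bits for each of three colour components
FRAMES_PER_SECOND = 30


def raw_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Return the uncompressed data rate of a video sequence in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


rate = raw_bits_per_second(WIDTH, HEIGHT, BITS_PER_PIXEL, FRAMES_PER_SECOND)
print(f"{rate / 1e9:.2f} Gbit/s uncompressed")
```

Under these assumed figures, the uncompressed rate is roughly 1.5 Gbit/s, which indicates why compression is applied before transmission.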
Many applications are enabled where video is available at various resolutions and/or qualities in one stream. Methods to accomplish this are loosely referred to as scalability techniques. There are three axes on which one can deploy scalability. The first is scalability on the time axis, often referred to as temporal scalability. The second is scalability on the quality axis, often referred to as signal-to-noise scalability or fine-grain scalability. The third is scalability on the resolution axis (the number of pixels in the image), often referred to as spatial scalability or layered coding. In layered coding, the bitstream is divided into two or more bitstreams, or layers. The layers can be combined to form a single high-quality signal. For example, the base layer may provide a lower-quality video signal, while the enhancement layer provides additional information that can enhance the base layer image.
In particular, spatial scalability can provide compatibility between different video standards or decoder capabilities. With spatial scalability, the base layer video may have a lower resolution than the input video sequence, in which case the enhancement layer carries information which can restore the resolution of the base layer to the input sequence level.
FIG. 1 illustrates a known layered video encoder 100. The depicted encoding system 100 accomplishes layered compression, whereby a portion of the channel is used for providing a low-resolution base layer and the remaining portion is used for transmitting edge enhancement information, whereby the two signals may be recombined to bring the system up to high resolution. The high-resolution video input Hi-RES is split by splitter 102, whereby the data is sent to a low pass filter 104 and a subtraction circuit 106. The low pass filter 104 reduces the resolution of the video data, which is then fed to a base encoder 108. In general, low pass filters and encoders are well known in the art and are not described in detail herein for purposes of simplicity. The encoder 108 produces a lower-resolution base stream which is provided to a second splitter 110, from where it is output from the system 100. The base stream can be broadcast, received and, via a decoder, displayed as is, although the base stream does not provide a resolution which would be considered high-definition.
The other output of the splitter 110 is fed to a decoder 112 within the system 100. From there, the decoded signal is fed into an interpolate and upsample circuit 114. In general, the interpolate and upsample circuit 114 reconstructs the filtered-out resolution from the decoded video stream and provides a video data stream having the same resolution as the high-resolution input. However, because of the filtering and the losses resulting from the encoding and decoding, certain errors are present in the reconstructed stream. These errors are determined in the subtraction circuit 106 by subtracting the reconstructed high-resolution stream from the original, unmodified high-resolution stream. The output of the subtraction circuit 106 is fed to an enhancement encoder 116, which outputs a reasonable-quality enhancement stream.
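The signal flow of the encoder 100 can be illustrated with a minimal sketch operating on a single row of pixel values. The sketch is not the encoder of FIG. 1 itself: pair averaging stands in for the low pass filter 104, uniform quantization stands in for the base encoder 108 and decoder 112, sample repetition stands in for the interpolate and upsample circuit 114, and the enhancement encoder 116 is omitted so that the residual is passed on losslessly. All function names and the quantization step are hypothetical.

```python
def lowpass_downsample(row):
    """Stand-in for filter 104: average adjacent pixel pairs (2:1 reduction).

    Assumes the row has an even number of pixels.
    """
    return [(row[i] + row[i + 1]) / 2 for i in range(0, len(row) - 1, 2)]


def base_encode(samples, step=8):
    """Stand-in for the lossy base encoder 108: uniform quantization."""
    return [round(s / step) for s in samples]


def base_decode(codes, step=8):
    """Stand-in for decoder 112: inverse quantization."""
    return [c * step for c in codes]


def upsample(samples):
    """Stand-in for circuit 114: repeat each sample to restore the row length."""
    out = []
    for s in samples:
        out.extend([s, s])
    return out


def encode_layers(row, step=8):
    """Return (base stream, residual), the residual being the output of circuit 106."""
    base_stream = base_encode(lowpass_downsample(row), step)
    reconstructed = upsample(base_decode(base_stream, step))
    # Subtraction circuit 106: original minus reconstructed high-resolution row.
    residual = [o - r for o, r in zip(row, reconstructed)]
    return base_stream, residual


def decode_layers(base_stream, residual, step=8):
    """Recombine the two layers at a receiver."""
    reconstructed = upsample(base_decode(base_stream, step))
    return [r + e for r, e in zip(reconstructed, residual)]


row = [10, 12, 200, 204, 50, 52, 90, 30]
base, enhancement = encode_layers(row)
restored = decode_layers(base, enhancement)
```

In this simplified form the base stream alone yields a half-length, coarsely quantized row, while adding the residual restores the original row exactly. In a practical system the residual would itself be lossily encoded, so the restoration would only be approximate.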
The disadvantage of filtering and downscaling the input video to a lower resolution and then compressing it is that the video loses sharpness. This can, to a certain degree, be compensated for by using sharpness enhancement after the decoder. Picture enhancement techniques normally are controlled by analyzing the enhanced output signal. If the original full-resolution signal is used as a reference, the enhancement control can be improved. Normally, however, such a reference is not present, for example in television sets. In some applications, e.g., spatially scalable compression, such a reference signal is present. The problem then becomes how to make use of this reference. One possibility is to look at the pixel difference between the reference and the enhanced output signal. Control can be achieved by minimizing the difference energy. However, this method does not really take into account how the human eye perceives a picture as sharp. It is known that picture content parameters can be extracted from a picture which take into account how the human eye perceives a picture as sharp. Here the control algorithm tries to maximize these values, with the danger of overdoing it, resulting in sharp but not quite natural pictures. The problem is how to use these extracted picture content parameters when there is also a reference picture available to control picture enhancement.
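The difference-energy approach mentioned above can be sketched as follows, again on a single row of pixel values. The peaking filter, the set of candidate gains and all function names are assumptions made for illustration; the text does not specify a particular enhancement filter or search strategy.

```python
def sharpen(row, gain):
    """Hypothetical 1-D peaking filter: add a scaled high-pass term to each pixel."""
    out = []
    for i, p in enumerate(row):
        left = row[max(i - 1, 0)]               # edge pixels reuse themselves
        right = row[min(i + 1, len(row) - 1)]
        out.append(p + gain * (p - (left + right) / 2))
    return out


def difference_energy(reference, candidate):
    """Sum of squared pixel differences between reference and candidate."""
    return sum((r - c) ** 2 for r, c in zip(reference, candidate))


def best_gain(reference, decoded, gains):
    """Pick the candidate gain whose enhanced output is closest to the reference."""
    return min(gains, key=lambda g: difference_energy(reference, sharpen(decoded, g)))


decoded = [10, 10, 40, 80, 80]
# For the example, the reference is constructed so that a gain of 1.0 matches it.
reference = sharpen(decoded, 1.0)
chosen = best_gain(reference, decoded, [0.0, 0.5, 1.0, 1.5])
```

Such a control loop selects the gain purely on pixel-wise error. As noted above, that criterion does not account for how the human eye perceives sharpness, which is the limitation that motivates combining the reference with extracted picture content parameters.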