High Dynamic Range (HDR) imaging has become an increasingly hot topic within the TV and multimedia industry in the last couple of years. While screens capable of displaying the HDR video signal are emerging on the consumer market, over-the-top (OTT) players such as Netflix have announced that HDR content will be delivered to the end-user. Standardization bodies are working on specifying the requirements for HDR. For instance, in the roadmap for Digital Video Broadcasting (DVB), Ultra High Definition Television 1 (UHDTV1) phase 2 will include HDR support. The Moving Picture Experts Group (MPEG) is currently exploring how HDR video could be compressed.
HDR imaging is a set of techniques within photography that allows for a greater dynamic range of luminosity compared to standard digital imaging. Dynamic range in digital cameras is typically measured in f-stops, where 1 f-stop is a doubling of the amount of light. A standard Liquid Crystal Display (LCD) High Definition Television (HDTV) using Standard Dynamic Range (SDR) can display at most 10 f-stops. HDR is defined by MPEG to have a dynamic range of over 16 f-stops.
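Since one f-stop is a doubling of the amount of light, a dynamic range expressed as a luminance ratio can be converted to f-stops with a base-2 logarithm. The following is a minimal sketch of that conversion; the function name and the luminance figures are hypothetical and chosen only for illustration, not taken from any display specification.

```python
import math

def dynamic_range_stops(max_luminance: float, min_luminance: float) -> float:
    """Dynamic range in f-stops: each stop is one doubling of light."""
    if max_luminance <= 0 or min_luminance <= 0:
        raise ValueError("luminance values must be positive")
    return math.log2(max_luminance / min_luminance)

# Hypothetical luminance figures (same unit for max and min), illustration only.
sdr_stops = dynamic_range_stops(100.0, 0.1)     # contrast ratio 1000:1
hdr_stops = dynamic_range_stops(1000.0, 0.005)  # contrast ratio 200000:1

print(f"SDR example: {sdr_stops:.2f} stops")
print(f"HDR example: {hdr_stops:.2f} stops")
```

With these assumed figures the first example stays below 10 f-stops and the second exceeds 16 f-stops, matching the SDR and HDR ranges described above.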
High Efficiency Video Coding (HEVC) is a block-based video codec standardized by the International Telecommunication Union's Telecommunication Standardization Sector (ITU-T) and MPEG, which utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction from within the current frame. Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction at the block level from previously decoded reference pictures. The difference between the original pixel data and the predicted pixel data, referred to as the residual, is transformed into the frequency domain and quantized before being transmitted together with necessary prediction parameters such as mode selections and motion vectors. By quantizing the transformed residuals, the tradeoff between bitrate and quality of the video may be controlled. The level of quantization is determined by the quantization parameter (QP).
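The effect of QP on the transformed residual can be sketched with a plain scalar quantizer. The sketch below assumes the commonly cited approximation that the quantizer step size doubles for every increase of 6 in QP, i.e. a step of roughly 2^((QP-4)/6). Real HEVC encoders use integer scaling tables and rounding offsets rather than this floating-point form, and the coefficient values are hypothetical.

```python
def qstep(qp: int) -> float:
    """Approximate step size; doubles for every increase of 6 in QP (assumption)."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    """Map transform coefficients to integer levels (lossy)."""
    step = qstep(qp)
    return [int(round(c / step)) for c in coeffs]

def dequantize(levels, qp):
    """Reconstruct approximate coefficients from integer levels."""
    step = qstep(qp)
    return [level * step for level in levels]

# Hypothetical transformed residual of one small block.
coeffs = [100.0, -37.0, 12.0, -5.0, 2.0, 1.0]

low_qp_levels = quantize(coeffs, 4)    # step 1: coefficients kept as they are
high_qp_levels = quantize(coeffs, 28)  # step 16: small coefficients become zero

print(low_qp_levels)
print(high_qp_levels)
print(dequantize(high_qp_levels, 28))
```

A higher QP produces more zero-valued levels, which cost fewer bits to transmit, at the price of a larger reconstruction error.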
Scalable High Efficiency Video Coding (SHVC) is an extension to HEVC that supports temporal, spatial, signal-to-noise ratio (SNR), bitdepth, color gamut and hybrid codec scalability. SHVC is useful when coding two or more versions of the same content at different qualities. The bitdepth scaling in SHVC is performed using an 8×8 filter where a transmitted phase parameter is used to select the filter parameters.
A picture element (pixel for short) is the smallest element of a digital image and holds the luminance and color information of that element.
The luminance and color can be expressed in different ways depending on the use case. Displays usually have three color elements (red, green and blue), which are lit at different intensities depending on what color and luminance is to be displayed. It is therefore convenient to send the pixel information to the display in RGB pixel format, i.e. using three components, R, G and B. Since the signal is digital, the intensity of each component of the pixel must be represented with a fixed number of bits, referred to as the bitdepth of the component. For instance, an RGB pixel format with 8 bits per color component can be written RGB888. A bitdepth of n can represent 2^n different values, e.g. 256 values per component for 8 bits and 1024 values per component for 10 bits.
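The relation between bitdepth and the number of representable values, and the idea of an RGB888 pixel, can be illustrated as follows. This is a minimal sketch: the helper names are hypothetical, and the packing order (R in the most significant byte) is an assumption made for the example, since RGB888 as described above only fixes 8 bits per component.

```python
def levels(bitdepth: int) -> int:
    """Number of distinct values a component with the given bitdepth can hold."""
    return 1 << bitdepth

def pack_rgb888(r: int, g: int, b: int) -> int:
    """Pack three 8-bit components into one 24-bit integer (R highest, assumed)."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("each component must fit in 8 bits")
    return (r << 16) | (g << 8) | b

print(levels(8))    # values per component at 8 bits
print(levels(10))   # values per component at 10 bits
print(hex(pack_rgb888(255, 128, 0)))
```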
When video needs to be compressed it is convenient to express the luminance and color information of the pixel with one luminance component and two color components. This is done since the human visual system (HVS) is more sensitive to luminance than to color, meaning that color can be represented with lower accuracy than luminance. This pixel format is often referred to as Y′UV, where Y′ stands for luminance and U and V stand for the two color components. The conversion between RGB and Y′UV for HDTV is performed using the following matrix multiplications defined in BT.709:
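$$
\begin{bmatrix} Y' \\ U \\ V \end{bmatrix}
=
\begin{bmatrix}
0.2126 & 0.7152 & 0.0722 \\
-0.09991 & -0.33609 & 0.436 \\
0.615 & -0.55861 & -0.05639
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
$$

$$
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 1.28033 \\
1 & -0.21482 & -0.38059 \\
1 & 2.12798 & 0
\end{bmatrix}
\begin{bmatrix} Y' \\ U \\ V \end{bmatrix}
$$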
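The two matrix multiplications above can be applied directly in code. The sketch below uses the coefficients exactly as given, with components assumed to be normalized to the range 0 to 1; the helper names are hypothetical. Because the coefficients are rounded to five decimals, a round trip reproduces the input only approximately.

```python
# Coefficients as given in the matrices above (BT.709).
RGB_TO_YUV = (
    (0.2126, 0.7152, 0.0722),
    (-0.09991, -0.33609, 0.436),
    (0.615, -0.55861, -0.05639),
)
YUV_TO_RGB = (
    (1.0, 0.0, 1.28033),
    (1.0, -0.21482, -0.38059),
    (1.0, 2.12798, 0.0),
)

def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix by a 3-element vector."""
    return tuple(sum(a * b for a, b in zip(row, vector)) for row in matrix)

def rgb_to_yuv(r, g, b):
    """Convert normalized RGB to Y'UV."""
    return mat_vec(RGB_TO_YUV, (r, g, b))

def yuv_to_rgb(y, u, v):
    """Convert Y'UV back to normalized RGB."""
    return mat_vec(YUV_TO_RGB, (y, u, v))

white = rgb_to_yuv(1.0, 1.0, 1.0)          # full luminance, no color difference
yuv = rgb_to_yuv(0.2, 0.5, 0.8)
round_trip = yuv_to_rgb(*yuv)              # close to (0.2, 0.5, 0.8)

print(white)
print(yuv)
print(round_trip)
```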
Fourcc.org holds a list of defined YUV and RGB formats. The most commonly used pixel format for standardized video codecs (e.g. H.264, MPEG-4, HEVC) is YUV420 (aka YV12) planar, where the U and V color components are subsampled in both the vertical and horizontal directions and the Y, U and V components are stored in separate chunks for each frame. Thus for a bitdepth of 8 bits per component, the number of bits per pixel (bpp) is 12, where 8 bits represent the luminance and 4 bits the two color components.
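The figure of 12 bpp follows from the subsampling: each color plane holds one sample per 2×2 block of pixels. A short sketch of the arithmetic, assuming even frame dimensions and hypothetical helper names:

```python
def yuv420_plane_sizes(width: int, height: int):
    """Number of samples in the Y, U and V planes of one YUV420 planar frame."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even width and height")
    luma = width * height
    chroma = (width // 2) * (height // 2)   # subsampled by 2 in both directions
    return luma, chroma, chroma

def bits_per_pixel(width: int, height: int, bitdepth: int = 8) -> float:
    """Average number of bits spent per pixel across all three planes."""
    y, u, v = yuv420_plane_sizes(width, height)
    return (y + u + v) * bitdepth / (width * height)

print(yuv420_plane_sizes(1920, 1080))
print(bits_per_pixel(1920, 1080))        # 8-bit components
print(bits_per_pixel(1920, 1080, 10))    # 10-bit components
```

Per pixel, the luminance plane contributes the full bitdepth and each color plane a quarter of it, giving 8 + 2 + 2 = 12 bits at 8 bits per component.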
When a pixel consists of several channels, or components, e.g. a luminance channel and two color channels, the information in only one of these channels for a pixel is sometimes referred to as a sub-pixel.
Color banding, or banding for short, is an artifact that may be visible in gradient areas if the bitdepth is not sufficiently high. Color banding is characterized by abrupt changes between shades of the same color. For instance, natural gradients like sunsets, dawns or clear blue skies may show some banding effects even at 8 bits per channel. An example of color banding artifacts can be seen in FIG. 1a, where the number of bits per pixel or, equivalently, grey levels, varies from 8 bpp (256 grey levels) to 5 bpp (32 grey levels). At the very bottom of FIG. 1a, a banding artifact is represented using only black and white for the sake of reproducibility. A row of pixels is shown. A first band 1, or banding effect, is represented by black. A second band 2 is represented by a first pattern. A third band 3 is represented by a second pattern. Finally, a fourth band 4 is represented by white. In the example at the bottom, the bands are much wider than in the other examples. In FIG. 1b, a bitdepth with four levels is illustrated, i.e. two bits are used for representing intensity of pixels. In FIG. 1c, a further bitdepth with eight levels is illustrated, i.e. three bits are used for representing intensity of pixels.
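How a lower bitdepth turns a smooth gradient into bands can be reproduced on a single row of pixels. The sketch below is illustrative only: it reduces an 8-bit ramp by dropping the least significant bits and measures the width of each resulting band. The helper names are hypothetical.

```python
def reduce_bitdepth(value: int, src_bits: int, dst_bits: int) -> int:
    """Drop the least significant bits of a sample."""
    return value >> (src_bits - dst_bits)

def band_widths(row):
    """Lengths of the runs of identical neighboring values in a row."""
    widths = []
    run = 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            run += 1
        else:
            widths.append(run)
            run = 1
    widths.append(run)
    return widths

gradient = list(range(256))   # 8-bit ramp, one grey level per pixel
row_5bit = [reduce_bitdepth(v, 8, 5) for v in gradient]
row_2bit = [reduce_bitdepth(v, 8, 2) for v in gradient]

print(len(band_widths(gradient)), max(band_widths(gradient)))
print(len(band_widths(row_5bit)), max(band_widths(row_5bit)))
print(len(band_widths(row_2bit)), max(band_widths(row_2bit)))
```

The fewer the levels, the fewer and wider the bands, which is why the bands in the low-bitdepth examples are the most visible.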
The Human Visual System (HVS) is not equally sensitive to all colors. For instance, it is more sensitive to green and red than it is to blue. Color banding is therefore less perceptible in blue areas than in green areas.
Small differences in absolute pixel values are usually difficult to detect due to how the HVS interprets what we see. However, in gradient areas banding artifacts may be visible if the grey levels have been sampled too sparsely, e.g. if the bitdepth is too low.
To some extent, color banding can be reduced by introducing dithering in the downsampling step, a method that creates an illusion of higher bitdepth by diffusion, e.g. by creating a pattern of the available values from the color palette. Dithering requires that the image is viewed from sufficiently far away for the illusion to take effect. Dithering is commonly used when producing posters and for displaying images where the bitdepth of the printer or display is less than that of the image. The problem with dithering is that the perceived resolution decreases, and when the picture is viewed close-up, dithering artifacts in the form of patterns or grain may be visible.
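The text above does not name a particular dithering method. As one possible illustration, the sketch below uses one-dimensional error diffusion, a simplified relative of Floyd-Steinberg dithering, in which the rounding error of each pixel is carried over to the next pixel in the row. All function names are hypothetical.

```python
import math

def quantize_truncate(row, src_bits, dst_bits):
    """Reduce bitdepth by dropping the least significant bits."""
    shift = src_bits - dst_bits
    return [v >> shift for v in row]

def quantize_error_diffusion(row, src_bits, dst_bits):
    """Reduce bitdepth, pushing each pixel's rounding error onto the next pixel."""
    step = 1 << (src_bits - dst_bits)        # source codes per output level
    max_level = (1 << dst_bits) - 1
    out = []
    error = 0.0
    for v in row:
        target = v + error
        level = int(math.floor(target / step + 0.5))  # round to nearest level
        level = max(0, min(max_level, level))         # stay within the palette
        out.append(level)
        error = target - level * step                 # carry the residual forward
    return out

flat = [100] * 64                                 # flat 8-bit patch of value 100
plain = quantize_truncate(flat, 8, 2)             # a single level everywhere
dithered = quantize_error_diffusion(flat, 8, 2)   # a pattern of two levels

step = 1 << (8 - 2)
mean_plain = sum(level * step for level in plain) / len(plain)
mean_dithered = sum(level * step for level in dithered) / len(dithered)

print(mean_plain, mean_dithered)
print(dithered[:12])
```

Seen from a distance, the eye averages the alternating levels, so the dithered patch appears close to the original value of 100, while the truncated patch is uniformly darker. Seen close-up, the alternating pattern itself becomes visible as grain.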
Another way of reducing/removing color banding is to make sure to have enough bitdepth to start with. For HDR video, where the dynamic range is significantly higher compared to SDR, a bitdepth of at least 11-12 bits is needed to guarantee that no color banding artifacts will be visible.
Video captured with a bitdepth of 8 bits per component could be upsampled to a higher bitdepth for display on a 10-bit screen. However, simply upsampling the bitdepth by shifting pixel by pixel does not remove the banding, since the upsampled signal still contains only the original number of distinct levels.
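The limitation of shift-based upsampling can be seen on a short row of pixels. In this minimal sketch (hypothetical names and sample values), an 8-bit shallow gradient is shifted to 10 bits: the values are scaled, but no intermediate levels appear between the bands.

```python
def upsample_by_shift(value: int, src_bits: int, dst_bits: int) -> int:
    """Increase bitdepth by shifting in zero-valued least significant bits."""
    return value << (dst_bits - src_bits)

# Hypothetical shallow 8-bit gradient: three bands, four pixels each.
ramp_8bit = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
ramp_10bit = [upsample_by_shift(v, 8, 10) for v in ramp_8bit]

print(ramp_10bit)
print(len(set(ramp_8bit)), len(set(ramp_10bit)))   # same number of distinct levels
```

The 10-bit row still has three bands; the jump between neighboring bands is now four 10-bit code values, so the codes in between remain unused.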
The bitdepth upsampling in SHVC uses a local 8×8 filter and requires parameters to be transmitted to determine the filter parameters. Local filters may reduce the transitions of a shallow gradient but may not fully reproduce the shallow gradient. Moreover, pixels in non-gradient areas are also affected which may reduce the sharpness of the overall picture.