High Dynamic Range (HDR) has become an increasingly hot topic within the TV and multimedia industry in the last couple of years. While screens capable of displaying an HDR video signal are emerging on the consumer market, over-the-top (OTT) players such as Netflix have announced that HDR content will be delivered to the end user. Standardization bodies are working on specifying the requirements for HDR. For instance, in the roadmap for DVB, UHDTV1 phase 2 will include HDR support. MPEG is currently exploring how HDR video could be compressed.
HDR imaging is a set of techniques within photography that allows for a greater dynamic range of luminosity compared to standard digital imaging. Dynamic range in digital cameras is typically measured in f-stops, where one f-stop means doubling of the amount of light. A standard LCD HDTV using Standard Dynamic Range (SDR) can display less than or equal to 10 stops. HDR is defined by MPEG to have a dynamic range of over 16 f-stops.
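The relation between f-stops and contrast ratio described above can be sketched as follows. This is a minimal illustration; the function names and the example luminance values are hypothetical and not taken from any standard.

```python
import math

def dynamic_range_stops(max_luminance, min_luminance):
    """Dynamic range in f-stops, where one stop is a doubling of light."""
    if min_luminance <= 0 or max_luminance < min_luminance:
        raise ValueError("luminances must satisfy 0 < min <= max")
    return math.log2(max_luminance / min_luminance)

def contrast_ratio_for_stops(stops):
    """Contrast ratio (max/min) that corresponds to a number of f-stops."""
    return 2.0 ** stops

# 10 stops (the SDR figure above) versus 16 stops (the HDR figure above).
print(contrast_ratio_for_stops(10))  # 1024.0
print(contrast_ratio_for_stops(16))  # 65536.0

# Hypothetical display: 100 cd/m^2 peak and 0.1 cd/m^2 black level.
print(dynamic_range_stops(100.0, 0.1))
```

Each additional stop doubles the required contrast ratio, so going from 10 to 16 stops corresponds to a 64-fold increase.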
HDR is defined for UHDTV in ITU-R Recommendation BT.2020 while SDR is defined for HDTV in ITU-R Recommendation BT.709.
A color model is a mathematical model that defines the possible colors that can be presented using a predefined number of components. Examples of color models are RGB and CMYK. Adding a specific mapping function between the color model and a reference color space creates within the reference color space a “footprint” referred to as the color gamut. Normally the CIELAB or CIE XYZ color spaces are used as reference color spaces as they span the range of colors visible to the human visual system.
A picture element (pixel for short) is the smallest element of a digital image and holds the luminance and color information of that element.
The luminance and color can be expressed in different ways depending on the use case. Displays usually have three color elements, red, green and blue, which are lit at different intensities depending on what color and luminance is to be displayed. It is therefore convenient to send the pixel information to the display in an RGB pixel format. Since the signal is digital, the intensity of each component of the pixel must be represented with a fixed number of bits, referred to as the bit depth of the component. For instance, an RGB pixel format with 8 bits per color component can be written RGB888. A bit depth of n can represent 2^n different values, e.g. 256 values per component for 8 bits and 1024 values per component for 10 bits.
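As a small illustration of bit depth and the RGB888 format, the sketch below counts the representable values for a given bit depth and packs three 8-bit components into one 24-bit word. The helper names are hypothetical, and placing R in the most significant byte is an assumption made for this example; actual memory layouts vary between formats.

```python
def levels(bit_depth):
    """Number of distinct values representable with bit_depth bits (2^n)."""
    return 1 << bit_depth

def pack_rgb888(r, g, b):
    """Pack three 8-bit components into one 24-bit integer (R assumed in the top byte)."""
    for component in (r, g, b):
        if not 0 <= component <= 255:
            raise ValueError("each component must fit in 8 bits")
    return (r << 16) | (g << 8) | b

def unpack_rgb888(word):
    """Split a 24-bit integer back into its (R, G, B) components."""
    return (word >> 16) & 0xFF, (word >> 8) & 0xFF, word & 0xFF

print(levels(8), levels(10))              # 256 1024
print(hex(pack_rgb888(255, 128, 0)))      # 0xff8000
print(unpack_rgb888(0xFF8000))            # (255, 128, 0)
```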
When video needs to be compressed it is convenient to express the luminance and color information of the pixel with one luminance component and two color components. This is done since the human visual system (HVS) is more sensitive to luminance than to color, meaning that luminance can be represented with higher accuracy than color. This pixel format is often referred to as Y′UV, where Y′ stands for luminance and U and V stand for the two color components. The conversion between RGB and Y′UV for HDTV is to be made using the following matrix multiplications defined in BT.709:
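\[
\begin{bmatrix} Y' \\ U \\ V \end{bmatrix}
=
\begin{bmatrix}
0.2126 & 0.7152 & 0.0722 \\
-0.09991 & -0.33609 & 0.436 \\
0.615 & -0.55861 & -0.05639
\end{bmatrix}
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
\]

\[
\begin{bmatrix} R \\ G \\ B \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 1.28033 \\
1 & -0.21482 & -0.38059 \\
1 & 2.12798 & 0
\end{bmatrix}
\begin{bmatrix} Y' \\ U \\ V \end{bmatrix}
\]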
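The two matrix multiplications can be sketched in code as follows. The coefficients are the ones given above; the function and constant names are hypothetical, and the components are assumed to be normalized floating-point values (R, G and B in the range 0 to 1) rather than integer code values.

```python
# Coefficients of the RGB -> Y'UV matrix given above.
RGB_TO_YUV = (
    (0.2126, 0.7152, 0.0722),
    (-0.09991, -0.33609, 0.436),
    (0.615, -0.55861, -0.05639),
)

# Coefficients of the Y'UV -> RGB matrix given above.
YUV_TO_RGB = (
    (1.0, 0.0, 1.28033),
    (1.0, -0.21482, -0.38059),
    (1.0, 2.12798, 0.0),
)

def mat_vec(matrix, vector):
    """Multiply a 3x3 matrix by a 3-component column vector."""
    return tuple(sum(m * v for m, v in zip(row, vector)) for row in matrix)

def rgb_to_yuv(rgb):
    return mat_vec(RGB_TO_YUV, rgb)

def yuv_to_rgb(yuv):
    return mat_vec(YUV_TO_RGB, yuv)

# White has full luminance and (approximately) zero color components.
print(rgb_to_yuv((1.0, 1.0, 1.0)))

# A round trip reproduces the input up to the rounding of the coefficients.
print(yuv_to_rgb(rgb_to_yuv((0.25, 0.5, 0.75))))
```

Because the published coefficients are rounded to five decimals, the round trip is exact only to roughly that precision.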
Fourcc.org holds a list of defined YUV and RGB formats. The most commonly used pixel format for standardized video codecs (e.g. H.264/AVC, MPEG-4, HEVC) is YUV420 (aka YV12) planar, where the U and V color components are subsampled in both the vertical and horizontal directions and the Y, U and V components are stored in separate chunks for each frame. Thus, for a bit depth of 8 per component, the number of bits per pixel is 12, where 8 bits represent the luminance and 4 bits the two color components.
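The figure of 12 bits per pixel follows from the subsampling, which the sketch below works through. The function names are hypothetical, and even frame dimensions are assumed so that the chroma planes are exactly half the size in each direction.

```python
def yuv420_plane_samples(width, height):
    """Samples in the Y, U and V planes of a planar YUV420 frame."""
    if width % 2 or height % 2:
        raise ValueError("this sketch assumes even frame dimensions")
    luma = width * height
    chroma = (width // 2) * (height // 2)  # subsampled by 2 in both directions
    return luma, chroma, chroma

def bits_per_pixel(width, height, bit_depth=8):
    """Average number of bits per pixel over all three planes."""
    total_samples = sum(yuv420_plane_samples(width, height))
    return total_samples * bit_depth / (width * height)

print(yuv420_plane_samples(1920, 1080))  # (2073600, 518400, 518400)
print(bits_per_pixel(1920, 1080))        # 12.0
```

Each chroma plane holds a quarter as many samples as the luma plane, so the two chroma planes together contribute 2 x 8 / 4 = 4 bits per pixel on top of the 8 luma bits.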
Video is captured by cameras in the linear domain, meaning that the color values are linearly proportional to the amount of light in candela per square meter (cd/m²). Before encoding, the video is typically transferred to a perceptual domain using a transfer function (TF) to minimize the visible errors of the encoding. After decoding, the video is typically converted back to the linear domain using the inverse transfer function. Three transfer functions that have been discussed for HDR in MPEG are the Dolby PQ-EOTF (aka PQTF), the Philips TF and the BBC TF.
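As an illustration of moving between the linear and perceptual domains, the sketch below implements a PQ curve and its inverse. It assumes that the Dolby PQ transfer function mentioned above corresponds to the curve standardized as SMPTE ST 2084, with that specification's constants and a peak luminance of 10000 cd/m²; the function names are hypothetical.

```python
# Constants of the PQ curve (assumed here to be those of SMPTE ST 2084).
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_LUMINANCE = 10000.0  # cd/m^2

def pq_inverse_eotf(luminance):
    """Map linear luminance in cd/m^2 to a perceptual signal in [0, 1]."""
    y = min(max(luminance, 0.0), PEAK_LUMINANCE) / PEAK_LUMINANCE
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2

def pq_eotf(signal):
    """Map a perceptual signal in [0, 1] back to linear luminance in cd/m^2."""
    e = min(max(signal, 0.0), 1.0) ** (1.0 / M2)
    numerator = max(e - C1, 0.0)
    denominator = C2 - C3 * e
    return PEAK_LUMINANCE * (numerator / denominator) ** (1.0 / M1)

for nits in (0.1, 1.0, 100.0, 1000.0, 10000.0):
    signal = pq_inverse_eotf(nits)
    print(nits, round(signal, 4), round(pq_eotf(signal), 4))
```

The curve allocates a large share of the signal range to low luminance levels, where the human visual system is most sensitive to relative differences.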
HEVC version 2 includes a Color Remapping Information (CRI) Supplemental Enhancement Information (SEI) message which may be used to remap the color components of the decoded samples to a different color space. The intent of this message is to adapt to legacy displays (e.g. converting from BT.2020 using wide color gamut (WCG) to legacy BT.709) while preserving the artistic intent of the color grading from studios and ensuring rendered color fidelity for the end user. The Output Code Map (OCM) SEI message provides an improvement over the CRI SEI message in the sense that more than 33 pivot values are possible, making it possible to convert from 10 to 12 bits without the need to interpolate the values between the pivots.
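The role of pivot values can be illustrated with a generic piecewise-linear remapping, sketched below. This is not the CRI or OCM SEI syntax; the function name, the pivot list and its values are hypothetical and chosen only to show why a small number of pivots requires interpolation between them.

```python
from bisect import bisect_right

def remap(value, pivots):
    """Piecewise-linear remap of one code value.

    pivots is a list of (input_code, output_code) pairs sorted by input_code.
    Values between two pivots are linearly interpolated.
    """
    inputs = [p[0] for p in pivots]
    if value <= inputs[0]:
        return pivots[0][1]
    if value >= inputs[-1]:
        return pivots[-1][1]
    i = bisect_right(inputs, value) - 1
    (x0, y0), (x1, y1) = pivots[i], pivots[i + 1]
    return round(y0 + (value - x0) * (y1 - y0) / (x1 - x0))

# Hypothetical three-pivot curve from 10-bit input codes to 12-bit output codes.
PIVOTS = [(0, 0), (512, 2048), (1023, 4095)]

print(remap(0, PIVOTS), remap(256, PIVOTS), remap(512, PIVOTS), remap(1023, PIVOTS))
```

With one pivot per input code value, the interpolation step disappears and the mapping becomes a direct table lookup.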
High Efficiency Video Coding (HEVC) is a block-based video codec standardized by ITU-T and MPEG that utilizes both temporal and spatial prediction. Spatial prediction is achieved using intra (I) prediction within the current frame. Temporal prediction is achieved using inter (P) or bi-directional inter (B) prediction on block level from previously decoded reference pictures. The difference between the original pixel data and the predicted pixel data, referred to as the residual, is transformed into the frequency domain and quantized before being transmitted together with the necessary prediction parameters such as mode selections and motion vectors. By quantizing the transformed residuals, the tradeoff between bitrate and quality of the video may be controlled.
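The residual, transform and quantization steps can be sketched on a single row of samples as follows. This is a simplified illustration only: it assumes a one-dimensional floating-point DCT and a uniform quantizer with a hypothetical step size, whereas HEVC operates on two-dimensional blocks with integer transforms. All function names are hypothetical.

```python
import math

def dct(samples):
    """Orthonormal DCT-II of a list of samples."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(
            s * math.cos(math.pi * (i + 0.5) * k / n) for i, s in enumerate(samples)))
    return out

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    out = []
    for i in range(n):
        value = math.sqrt(1.0 / n) * coeffs[0]
        value += sum(math.sqrt(2.0 / n) * coeffs[k] *
                     math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n))
        out.append(value)
    return out

def encode_row(original, predicted, step):
    """Residual -> transform -> uniform quantization; returns integer levels."""
    residual = [o - p for o, p in zip(original, predicted)]
    return [round(c / step) for c in dct(residual)]

def decode_row(levels, predicted, step):
    """Dequantize, inverse transform and add the prediction back."""
    residual = idct([level * step for level in levels])
    return [p + r for p, r in zip(predicted, residual)]

original = [52, 55, 61, 66]    # hypothetical original samples
predicted = [50, 54, 60, 60]   # hypothetical prediction
levels = encode_row(original, predicted, step=1.0)
print(levels)
print(decode_row(levels, predicted, step=1.0))
```

Only the quantized levels and the prediction parameters need to be transmitted; the decoder rebuilds the same prediction and adds the reconstructed residual.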
The level of quantization is determined by the quantization parameter (QP), which is a key tool for controlling the quality/bitrate tradeoff of the residual in video coding. It controls the fidelity of the residual (typically the transform coefficients) and thus also the amount of coding artifacts. When QP is high the transform coefficients are quantized coarsely, resulting in fewer bits but also possibly more coding artifacts than when QP is low, where the transform coefficients are quantized finely. A low QP thus generally results in high quality and a high QP results in low quality. In HEVC v1 (and similarly in H.264/AVC) the quantization parameter can be controlled on picture, slice or block level. On picture and slice level it can be controlled individually for each color component. In HEVC v2 the quantization parameter for chroma can be individually controlled for the chroma components on a block level.
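The effect of QP on the transform coefficients can be sketched as follows. The sketch assumes the commonly cited approximation for H.264/AVC and HEVC that the quantization step size doubles for every increase of 6 in QP, with a step size of 1 at QP 4; the standards themselves specify integer scaling rather than this floating-point formula. The function names and coefficient values are hypothetical.

```python
def qstep(qp):
    """Approximate quantization step size for a given QP (doubles every 6 QP)."""
    return 2.0 ** ((qp - 4) / 6.0)

def quantize(coeffs, qp):
    """Uniformly quantize transform coefficients to integer levels."""
    step = qstep(qp)
    return [round(c / step) for c in coeffs]

def dequantize(levels, qp):
    """Reconstruct approximate coefficients from integer levels."""
    step = qstep(qp)
    return [level * step for level in levels]

def nonzero(levels):
    """Count the levels that would actually need to be coded."""
    return sum(1 for level in levels if level != 0)

coeffs = [100.0, -37.5, 12.0, -3.0, 1.0]  # hypothetical transform coefficients

for qp in (10, 22, 40):
    levels = quantize(coeffs, qp)
    error = max(abs(c - r) for c, r in zip(coeffs, dequantize(levels, qp)))
    print(qp, qstep(qp), levels, nonzero(levels), error)
```

A higher QP leaves fewer nonzero levels to code, which lowers the bitrate, while the reconstruction error grows with the step size.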