The present application is concerned with video coding such as for use with HDR sequences.
So far, most image and video coding applications cover a luminance range of only about two orders of magnitude (low dynamic range (LDR)) [1]. However, the human visual system (HVS) can adapt to light conditions that span more than ten orders of magnitude and can perceive about five orders of magnitude simultaneously [2]. With an increasing number of applications that can profit from a representation of the full HDR luminance (e.g., CGI, special effects productions, HDR displays), there will be an increasing demand for HDR video coding methods. Using a standard coding method, such as H.264/AVC, will allow for a seamless transition from LDR towards HDR video coding without much additional effort. Note that throughout this work the term HDR refers to the representation of real luminance values and not to a tone-mapped LDR representation, which is sometimes called HDRI.
Since the most natural representation of HDR data, floating-point numbers, does not compress well and is also costly to handle, several authors have proposed a suitable mapping from floating-point luminance values to integer luma values [3, 4, 5, 6]. These luminance-to-luma mappings have in common that the associated loss in precision is below the tolerance of the HVS, so that no distortion is perceived. They further have in common that they convert the HDR image data to the CIELUV color space [1] before further processing. That is, the data is represented by a luminance component Y and the chromaticity components (u′, v′). The advantage of the (u′, v′) color representation is that it is perceptually uniform: equal offsets in this representation correspond to equal perceptual color differences, and the components can therefore be mapped linearly to integer values with a bit depth of, e.g., 8 bit. Such a mapping from the perceivable (u′, v′) interval [0, 0.62] to integer values in the range [0, 255] introduces a maximum absolute quantization error of 0.00172, which is well below the visible threshold.
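The linear chromaticity mapping described above can be sketched as follows. This is a minimal illustration only: the function names and the use of rounding to the nearest code are assumptions, not taken from the cited works. The arithmetic suggests that the stated error of 0.00172 corresponds to the Euclidean distance in the (u′, v′) plane when both components are off by half a quantization step.

```python
import math

UV_MAX = 0.62  # upper bound of the perceivable (u', v') interval given in the text


def uv_to_code(c, bits=8):
    """Linearly map one chromaticity coordinate in [0, UV_MAX] to an integer code.

    Hypothetical helper: rounding to the nearest code is an assumption.
    """
    levels = (1 << bits) - 1
    c = min(max(c, 0.0), UV_MAX)  # clamp to the representable interval
    return int(round(c / UV_MAX * levels))


def code_to_uv(code, bits=8):
    """Inverse mapping from an integer code back to a chromaticity coordinate."""
    levels = (1 << bits) - 1
    return code / levels * UV_MAX


# Quantization step for 8 bit and the resulting worst-case errors.
step = UV_MAX / 255
max_err_1d = step / 2                    # worst case for a single component
max_err_2d = max_err_1d * math.sqrt(2)   # worst case when u' and v' are both off

print(f"step = {step:.5f}, per-component error = {max_err_1d:.5f}, "
      f"joint (u', v') error = {max_err_2d:.5f}")
```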
Since the HVS obeys the Weber-Fechner law over a large luminance range, most works perform a logarithmic mapping of the luminance Y to luma code values [3, 5, 6]. This results in a constant relative quantization error and thus in a perceptually uniform representation of the luminance. For example, in [3] Larson proposed the following luminance-to-luma mapping (Log Luv transform):
\[ L_{15} = \left\lfloor 256 \left( \log_2(Y) + 64 \right) \right\rfloor ; \qquad Y = 2^{\frac{L_{15} + 0.5}{256} - 64} \tag{1} \]
It maps the real-valued luminances in the interval [5.44×10^−20, 1.84×10^19] to 15 bit integer luma values in the range [0, 2^15 − 1] and vice versa. That is, about 38 orders of luminance magnitude are represented with a relative step size of 0.27%. This is well below the visible quantization threshold of about 1% [1].
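As a minimal sketch of equation (1), the forward and inverse mappings can be written as below. The function names and the clamping to the 15 bit range are assumptions added for illustration; they are not part of the cited Log Luv definition.

```python
import math

LUMA_MAX = (1 << 15) - 1  # largest 15 bit luma code


def luminance_to_luma15(Y):
    """Forward mapping of equation (1): L15 = floor(256 * (log2(Y) + 64)).

    Y must be positive. Clamping to [0, LUMA_MAX] is an assumption made here
    so that out-of-range inputs still yield a valid code.
    """
    L = math.floor(256.0 * (math.log2(Y) + 64.0))
    return max(0, min(L, LUMA_MAX))


def luma15_to_luminance(L):
    """Inverse mapping of equation (1): Y = 2 ** ((L15 + 0.5) / 256 - 64)."""
    return 2.0 ** ((L + 0.5) / 256.0 - 64.0)


# Ratio between the luminances of adjacent codes, i.e. the relative step size.
relative_step = 2.0 ** (1.0 / 256.0) - 1.0

for Y in (1e-3, 1.0, 250.0, 1e5):
    L = luminance_to_luma15(Y)
    Y_rec = luma15_to_luminance(L)
    print(f"Y = {Y:g} -> L15 = {L} -> Y' = {Y_rec:g} "
          f"(relative error {abs(Y_rec / Y - 1.0):.4%})")
print(f"relative step size: {relative_step:.2%}")
```

Because the inverse places the reconstructed value at the centre of each code interval in the logarithmic domain, the round-trip error stays at roughly half the relative step size.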
However, the dynamic range covered by such a mapping is far beyond what the HVS can perceive simultaneously. Furthermore, no natural image data spans such a high dynamic range. Whereas this extremely high range and fidelity might be useful for lossless image compression of data that can undergo further image processing steps, it is not useful for lossy video encoding that is intended to be watched by human observers. Consequently, there is no need to reserve bits for luminance values that are not perceivable or that do not occur in the source image or video frame. Since reserving such bits would degrade the compression efficiency, a scaling factor can be used, e.g., in HDR still image coding with the TIFF library [3], to scale the source image to an appropriate range before the Log Luv transform. In a similar Log Luv approach [6], scaling has been applied to each individual frame of a video sequence in order to exploit the full range of possible luma code values for a given bit depth.
However, like many HDR video coding methods, the latter is just a straightforward extension of HDR image coding to individual video frames. The approach therefore lacks some video-specific aspects, which significantly degrades the compression efficiency. Most notably, mapping the luminance values of successive frames to different code values with an individual scaling significantly harms the temporal coherence of the sequence. Consequently, the temporal motion-compensated prediction in the H.264/AVC video coder mostly fails.
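The loss of temporal coherence can be illustrated with a small sketch. The per-frame mapping below is a hypothetical stand-in, not the method of [6]: it assumes a logarithmic mapping stretched between each frame's own minimum and maximum luminance, and a 12 bit code range chosen only for illustration.

```python
import math


def adaptive_log_luma(Y, y_min, y_max, bits=12):
    """Map Y in [y_min, y_max] logarithmically onto the full integer code range.

    Hypothetical per-frame mapping used for illustration only.
    """
    levels = (1 << bits) - 1
    t = (math.log2(Y) - math.log2(y_min)) / (math.log2(y_max) - math.log2(y_min))
    return int(round(t * levels))


# Two successive frames showing the same scene content; the second frame
# additionally contains one bright highlight (illustrative values).
frame_a = [0.1, 1.0, 10.0, 100.0]
frame_b = [0.1, 1.0, 10.0, 100.0, 10000.0]

# Each frame is scaled individually to exploit the full code range.
codes_a = [adaptive_log_luma(y, min(frame_a), max(frame_a)) for y in frame_a]
codes_b = [adaptive_log_luma(y, min(frame_b), max(frame_b)) for y in frame_b[:4]]

print("frame A codes:", codes_a)
print("frame B codes for the same luminances:", codes_b)
```

Although the four shared luminance values are identical in both frames, their code values differ once the highlight changes the range of the second frame, so a motion-compensated predictor that compares code values sees a large residual even for static content.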
Naturally, this is also true for other temporally predicting coders and also for sample values other than luminance values.