Display technologies being developed by Dolby Laboratories, Inc., and others, are able to reproduce images having high dynamic range (HDR). Such displays can reproduce images that more faithfully represent real-world scenes than conventional displays characterized by approximately three orders of magnitude of dynamic range (e.g., standard dynamic range—“SDR”).
Dynamic range (DR) is a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest darks (blacks) to brightest brights (highlights). As used herein, the term ‘dynamic range’ (DR) may relate to a capability of the human visual (or psychovisual) system (HVS) to perceive a range of intensity (e.g., luminance, luma) in an image, e.g., from darkest darks to brightest brights. In this sense, DR relates to a ‘scene-referred’ intensity. DR may also relate to the ability of a display device to adequately or approximately render an intensity range of a particular breadth. In this sense, DR relates to a ‘display-referred’ intensity. Unless a particular sense is explicitly specified to have particular significance at any point in the description herein, it should be inferred that the term may be used in either sense, e.g., interchangeably.
As used herein, the term high dynamic range (HDR) relates to a DR breadth that spans some 14-15 orders of magnitude of the HVS. For example, well adapted humans with essentially normal vision (e.g., in one or more of a statistical, biometric or ophthalmological sense) have an intensity range that spans about 15 orders of magnitude. Adapted humans may perceive dim light sources of as few as a mere handful of photons. Yet, these same humans may perceive the near painfully brilliant intensity of the noonday sun in desert, sea or snow (or even glance into the sun, however briefly, to prevent damage). This span, though, is available to ‘adapted’ humans, e.g., those people whose HVS has a time period in which to reset and adjust.
In contrast, the DR over which a human may simultaneously perceive an extensive breadth in intensity range may be somewhat truncated, in relation to HDR. As used herein, the terms ‘enhanced dynamic range’ (EDR), ‘visual dynamic range,’ or ‘variable dynamic range’ (VDR) may individually or interchangeably relate to the DR that is simultaneously perceivable by an HVS. As used herein, EDR may relate to a DR that spans 5-6 orders of magnitude. In the present application, VDR and EDR are intended to indicate any extended dynamic range which is wider than SDR and narrower than or equal to HDR.
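As a rough illustration of the figures quoted above, the breadth of a dynamic range in orders of magnitude may be computed as the base-10 logarithm of the ratio between the brightest and darkest intensities. The following Python sketch is illustrative only; the function name and the example luminance values are assumptions and are not taken from this description.

```python
import math


def orders_of_magnitude(min_intensity, max_intensity):
    """Return log10(max/min), the DR breadth in orders of magnitude.

    Both arguments are linear intensities (e.g., luminance) in the same unit.
    """
    if min_intensity <= 0 or max_intensity <= min_intensity:
        raise ValueError("require 0 < min_intensity < max_intensity")
    return math.log10(max_intensity / min_intensity)


# Hypothetical display whose darkest and brightest levels differ by 1000:1,
# i.e., roughly the three orders of magnitude attributed to SDR above.
sdr_breadth = orders_of_magnitude(0.1, 100.0)

# Hypothetical range with a 1,000,000:1 ratio, i.e., about six orders of
# magnitude, at the upper end of the EDR figure quoted above.
edr_breadth = orders_of_magnitude(0.001, 1000.0)
```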
To support backwards compatibility with existing 8-bit video codecs, such as those described in the ISO/IEC MPEG-2 and MPEG-4 specifications, as well as new HDR display technologies, multiple layers may be used to deliver HDR video data from an upstream device to downstream devices. In one approach, generating an 8-bit base layer version from the captured HDR version may involve applying a global tone mapping operator (TMO) to intensity (e.g., luminance, luma) related pixel values in the HDR content with higher bit depth (e.g., 12 or more bits per color component). In another approach, the 8-bit base layer may be created using an adaptive linear or non-linear quantizer. Given a base layer (BL) stream, a decoder may apply an inverse TMO or a base layer-to-EDR predictor to derive an approximated EDR stream. To enhance the quality of this approximated EDR stream, one or more enhancement layers may carry residuals representing the difference between the original HDR content and its EDR approximation, as it will be recreated by a decoder using only the base layer.
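The layered approach described above may be sketched, under simplifying assumptions, as follows. In this Python sketch a fixed bit-shift quantizer stands in for the tone mapping operator or adaptive quantizer, its inverse stands in for the base layer-to-EDR predictor, and the residual is left uncompressed. All function names are hypothetical, and actual encoders may use substantially different operators.

```python
def quantize_to_base_layer(hdr_pixels, hdr_bits=12):
    """Map higher-bit-depth samples to 8 bits (stand-in for a TMO/quantizer)."""
    shift = hdr_bits - 8
    return [v >> shift for v in hdr_pixels]


def predict_from_base_layer(bl_pixels, hdr_bits=12):
    """Approximate the original samples from the base layer alone."""
    shift = hdr_bits - 8
    return [v << shift for v in bl_pixels]


def encode_layers(hdr_pixels, hdr_bits=12):
    """Return (base layer, residual) for a list of integer samples."""
    bl = quantize_to_base_layer(hdr_pixels, hdr_bits)
    predicted = predict_from_base_layer(bl, hdr_bits)
    # The enhancement layer carries what the prediction misses.
    residual = [orig - pred for orig, pred in zip(hdr_pixels, predicted)]
    return bl, residual


def decode_layers(bl_pixels, residual, hdr_bits=12):
    """Reconstruct samples from the base layer plus the residual."""
    predicted = predict_from_base_layer(bl_pixels, hdr_bits)
    return [pred + res for pred, res in zip(predicted, residual)]
```

Because the residual is not compressed in this sketch, decoding reproduces the input exactly; in practice both layers would typically be lossy-coded, so the reconstruction would only approximate the original.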
For the non-backwards compatible (NBC) video codec, input high dynamic range video is partitioned via layer decomposition into two or more layers, a base layer (BL) and one or more enhancement layers (ELs), which are subsequently compressed for transmission. As used herein, the term “non-backwards compatible video codec” denotes a layered codec wherein the base layer on its own is not adequate to create a usable version of the input signal. That is, a decoder always needs both the BL and the EL layers to fully reconstruct the video signal.
See FIG. 1 for an example NBC encoder system. With multi-layer transmission, there arises an issue as to how to allocate a bit rate to each of the BL and EL layers for the NBC encoder in a way that achieves the best subjective quality subject to a total maximum allowed bit rate. In other words, given a fixed transmission bit rate, how much of the bandwidth should be given to the BL and how much to the EL in order to produce the best output video after decoding? Different bit rate distributions (ratios) can produce noticeable differences in the final reconstructed video quality, depending on the features of the input video and on how the layer decomposition is performed in the encoder.
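The allocation question posed above may be framed, in a simplified form, as a search over candidate BL/EL splits of a fixed total bit rate. The Python sketch below is an assumption-laden illustration: best_bitrate_split and toy_quality are hypothetical names, and the toy quality model is an arbitrary placeholder for whatever quality estimate an encoder might use. It does not represent any particular method of solving the problem.

```python
import math


def best_bitrate_split(total_kbps, quality_fn, steps=20):
    """Try evenly spaced BL/EL splits and return (score, bl_kbps, el_kbps).

    quality_fn(bl_kbps, el_kbps) is a caller-supplied estimate of the
    decoded video quality; higher is better.
    """
    best = None
    for i in range(1, steps):  # skip 0 and steps so both layers get bits
        bl_kbps = total_kbps * i / steps
        el_kbps = total_kbps - bl_kbps
        score = quality_fn(bl_kbps, el_kbps)
        if best is None or score > best[0]:
            best = (score, bl_kbps, el_kbps)
    return best


def toy_quality(bl_kbps, el_kbps):
    """Placeholder model (an assumption): diminishing returns per layer,
    with the base layer weighted more heavily than the enhancement layer."""
    return 0.7 * math.log(bl_kbps) + 0.3 * math.log(el_kbps)


score, bl_kbps, el_kbps = best_bitrate_split(10000, toy_quality)
```

With a different quality model, or with different input content, the preferred split may shift considerably, which is the dependence on video features and layer decomposition noted above.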
Traditional methods include conducting the rate-distortion analysis in each layer and deriving the optimal solution in terms of mean-squared-error (MSE) or peak signal-to-noise ratio (PSNR). However, these methods have their drawbacks. For example, PSNR does not always correlate with the intended location of visual attention in a scene. A straightforward way to increase the PSNR is to assign more bits to dark areas since more codewords are allocated to dark areas; however, there can be some important brightly lit areas that should also get more bits for better quality. Optimizing a purely objective cost function would not achieve better visual quality in this case.
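For reference, the two objective metrics named above may be computed as in the following Python sketch, which uses the standard definitions of MSE and PSNR. The function names and the default 8-bit peak value are assumptions made for illustration.

```python
import math


def mse(reference, reconstructed):
    """Mean squared error between two equal-length sample sequences."""
    if not reference or len(reference) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(reference, reconstructed))
    return total / len(reference)


def psnr(reference, reconstructed, bit_depth=8):
    """Peak signal-to-noise ratio in decibels for the given bit depth."""
    peak = (1 << bit_depth) - 1
    err = mse(reference, reconstructed)
    if err == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(peak * peak / err)
```

Both metrics weight every sample equally, so an error in a visually important region contributes no more to the score than the same error elsewhere.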
The approaches described in this section are approaches that could be pursued, but not necessarily approaches that have been previously conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, issues identified with respect to one or more approaches should not be assumed to have been recognized in any prior art on the basis of this section, unless otherwise indicated.