The human visual system can perceive luminance over a range of about 8 orders of magnitude, and about 5 orders of magnitude simultaneously when adapted to a given lighting condition [1]. In contrast, until a few years ago, the dynamic range of most video capture and display devices was limited to about two orders of magnitude. Today, with the introduction of commercial HDR displays, dramatically increased realism can be expected when the full visible luminance range is transmitted via HDR video [1]. To allow for a seamless transition from LDR to HDR video, a future HDR coding standard should be backwards compatible so that playback on legacy devices remains possible. So far, only a few approaches to backwards-compatible HDR video coding exist in the literature [2-6]. Whereas the approach in [2] is based on an implementation of the Advanced Simple Profile of an MPEG-4 codec with a bit depth of 8 bits, [3-6] are extensions of the scalable video coding profile of H.264/AVC (also known as SVC). SVC allows for bit depths of more than 8 bits.
In all cases, the LDR video sequence has to be generated from the original HDR video data via tone-mapping before encoding. Tone-mapping operators (TMOs) can operate on the whole image (global methods), on local neighborhoods (local methods), or both. A comprehensive introduction to the most important TMOs is given in the textbook [1]. The process that reconstructs an HDR video sequence from an LDR video can be denoted as inverse tone-mapping (ITMO) or, when it is used for scalable video coding, as inter-layer prediction (ILP) [3]. In this context, the ILP has the task of reducing the redundancy between the LDR and HDR layers, and thereby the bit rate required for transmitting the residual information. In a coding scenario, the ILP should be agnostic with respect to the chosen TMO in order to be generally efficient. For example, in [2] and [4] the authors propose a simple mapping function that globally scales each LDR frame, or even the whole LDR sequence, to the dynamic range of the HDR sequence. However, the efficiency of this predictor is low whenever the LDR video was generated by a locally adaptive TMO (which usually produces more attractive LDR videos).
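The weakness of a global predictor can be illustrated with a minimal sketch. It is not the mapping function of [2] or [4]; it assumes, purely for illustration, that the predictor is a single least-squares scale factor per frame, that a frame is a short list of linear luminance samples, and that the two tone-mapping operators are simple gains. The names `fit_global_scale` and `residual_energy` are hypothetical.

```python
def fit_global_scale(ldr, hdr):
    """Least-squares scale s minimizing sum((h - s * l)^2) over one frame."""
    num = sum(l * h for l, h in zip(ldr, hdr))
    den = sum(l * l for l in ldr)
    return num / den if den else 0.0


def residual_energy(ldr, hdr, scale):
    """Energy of the HDR residual left after global inter-layer prediction."""
    return sum((h - scale * l) ** 2 for l, h in zip(ldr, hdr))


# Toy HDR "frame": a dark region followed by a bright region (assumed values).
hdr = [10.0, 20.0, 40.0, 80.0, 1000.0, 2000.0, 4000.0, 8000.0]

# Global TMO (assumption): one gain for the whole frame.
ldr_global = [h * 0.03 for h in hdr]

# Locally adaptive TMO (assumption): the dark region is boosted,
# the bright region is compressed.
ldr_local = [h * 2.0 for h in hdr[:4]] + [h * 0.03 for h in hdr[4:]]

s_global = fit_global_scale(ldr_global, hdr)
s_local = fit_global_scale(ldr_local, hdr)

e_global = residual_energy(ldr_global, hdr, s_global)
e_local = residual_energy(ldr_local, hdr, s_local)

print(f"global TMO: scale={s_global:.2f}, residual energy={e_global:.3g}")
print(f"local TMO:  scale={s_local:.2f}, residual energy={e_local:.3g}")
```

With the global TMO, the single scale factor inverts the mapping and the residual vanishes up to rounding. With the locally adaptive TMO, no single factor fits both regions, so a large residual remains to be coded in the HDR layer.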
The approaches in [3-6] exhibit some local adaptivity by using a block-wise ILP; however, they operate in a color space that is not suited for transmitting HDR data. Furthermore, they have only limited capabilities for ILP parameter estimation, and the required side information is coded inefficiently.
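The idea of a block-wise ILP and its side information can be sketched as follows. This is not the scheme of [3-6]; it assumes a hypothetical gain-and-offset model per block, fitted by least squares, and uses a 1-D row of samples as a stand-in for 2-D blocks. The names `fit_gain_offset` and `blockwise_predict` are illustrative only.

```python
def fit_gain_offset(ldr, hdr):
    """Least-squares fit of hdr ~ gain * ldr + offset for one block."""
    n = len(ldr)
    mean_l = sum(ldr) / n
    mean_h = sum(hdr) / n
    var_l = sum((l - mean_l) ** 2 for l in ldr)
    if var_l == 0.0:
        # Flat LDR block: predict the mean HDR value.
        return 0.0, mean_h
    cov = sum((l - mean_l) * (h - mean_h) for l, h in zip(ldr, hdr))
    gain = cov / var_l
    return gain, mean_h - gain * mean_l


def blockwise_predict(ldr, hdr, block_size):
    """Predict the HDR samples block by block.

    Returns the prediction and the per-block (gain, offset) pairs, which
    play the role of the side information sent along with the residual.
    """
    prediction, params = [], []
    for start in range(0, len(ldr), block_size):
        l_blk = ldr[start:start + block_size]
        h_blk = hdr[start:start + block_size]
        gain, offset = fit_gain_offset(l_blk, h_blk)
        params.append((gain, offset))
        prediction.extend(gain * l + offset for l in l_blk)
    return prediction, params


# Toy data (assumed): a locally adaptive TMO with a different gain per region.
hdr = [10.0, 20.0, 40.0, 80.0, 1000.0, 2000.0, 4000.0, 8000.0]
ldr = [h * 2.0 for h in hdr[:4]] + [h * 0.03 for h in hdr[4:]]

pred, params = blockwise_predict(ldr, hdr, block_size=4)
residual = sum((h - p) ** 2 for h, p in zip(hdr, pred))

print("per-block (gain, offset):", params)
print(f"residual energy: {residual:.3g}")
```

In this toy case the block boundaries coincide with the regions of the tone-mapping operator, so the residual vanishes up to rounding. The cost is one parameter pair per block, which is why the estimation and coding of this side information matter for overall efficiency.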