In computer graphics, tone mapping changes the dynamic range of images. For example, tone mapping can change a high dynamic range (HDR) image to an image with a low dynamic range (LDR), or vice versa. In images, the dynamic range is determined by the number of bits (bit-depth) allocated to store pixel intensity values. Tone mapping attempts to avoid strong contrast reduction from scene radiance values to a renderable range while preserving image appearance.
Conventional video coding schemes, such as the MPEG and ITU series of video coding standards, are well suited for the compression of videos with a fixed bit-depth, e.g., 8 bits per pixel (bpp). Consumer videos available on VHS and DVD, as well as digital television broadcasts, are typically 8 bpp, and are referred to as having a low dynamic range (LDR). Videos with higher bit-depth, e.g., 10 to 24 bpp, are typically used for professional applications, and have a high dynamic range (HDR).
FIG. 1 shows a conventional encoder 100 with motion estimation 110. Input to the encoder is a fixed bit-depth sequence of images or video 101. Frames (images) in the video are partitioned into blocks, e.g., 8×8 or 16×16 pixels. Blocks are processed one at a time. A motion estimator 110 determines a best matching block of a reference frame stored in a frame memory 111 for a current block to be encoded. This best matching block serves as a prediction for the current block. A corresponding motion vector 112 is entropy encoded 150. A motion-compensated predictor 130 generates a predicted block 121, and a difference signal 122 between the current block of the input video and the predicted block 121 is determined 120. The difference signal then undergoes a transform/quantization process 140 to yield a set of quantized transform coefficients (texture) 141. These coefficients are entropy encoded 150 to yield a compressed output bitstream 109. Performing an inverse transform/quantization 160 on the quantized transform coefficients 141 and adding 170 the result to the motion compensated prediction 121 generates the reconstructed reference frame 161, which is stored in the frame memory 111 and used for predicting 130 successive frames of the input video 101. The output encoded bitstream 109 is generated based on the entropy encoding 150 of motion vectors 112 and texture (DCT coefficients) 141.
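The motion estimation and difference steps described above can be illustrated with a minimal sketch. It assumes a full search over a small window and a sum-of-absolute-differences (SAD) matching cost; the text does not specify either, and the names `sad`, `motion_search`, and `residual_block` are hypothetical. Frames are plain lists of rows of integer pixel values.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n current block at (bx, by) and the
    reference block displaced by the candidate vector (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def motion_search(cur, ref, bx, by, n=8, search=4):
    """Full search in a +/-search window (an assumed strategy).
    Returns (best_cost, (dx, dy)) for the best matching block."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


def residual_block(cur, ref, bx, by, mv, n=8):
    """Difference signal: current block minus motion-compensated prediction."""
    dx, dy = mv
    return [
        [cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x] for x in range(n)]
        for y in range(n)
    ]
```

In an actual encoder the residual would then be transformed, quantized, and entropy encoded together with the motion vector; those stages are omitted here.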
FIG. 2 shows a conventional decoder 200. An input encoded bitstream 201 is subject to an entropy decoder 210 that yields both quantized transform coefficients 211 as well as corresponding motion vectors 212. The motion vectors are used by a motion compensated predictor 220 to yield a prediction signal 221. The quantized transform coefficients 211 are inverse transform/quantized 230 and added 240 to the prediction signal 221 to yield a reconstructed fixed (single) bit-depth video 209. Frames of the reconstructed video, which are used for decoding successive frames, are stored to a frame memory 250. The combination of the encoder and decoder is known as a codec.
The above scheme achieves excellent compression efficiency when the input images have a fixed bit-depth. Currently, most consumer displays can only render LDR 8 bpp videos. Therefore, conventional coding schemes can be applied directly.
To view videos with higher bit-depths, HDR display devices are required. Advances in display technology are making it possible for consumers to enjoy the benefits of HDR videos in the near future. To efficiently support both LDR and HDR display devices, a scalable representation of the video, which enables reconstruction for both or either of the LDR video and the HDR video, is required.
One method achieves a scalable representation by compressing the input HDR video and an LDR version in two separate passes, i.e., using a fixed bit-depth HDR encoder and a fixed bit-depth LDR encoder, respectively. This is referred to as simulcast coding. However, the compression efficiency of that method is very low due to the redundancy of the HDR and LDR versions. Also, the computational complexity is very high. A bit-depth scalable video compression scheme is described by Winken et al. in “SVC bit-depth scalability,” Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-V078, 22nd Meeting, January 2007.
FIG. 3 shows a bit-depth scalable encoder 300. An input HDR video 301 is down-converted to an LDR video 101 using tone mapping 310. Then, the LDR video 101 is compressed to produce a base layer 109. Each current reconstructed frame 115 from the base layer is up-converted to the bit-depth of the input video 301 using inverse tone mapping 320 to produce an inverse tone mapped frame 321. The difference between the inverse tone mapped frame 321 and the input HDR frame 301 is determined 329, and the difference signal then undergoes a transform/quantization process 330 to yield a set of quantized transform coefficients 331. These coefficients are entropy encoded 340 to yield an enhancement layer 341. The enhancement layer bitstream 341 is multiplexed 350 with the base layer bitstream 109 to generate the output bit-depth scalable bitstream 309.
FIG. 4 shows the corresponding decoder 400. An input encoded bitstream 401 is demultiplexed 410 into a base layer 201 and an enhancement layer 402. The base layer is decoded as described above. The enhancement layer is also entropy decoded 210 and inverse transform/quantized to produce output 431. In this case, the output of the frame memory 250 is inverse tone mapped 420, and the output of the inverse tone mapping is added to the output 431 to produce a reconstruction 409 of the input HDR video 301.
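The layered structure of FIGS. 3 and 4 can be sketched as follows. This is a simplified illustration only: it assumes a global bit-shift for tone mapping and inverse tone mapping, and it omits the base layer codec, the transform/quantization, and the entropy coding, so the enhancement layer here is a lossless pixel-domain difference. All function names are hypothetical.

```python
def tone_map(hdr, hdr_bits=10, ldr_bits=8):
    """Down-convert by dropping the least significant bits (assumed operator)."""
    shift = hdr_bits - ldr_bits
    return [[p >> shift for p in row] for row in hdr]


def inverse_tone_map(ldr, hdr_bits=10, ldr_bits=8):
    """Up-convert by linear scaling with a power of two (assumed operator)."""
    shift = hdr_bits - ldr_bits
    return [[p << shift for p in row] for row in ldr]


def encode_layers(hdr, hdr_bits=10, ldr_bits=8):
    """Return (base, enhancement): the LDR frame and the HDR residual
    relative to the inverse tone mapped base frame."""
    base = tone_map(hdr, hdr_bits, ldr_bits)
    predicted = inverse_tone_map(base, hdr_bits, ldr_bits)
    enhancement = [
        [h - p for h, p in zip(hdr_row, pred_row)]
        for hdr_row, pred_row in zip(hdr, predicted)
    ]
    return base, enhancement


def decode_hdr(base, enhancement, hdr_bits=10, ldr_bits=8):
    """Decoder side: inverse tone map the base layer and add the residual."""
    predicted = inverse_tone_map(base, hdr_bits, ldr_bits)
    return [
        [p + e for p, e in zip(pred_row, enh_row)]
        for pred_row, enh_row in zip(predicted, enhancement)
    ]
```

An LDR-only decoder would use `base` alone, while an HDR decoder would call `decode_hdr` on both layers. The better the inverse tone mapping predicts the HDR frame, the smaller the residual that the enhancement layer has to carry.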
In a prior art bit-depth scalable video codec, three methods for inverse tone mapping 320 are known, including: linear scaling, linear interpolation, and look-up table mapping. All of those methods apply the same inverse tone mapping to all of the frames in the entire video, which would not perform well when the LDR video is generated by localized or region-based tone mapping 310 methods from the HDR video.
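The three methods are named above but not defined in this text, so the sketch below uses common interpretations as assumptions: linear scaling as a multiplication by a power of two, linear interpolation as a piecewise-linear curve through given (LDR, HDR) points, and look-up table mapping as a table trained by averaging co-located HDR values for each LDR value. Function names and the training rule are hypothetical.

```python
def linear_scaling(p, ldr_bits=8, hdr_bits=10):
    """Scale an LDR value up by 2**(hdr_bits - ldr_bits)."""
    return p << (hdr_bits - ldr_bits)


def linear_interpolation(p, knots):
    """Piecewise-linear mapping through sorted (ldr, hdr) knot pairs."""
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= p <= x1:
            return y0 + (y1 - y0) * (p - x0) // (x1 - x0)
    raise ValueError("pixel value outside the range covered by the knots")


def build_lut(ldr_pixels, hdr_pixels, ldr_bits=8, hdr_bits=10):
    """Train a table over a whole sequence: each LDR value maps to the
    mean of the HDR values observed at the same pixel positions."""
    size = 1 << ldr_bits
    sums = [0] * size
    counts = [0] * size
    for l, h in zip(ldr_pixels, hdr_pixels):
        sums[l] += h
        counts[l] += 1
    # LDR values never seen in training fall back to linear scaling.
    return [
        sums[v] // counts[v] if counts[v] else linear_scaling(v, ldr_bits, hdr_bits)
        for v in range(size)
    ]
```

Each function defines a single mapping that is applied identically to every pixel of every frame, which is why none of them can follow a tone mapping that varies from region to region. The table also needs a pass over the training pixels before any frame can be mapped.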
In fact, localized tone mapping methods are used in many applications with regions of interest (ROI). Furthermore, the linear scaling and linear interpolation methods are relatively coarse, which results in poor inverse tone mapping quality, even for globally tone mapped LDR video. The look-up table mapping achieves better inverse tone mapping results, but requires an initial training to build a mapping of pixel intensity values by examining an entire video sequence. This process is very complex, results in considerable initial delay before decoding and display, and would not be suitable for many real-time applications. Also, that method does not perform well for many ROI applications.
Another inverse tone mapping method is described by Segall and Su, in “System for bit-depth scalable coding,” Joint Video Team of ISO/IEC MPEG & ITU-T VCEG, Doc. JVT-W113, April 2007. In that method, two scale factors are used, one for luminance and the other for chrominance components. The scale factors are assigned to each block to perform the inverse tone mapping. Thus, that method is more suitable for ROI applications. As a major disadvantage, the scale factors are predefined as a set {0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5}, where the set of scale factors is suitable for input video at a particular bit-depth. Hence, the method loses the flexibility of compressing HDR videos with various bit-depths. Another disadvantage is that the identical scale factor is used for all chrominance components. This can degrade the inverse tone mapping quality.
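The limitation of a fixed scale factor set can be seen in a small sketch. The set of factors is the one quoted above; the selection rule (choosing the factor that minimizes the sum of absolute errors for the block) and the name `best_scale` are assumptions, since this text does not say how a factor is chosen for a block. Blocks are flat lists of pixel values.

```python
SCALE_FACTORS = (0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5)


def best_scale(ldr_block, hdr_block, factors=SCALE_FACTORS):
    """Pick the predefined factor whose scaled LDR block is closest
    to the HDR block (assumed criterion: sum of absolute errors)."""
    def cost(scale):
        return sum(abs(h - scale * l) for l, h in zip(ldr_block, hdr_block))
    return min(factors, key=cost)


# 8-bit to 10-bit content: the ideal factor of about 4 is in the set.
print(best_scale([10, 20, 30, 40], [41, 79, 121, 158]))

# 8-bit to 12-bit content: the ideal factor of 16 is not available,
# so the largest factor in the set is the best that can be chosen.
print(best_scale([10, 20], [160, 320]))
```

With a 4-bit difference between the layers, the closest available factor leaves a large prediction error, which illustrates why a set tuned to one bit-depth does not transfer to another.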
Clearly, it is desirable to have an inverse tone mapping that fits into a bit-depth scalable video compression scheme and overcomes the disadvantages of the prior art. Specifically, an inverse tone mapping technique is needed that yields high quality, is compatible with a wide range of tone mapping techniques, and does not incur substantial coding overhead.