In the past, due to the limitations of camera and monitor systems, most videos were captured in a dynamic range that is low relative to human perceptual sensitivity and were encoded as standard dynamic range (SDR) videos. However, the rapid development of high dynamic range (HDR) display technology has driven demand for HDR content. Typical HDR video formats include SMPTE-2084, Dolby Vision, HLG, and HDR10+. Various image and video conversion methods have been developed to convert the large body of existing SDR videos to HDR videos. For example, U.S. Pat. Nos. 8,948,537, 8,824,829, 8,582,913, and 8,233,738 disclose various methods for enhancing an input low dynamic range image to produce image data having a higher dynamic range in a real-time implementation. U.S. Pat. No. 8,265,378 discloses how to convert and represent image data from a lower bit depth to a higher bit depth for rendering HDR image data, which are typically coded in 10 to 12 bits instead of the 8 bits used for SDR image data. U.S. Pat. No. 8,050,512 discloses a conversion performed during the displaying process, where the conversion does not depend on other images. U.S. Pat. No. 7,573,533 proposes an adaptive contrast enhancement method that generates transfer curves.
Human eyes are highly adaptive to a wide range of luminance levels. Human visual perception adjusts automatically according to the target display for a comfortable viewing experience. It is essential to utilize the maximum dynamic range of the target display without losing details, and at the same time to present a majority of the content at a luminance level to which human eyes are most sensitive. Thus, the conversion from an SDR video to an HDR video is in effect an enhancement of the dynamic range from SDR to HDR. The perceptual responses of human eyes to different dynamic ranges and colors are different. It would therefore be difficult to find a universal mapping for all pixels in a video that yields a pleasant perceptual viewing experience after the video is converted to HDR. A static conversion, in which a single mapping function is used for the whole video without taking the spatial and temporal characteristics of the video into account, is not optimal in most cases. For instance, a static conversion may produce overly bright HDR images from some bright SDR images or overly dark HDR images from some dark SDR images. An adaptive conversion based on the spatial statistics of the video may perform better. However, an adaptive conversion that uses only spatial information may lose the continuity of luminance changes from frame to frame, because the spatial statistics of individual frames differ. Except at scene changes, such an adaptive conversion may introduce a flickering effect.
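The flickering concern can be illustrated with a minimal sketch. It is not the method of any of the patents cited above. It assumes, purely for illustration, that each frame is a flat list of normalized luminance values, that the adaptive conversion reduces to a single per-frame gain derived from the frame's mean luminance, and that a scene cut can be detected by a jump in that mean. The names `TARGET_MEAN`, `spatial_gain`, `per_frame_gains`, `smoothed_gains`, and `max_jump`, and the parameters `alpha` and `cut_threshold`, are all hypothetical.

```python
TARGET_MEAN = 0.5  # hypothetical target mean luminance, normalized to [0, 1]


def spatial_gain(frame, target=TARGET_MEAN, eps=1e-6):
    """Gain that would move this frame's mean luminance to the target."""
    mean = sum(frame) / len(frame)
    return target / max(mean, eps)


def per_frame_gains(frames):
    """Spatial-only adaptation: each frame is treated independently."""
    return [spatial_gain(f) for f in frames]


def smoothed_gains(frames, alpha=0.2, cut_threshold=0.3):
    """Spatial adaptation with exponential smoothing over time.

    The smoothing is reset when the mean luminance jumps by more than
    cut_threshold, which this sketch treats as a scene cut (an assumption).
    """
    gains = []
    prev_gain = None
    prev_mean = None
    for f in frames:
        mean = sum(f) / len(f)
        g = spatial_gain(f)
        if prev_gain is None or abs(mean - prev_mean) > cut_threshold:
            smoothed = g  # first frame or assumed scene cut: no smoothing
        else:
            smoothed = alpha * g + (1 - alpha) * prev_gain
        gains.append(smoothed)
        prev_gain, prev_mean = smoothed, mean
    return gains


def max_jump(values):
    """Largest change between consecutive values, a crude flicker measure."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Four frames of one scene whose mean luminance fluctuates slightly.
frames = [[0.40] * 4, [0.44] * 4, [0.40] * 4, [0.44] * 4]
print(max_jump(per_frame_gains(frames)))
print(max_jump(smoothed_gains(frames)))
```

In this toy setting the spatial-only gains swing with every small change in frame statistics, while the temporally smoothed gains change more gradually within a scene and are still allowed to jump at a detected cut.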