Digital encoding of various source signals has become increasingly important over the last decades, as digital signal representation and communication have increasingly replaced analogue representation and communication. Continuous research and development addresses how to improve the quality obtainable from encoded images and video sequences while at the same time keeping the data rate at acceptable levels.
An important factor for perceived image quality is the dynamic range that can be reproduced when an image is displayed. Conventionally, however, the dynamic range of reproduced images has tended to be substantially reduced in relation to normal vision. In real scenes, the dynamic range of objects in regions of different illumination may easily correspond to 10,000:1 or more (14 bit linear), and very precise luminance gradations may occur at all luminance levels, e.g. in a cave illuminated with narrow-beam lights. Hence, whatever the final optimal rendering on a particular device, the image encoding should ideally contain as much useful information on the scene as possible, while also spending as few bits as possible, e.g. on fixed-capacity media such as a Blu-ray Disc, or over limited-bandwidth network connections.
Traditionally, the dynamic range of image sensors and displays has been confined to about two to three orders of magnitude; for example, traditional television was designed for a 40:1 dynamic range, which is a typical range for printing too. For those media, 8 bits were considered sufficient, but they are no longer sufficient for recently emerging higher-quality rendering devices, and/or for smarter image processing, especially related to optimal rendering on those devices. That is, it has traditionally been possible to store and transmit images in 8-bit gamma-encoded formats without introducing perceptually noticeable artifacts on traditional rendering devices. However, in an effort to record more precise and livelier imagery, novel High Dynamic Range (HDR) image sensors have been developed which are claimed to be capable of recording dynamic ranges of up to six orders of magnitude. Moreover, most special effects, computer graphics enhancement and other post-production work are already routinely conducted at higher bit depths, making the visual universes created on a computer potentially infinite.
Furthermore, the contrast and peak luminance of state-of-the-art display systems continue to increase. Recently, new prototype displays have been presented with a peak luminance as high as 5000 cd/m² and theoretical contrast ratios of five orders of magnitude. When traditionally encoded 8-bit signals are shown on such displays, annoying quantization and clipping artifacts may appear. Furthermore, the limited information in 8-bit signals is in general insufficient to create the complex distribution of grey values that may faithfully be rendered with these devices. In particular, traditional video formats offer insufficient headroom and accuracy to convey the rich information contained in new HDR imagery.
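The quantization problem above can be made concrete with a small back-of-the-envelope sketch. Assuming, purely for illustration, that an 8-bit signal with a simple gamma-2.2 transfer curve is stretched to the 5000 cd/m² peak mentioned above, the relative luminance step between adjacent code values in the darker range far exceeds the roughly 1% contrast step usually taken as the visibility threshold for banding (both the gamma value and the threshold are illustrative assumptions, not claims from any particular standard):

```python
import numpy as np

GAMMA = 2.2      # assumed simple power-law transfer curve
PEAK = 5000.0    # cd/m^2, the prototype peak luminance mentioned above

codes = np.arange(256)
# Luminance assigned to each 8-bit code value when mapped to the full peak.
lum = PEAK * (codes / 255.0) ** GAMMA

# Relative luminance step between two adjacent codes in the dark range:
step = (lum[21] - lum[20]) / lum[20]
# This step is on the order of 10%, i.e. an order of magnitude above a
# ~1% visibility threshold, so visible banding is to be expected.
```

This is only a first-order argument; real displays and transfer functions differ, but the conclusion that 8 bits leave too little accuracy for such peak luminances holds.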
As a result, there is a growing need for new video formats that allow a consumer to fully benefit from the capabilities of state-of-the-art sensors and display systems. Preferably, such formats are backwards-compatible, such that legacy equipment can still receive ordinary video streams, while new HDR-enabled devices take full advantage of the additional information conveyed by the new format. Thus, it is desirable that encoded video data not only represent the HDR images but also allow encoding of traditional Low Dynamic Range (LDR) images that can be displayed on conventional equipment.
The most straightforward approach would be to compress and store the LDR and HDR streams independently of each other (simulcast). However, this would result in a high data rate. In order to improve the compression efficiency, it has been proposed to employ inter-layer prediction, where HDR data is predicted from the LDR stream, such that only the smaller differences between the actual HDR data and its prediction need to be encoded and stored/transmitted.
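The inter-layer prediction principle can be sketched as follows. This is a minimal illustration of residual coding only, not the method of any specific codec; the function names and sample values are hypothetical:

```python
import numpy as np

def encode_residual(hdr, hdr_pred):
    # Encoder side: only the (usually small) residual between the actual
    # HDR data and its prediction from the LDR layer needs to be coded.
    return hdr - hdr_pred

def decode_hdr(hdr_pred, residual):
    # Decoder side: reconstruct the HDR data by adding the decoded
    # residual back onto the same prediction, formed identically.
    return hdr_pred + residual

# Illustrative luminance samples (arbitrary values, cd/m^2).
hdr = np.array([120.0, 1850.0, 47000.0])
hdr_pred = np.array([110.0, 1900.0, 48000.0])

residual = encode_residual(hdr, hdr_pred)
reconstructed = decode_hdr(hdr_pred, residual)
```

The residual has a much smaller magnitude than the HDR signal itself, which is what makes it cheaper to code than a second independent stream.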
However, prediction of HDR from LDR data tends to be difficult and relatively inaccurate. Indeed, the relationship between corresponding LDR and HDR images tends to be very complex and may often vary strongly between different parts of the image. For example, an LDR image may often be generated by tone mapping and color grading of an HDR image. The exact tone mapping/color grading, and thus the relationship between the HDR and LDR images, will depend on the specific algorithm and parameters chosen for the color grading, and is thus likely to vary depending on the source. Indeed, color grading may often be subjectively and individually modified not only for different content items but also between different images, and indeed very often between different parts of an image. For example, a color grader may select different objects in an image and apply a separate and individual color grading to each object. Consequently, prediction of HDR images from LDR images is typically very difficult, and ideally requires adaptation to the specific approach used to generate the LDR image from the HDR image.
An example of an approach for predicting an HDR image is presented in Mantiuk, R., Efremov, A., Myszkowski, K., and Seidel, H.-P. 2006. Backward compatible high dynamic range MPEG video compression. ACM Trans. Graph. 25, 3 (July 2006), 713-723. In this approach, a global reconstruction function is estimated and used to perform the inter-layer prediction. However, the approach tends to result in suboptimal results and tends to be less accurate than desired. In particular, the use of a global reconstruction function tends to allow only a rough estimation, as it cannot take into account local variations in the relationship between HDR and LDR data, e.g. those caused by the application of a different color grading.
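A global reconstruction function can be understood as a single LDR-code-to-HDR-value mapping applied uniformly to the whole frame. The sketch below illustrates the idea only; the estimation by per-code-value averaging is a deliberate simplification and not the exact method of the cited paper:

```python
import numpy as np

def estimate_global_lut(ldr, hdr, levels=256):
    # One lookup table for the entire frame: for each LDR code value,
    # average the co-located HDR values (simplified estimation).
    lut = np.zeros(levels)
    for code in range(levels):
        mask = (ldr == code)
        if mask.any():
            lut[code] = hdr[mask].mean()
    return lut

def predict_hdr_global(ldr, lut):
    # The same mapping is applied everywhere in the image, so any local
    # variation in the LDR-to-HDR relationship cannot be captured.
    return lut[ldr]
```

If two image regions were graded differently, their samples share one LUT entry per code value, so the global prediction is necessarily a compromise between them, which is precisely the limitation noted above.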
Another approach is proposed in US Patent Application US2009/0175338, which presents a mechanism for inter-layer prediction that operates on a macroblock (MB) level. In this approach, each macroblock of the HDR stream is locally predicted by estimating a scale and an offset parameter, which corresponds to a linear regression of the macroblock data. However, although this may allow a more local prediction, the simplicity of the linear model applied often fails to accurately describe the intricate relations between LDR and HDR data, particularly in the vicinity of high-contrast and color edges.
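The macroblock-level scale-and-offset prediction amounts to fitting a two-parameter linear model per block. The following is a sketch under the assumption of an ordinary least-squares fit; the block size, function name and return signature are illustrative, not taken from the cited application:

```python
import numpy as np

def predict_macroblock(ldr_block, hdr_block):
    # Fit hdr ~= scale * ldr + offset over one macroblock by least
    # squares, i.e. a linear regression of the macroblock data.
    x = ldr_block.ravel().astype(float)
    y = hdr_block.ravel().astype(float)
    A = np.column_stack([x, np.ones_like(x)])
    (scale, offset), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Prediction using the two estimated parameters; a model this simple
    # struggles wherever the LDR-to-HDR relation is not locally linear,
    # e.g. near high-contrast and color edges.
    prediction = (scale * x + offset).reshape(ldr_block.shape)
    return prediction, scale, offset
```

When the true relation within a block is linear, the fit is exact; when a block straddles two differently graded regions or a sharp edge, a single scale/offset pair cannot match both sides, which is the weakness described above.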
Hence, an improved approach for encoding HDR/LDR data and/or for generating HDR data from LDR data would be advantageous. In particular, a system allowing for increased flexibility, facilitated implementation and/or operation, improved and/or automated adaptation, increased accuracy, reduced encoding data rates and/or improved performance would be advantageous.
Another important trend recently emerging is that many display devices, whether televisions, gaming monitors, or even mobile devices, are moving towards rendering at least some form of 3-dimensional information. At the same time, the market may not want to choose between these quality modalities, i.e. either 3D LDR or 2D HDR; rather, on the same low-capacity systems (e.g. a Blu-ray Disc) one may want to have both quality improvements at the same time.