Recently, new developments have occurred regarding the encoding of images and video (whether of captured scenes or computer graphics): it is desirable to better capture the entire range of luminances and colors occurring in nature, which is called HDR (high dynamic range) encoding. As both cameras and displays are getting increasingly larger native ranges, a better standard is required for transmitting the image information between them. On the other hand, a large number of lower-range devices still exist (e.g. old displays, printers, etc.), and these are also present in some imaging system chains. Typically a low dynamic range (LDR) device like a low quality camera encodes a middle range of interesting values (e.g. well-lit face colors) in 8 bit data words (pixels), at the cost of colors outside this range [note that where understanding is not sacrificed, we may use the term color even if in a color coding triplet its luminance is the most important factor for the present discussion].
If a human looks at an image, there are a number of factors influencing the perceived quality. Firstly, there is the brightness of the whitest white which can be reproduced. Secondly, there is the darkest black which can still be reproduced, and perhaps reproduced reasonably, e.g. with little noise or other interference. White and black determine the dynamic range of the device. But for a real image, those are not the only parameters influencing the look. There are also parameters determining where the intermediate greys should ideally be. A first one is contrast, which is a measure related to the lightness of different objects in the image. If there are at least some objects of the different possible greys between good white and black, the image is said to have good global contrast. But local contrast can also be important, e.g. between one object and its surroundings. Even very local luminance changes like sharpness influence perceived contrast. It is by looking at e.g. a real scene that viewers see it has really impressive contrast (e.g. as compared to an adjacent 6 bit projected image). But secondly, the location of objects/regions on the black-to-white axis will also have an impact, particularly on naturalness (or artistic look). E.g. (well lit) faces are supposed to have a certain percentage of light reflection compared to white. A face which is too white may seem strangely glowing, or the viewer may misinterpret the image by thinking the face is illuminated by some additional light. Thirdly, the precision of the allocated colors may be important, not so much in complex textures, but e.g. in facial gradients. Many viewers seem to prefer the brightness-related quality improvements (including the related color saturation) over the other aspects, and this application will mostly focus on luminance-related issues.
The purpose of a display is to present a quality rendering to a viewer. Ideally, this would be an accurate (photorealistic) representation, but since this is still far in the future, other quality criteria can be used, e.g. recognizability of the image, approximate naturalness (e.g. absence of artefacts), or visual effect/impact, etc.
A popular type of HDR display currently emerging is an LCD with LED backlights in a 2-dimensional pattern, allowing 2-dimensional dimming. The dynamic range of such displays is influenced by several factors.
Firstly, LCDs are getting increasingly brighter due to improved backlighting. Where a couple of years ago 200 nit white was typical, now 500 nit is typical, in the coming years 1000 nit will be typical, and later even 2000 nit or above. However, this poses severe technical constraints on the television or monitor, such as cost and power usage.
Secondly, regarding the blacks, LCDs have a problem with light leakage (especially under certain conditions like large angle viewing), which means that an LCD may have an intrinsic contrast (LCD cell open/closed) of 100:1, although research is making LCDs better. A solution to this is to change the amount of light coming from behind through the LCD valve. 2D dimming displays can in this way theoretically achieve very high contrast, since if the light behind the LCD cell has zero luminance, then apart from leakage zero luminance will locally come out of that region of the display. Dynamic ranges above 10000:1 or even 100000:1 have been reported. In practice, however, a major factor limiting the display black rendering is the light from the surroundings reflected on the front glass of the display. This may reduce the dynamic range to a more realistic 100:1, or even less than 20:1 for bright surrounds. Even in a dark viewing environment light may leak for all kinds of reasons, e.g. interreflections on the front glass from a brighter region to a darker region.
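The effect of surround light on the usable dynamic range can be illustrated with a small calculation. The following is only a sketch under a simplifying assumption that is not stated in the text above: the light reflected on the front glass is modeled as a uniform luminance added to both the white and the black of the display. The function name and the example luminances are hypothetical and chosen purely for illustration.

```python
def effective_contrast(white_nit, black_nit, reflected_nit):
    """Contrast ratio seen by the viewer when a uniform reflected
    luminance (assumed model) is added to both white and black."""
    if black_nit + reflected_nit <= 0:
        raise ValueError("black plus reflected luminance must be positive")
    return (white_nit + reflected_nit) / (black_nit + reflected_nit)


# Hypothetical display: 500 nit white, 0.005 nit black (100000:1 natively).
native = effective_contrast(500.0, 0.005, 0.0)
# Hypothetical reflections of 5 nit and 30 nit on the front glass.
dim_room = effective_contrast(500.0, 0.005, 5.0)
bright_room = effective_contrast(500.0, 0.005, 30.0)

print(f"native: {native:.0f}:1")
print(f"dim surround: {dim_room:.1f}:1")
print(f"bright surround: {bright_room:.1f}:1")
```

With these assumed numbers, a few nit of reflected light already brings the native 100000:1 down to roughly 100:1, and a few tens of nit bring it below 20:1, which matches the orders of magnitude mentioned above.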
Thirdly, of course the human eye is also of importance, mainly its adaptation state, but also the complex image analysis happening in the brain. The eye adapts to a combination of room illumination on the one hand and display brightness on the other (actually, the images shown). These two factors may be relatively in tune for e.g. 500 nit televisions under normal living room viewing, but may also be far apart in other rendering scenarios. Not only the detail seen in black will be influenced, but also the appearance of the bright regions. E.g., viewing comfort will be influenced by the particular display settings, i.e. tiring of the eyes, or even psychological effects like not liking the image rendering. The retina is very complex, but can simply be summarized as follows. Its cones have a biochemical process which always tries to make the sensitivity of the eye (by means of amounts of light sensitive molecules) optimal for any given scene. This works because whatever the illumination (which may change between full moonlight at 0.1 lx, to overcast sky or not too well lit rooms at 100 lx, to direct bright sunlight at 100000 lx, i.e. a range spanning a factor of about a million), object reflections typically range over 1-100%, and it is that dark panther in the dark bush that human vision optimally needs to discern locally. The eye needs to cope with a larger scene dynamic range (taking illumination effects like shadows or artificial illumination into account), which can typically be 10000:1. Further retinal cells like the ganglion cells make smarter use of the combination of all these primary signals, and in so doing e.g. change the level of a local response dependent on the luminances of its surroundings, etc.
Lastly, a very important factor is the visual cortex, which converts this preprocessed raw image field by analysis. It will e.g. redetermine the color of a yellow patch once it realizes that this patch is not a separate object but rather part of another yellow object, or recolor the grass seen behind a glass window once it understands the colored reflection overlapping that local region. It generates what we may call the final color “appearance”, and it is theoretically this factor which both display manufacturers and content creators are in the end interested in. So any technology which conforms more to what human vision needs is desirable (in particular when taking into account other technical constraints).
Although there is no generally recognized standard for encoding HDR images yet (especially for video), first attempts to encode images (typically captured by stretching the limits of camera systems, e.g. by using multiple exposures and hoping the lens doesn't thwart the effort too much) did this by allocating large bit words (e.g. 16 bit, allowing 65000:1 linear coding, and more for non-linear coding) to each pixel (e.g. the exr format). The mapping of the variable amount of light reflecting on scene objects (to which the eye partially but largely adapts) to an image rendering system comprising an LCD valve module and a backlight can then be done by e.g. illumination estimation techniques like in EP1891621B [Hekstra, stacked display device]. A simplistic algorithm to realize output_luminance = backlighting_luminance × LCD_transmission is to take the square root of the HDR 16 bit input, thereby allocating a multiplicative 8 bit background image which may be subsampled for the LEDs (conforming to ratio coding techniques). There are also other methods which plainly encode the appearing scene luminance values merely as they are in classical ways, e.g. EP2009921 [Liu Shan, Mitsubishi Electric], which uses a two layer approach for encoding the pixel values.
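The simplistic square-root split mentioned above can be sketched per pixel as follows. This is only an illustrative sketch, not the method of any of the cited documents: it assumes a linear 16 bit input normalized to [0, 1], gives the backlight and the LCD the same 8 bit code, and ignores the subsampling of the backlight image to the LED grid. The names split_sqrt and reconstruct are hypothetical.

```python
import math

MAX16 = 65535  # largest 16 bit linear code value
MAX8 = 255     # largest 8 bit code value


def split_sqrt(hdr16):
    """Split one 16 bit linear value into 8 bit backlight and LCD codes.

    Both factors are set to the square root of the normalized input,
    so that their product approximates the input (assumed split).
    """
    if not 0 <= hdr16 <= MAX16:
        raise ValueError("expected a 16 bit code value")
    root = math.sqrt(hdr16 / MAX16)
    code = round(root * MAX8)
    return code, code  # (backlight code, LCD transmission code)


def reconstruct(backlight8, lcd8):
    """Normalized output luminance = backlight luminance x LCD transmission."""
    return (backlight8 / MAX8) * (lcd8 / MAX8)


for value in (0, 100, 4096, 30000, 65535):
    backlight, lcd = split_sqrt(value)
    error = abs(reconstruct(backlight, lcd) - value / MAX16)
    print(value, backlight, lcd, f"error={error:.5f}")
```

In a real 2D dimming display the backlight image would be far coarser than the LCD image, so the LCD code would normally be recomputed to compensate for the backlight actually obtained behind each pixel; that step is left out here.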
However, the inventors have realized that, if one goes for a new encoding, in addition to such mere encoding of the scene image pixels (and using this as main, sole encoding for the entire chain), some further encoding is desirable, as it will greatly improve the understanding and hence usability of the imaged actions.