After many years of using classical image/video coding technology (starting with NTSC and continuing with MPEG-2 up to MPEG-HEVC), which we now call low dynamic range (LDR) coding, research and development has recently begun on the next generation of video codec, one capable of handling so-called High Dynamic Range (HDR) images of HDR scenes.
This would on the one hand require a camera which can capture the increased dynamic range, at least above 11 stops (e.g. the current cameras of ARRI getting about 14 stops), or preferably even above 16 stops. Some cameras use e.g. a slow and fast exposure and mix those, or other cameras can use beam splitting towards two or more sensors of different sensitivity.
Whereas in classical imaging a lot of information was thrown away (hard clipped), e.g. outside a room or car, present imaging systems can capture all that information, and the question is then what to do with it, in particular when rendering it on a display. 16 stops should already be sufficient to capture many (though not all) HDR scenes, but on a display one needn't necessarily render e.g. a welding arc as bright, compared to an average brightness, as in the real scene, nor could one do so on typical displays. Higher dynamic range displays are currently emerging, which have a higher peak brightness (PB) than the typical 500 nit of current LDR displays (or the 100 nit of grading reference monitors): e.g. 800-1000 nit televisions are emerging, and SIM2 has made a 5000 nit monitor.
But LDR codec specifications cannot sufficiently encode the detail in HDR images for communication to a receiver, especially when also needing to take into account the current typical limitations like inter alia the word length, in number of bits, of the code words representing e.g. the luminances (as codes called lumas), which have to be handled by various ICs (e.g. 10 bit per color component may be desirable, at least in some video communication applications). Especially if one wants a working system in the short term, it should not deviate too much from existing technology in the field, yet still allow the encoding, handling, and final displaying of images with much more beautiful HDR looks than an LDR image (e.g. brighter lamps or realistic fire, more contrasty scales of a lizard in the sun, etc.).
A HDR image is an image which encodes the textures of a HDR scene (which may typically contain simultaneously both very bright and dark regions, and maybe even intermediate brightness regions, with also a significant number of grey values which ideally need to be accurately rendered), with sufficient information for high quality encoding of the color textures of the various captured objects in the scene, so that a visually good quality rendering of the HDR scene can be done on a high quality HDR display with high peak brightness, like e.g. 5000 nit. FIG. 1 shows a typical HDR image, namely a toy store at night, with brightly colored toys or boxes strongly illuminated compared to the average illumination, because some of those toys are close to the local lamps, yet other toys are far away in shadow regions. In contrast with day scenes in which sun and sky illuminate each point similarly, at night there may be only a few light sources, which light the scene in a quadratically diminishing manner. This creates bright regions 104 around the light source itself, and dark regions in faraway corners. E.g. sewer inlet 114 gets almost no light from anywhere, so it is very dark in the sewer. I.e. in a night scene we may at the same time have image region luminances (or when captured by a linear camera: pixel luminances in those regions) of above 10,000 nit for the lamps themselves, and fractions of a nit, e.g. 0.001 nit for the dark regions, making the total dynamic range 10 million to 1. This being the theoretical range for the brightest versus darkest pixel, the useful dynamic range may of course be lower, since one may not need to accurately represent for the viewer a couple of small lamps or a small dark patch behind the sewer inlet, but in typical HDR scenes even the useful dynamic range of the normal objects of interest may be well above 10,000:1 (or 14 stops).
Mapping these luminances blindly, without smart re-determination of the to-be-rendered object pixel luminances, to a display of 2000 nit peak brightness means that the display should “theoretically” (assuming that a relative-to-peak-white rendering is sufficient for good visual quality rendering of this exemplary scene) have a minimum (visible) black of at least 0.2 nit.
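The arithmetic behind these figures can be sketched as follows. This is only an illustrative calculation, not part of any coding system described here; the function names are hypothetical, and it assumes that the 0.2 nit figure follows from applying the 10,000:1 useful dynamic range relative to the 2000 nit peak white.

```python
import math


def contrast_ratio(brightest_nit, darkest_nit):
    """Dynamic range as the ratio of brightest to darkest luminance."""
    return brightest_nit / darkest_nit


def stops(ratio):
    """One stop is a doubling of luminance, so stops = log2(ratio)."""
    return math.log2(ratio)


def relative_black_nit(display_peak_nit, ratio):
    """Black level implied by rendering a given ratio relative to peak white."""
    return display_peak_nit / ratio


# Night scene figures from the text: lamps above 10,000 nit, dark regions 0.001 nit.
theoretical_ratio = contrast_ratio(10_000.0, 0.001)
# Useful range of the normal objects of interest, as stated in the text.
useful_ratio = 10_000.0

print(f"theoretical: {theoretical_ratio:,.0f}:1 ({stops(theoretical_ratio):.1f} stops)")
print(f"useful: {useful_ratio:,.0f}:1 ({stops(useful_ratio):.1f} stops)")
print(f"black on a 2000 nit display: {relative_black_nit(2000.0, useful_ratio)} nit")
```

Note that log2(10,000) is about 13.3, so the "14 stops" quoted for a 10,000:1 range is a rounded-up figure.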
HDR video (or even still image) encoding has only recently been researched and has been a daunting task up to now, and the typical belief of the research community is that one either needs to go towards significantly more bits, for encoding the brightnesses above the LDR range of scene objects (e.g. encodings which encode scene luminances directly), or one needs some two-layer approach, wherein e.g. in addition to an object reflectance image there is an illumination boost image, or similar decomposition strategies. An example of such a two-image-per-time-instant HDR video encoding system can be found in U.S. Pat. No. 8,248,486B1 or WO2005/1040035.
Applicant has recently proposed a much simpler single-image-per-time-instant approach (see WO2011/107905 and WO2012/153224), which is a parametric, functional manner of encoding both a HDR and LDR look image, because in addition to simply encoding a single HDR image (also called look or grading), typically suitable for displays with peak brightnesses (or in fact dynamic ranges) around a pre-chosen reference value, e.g. 1500 nit, we also want to cater in our framework for the other displays with other dynamic ranges in the market. I.e., since there will also be e.g. portable displays of 500 or 100 nit, rather than to leave it blindly to the receiving side how to change the encoded high dynamic range image to some reasonably looking LDR image by auto-conversion, we co-encode in color processing functions (and the parameters characterizing their functional shapes) how to arrive at an appropriate LDR image starting from the encoded HDR image, namely an LDR image that a content creator could agree with.
With “high dynamic range” (HDR) we typically mean that either the image(s) as captured from the capturing side have 1) a high luminance contrast ratio compared to legacy LDR encoding (i.e. contrast ratios of 10,000:1 or more may be achievable by the coding, and all components of the image handling chain up to rendering); and 2) captured object luminances above at least 1000 nit should be encodable, or more specifically, may need to be reproducible above 1000 nit to generate, given the reproduction environment, some desired appearance of say a lit lamp or sunny exterior. Or, the rendering of such image(s) is HDR (i.e. the images must be suitable in that they contain information which is sufficient for high quality HDR rendering, and preferably in a technically easy to use manner), meaning the image(s) are rendered or intended to be rendered on displays with peak brightness of at least 1000 nit (not implying they can't be rendered on LDR displays of e.g. 100 nit peak brightness, typically after suitable color mapping re-determining the luminances of the various image objects, so that the resultant object luminances are more suitable to the different display dynamic range and possibly viewing environment).
When designing a new HDR coding system, one has to research and come to a solution on a number of things consecutively, even before being able to fill in details of any practical coding system, for which there was no good uniform view. Firstly: what code allocation function, which maps scene object luminances to e.g. 10 bit (or even 8 for lower quality systems, or e.g. 12 for professional quality) lumas actually encoding those to-be-rendered luminances for the pixels, should one use? We will call the codes which encode the perceivable brightnesses or rendered luminances of pixels lumas, because this was the name given also in LDR encoding, but now the code allocation function may be one of several possible alternatives, and in any case very different from the gamma 2.2 code allocation function of LDR video encoding. The skilled person will understand that when we elucidate a technology with a behavior of luminances or equivalently lumas, in actual embodiments the processing may be done on lumas themselves, like when using a Y′u′v′ color representation in which Y′ is a luma determined with a pre-fixed code allocation function and u′ and v′ are chromaticity coordinates, or equivalently on linear or non-linear RGB representations. The choosing of a code allocation function can be formulated equivalently as defining a master EOTF (electro-optical transfer function), which defines how a theoretical reference display model converts the luma codes or lumas of the HDR image into rendered luminances on the reference display. The LDR variant was fixed rather accidentally to a 2.2 power law or so-called gamma function, from the physical behavior of CRT electron guns, and it happened to work nicely psychovisually on those kinds of displays with peak brightnesses of around 100 nit, and with images captured according to a corresponding LDR capturing philosophy, with inter alia reasonably uniform illumination of the scene, correct exposure, and clipping of less interesting image regions.
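As a minimal sketch of what a code allocation function and its corresponding EOTF look like, the legacy LDR case can be modeled as a pure 2.2 power law on a 100 nit reference display. The function names, the full-range 10 bit quantization (no reserved code values), and the pure power law without a linear segment are simplifying assumptions for illustration, not the definition of any particular standard.

```python
def allocate_luma(luminance_nit, peak_nit=100.0, gamma=2.2, bits=10):
    """Code allocation: map a luminance to an integer luma code.

    Assumes a pure power law and full-range codes (0 .. 2**bits - 1).
    """
    max_code = (1 << bits) - 1
    # Normalize to the reference peak and clip anything outside [0, 1].
    relative = min(max(luminance_nit / peak_nit, 0.0), 1.0)
    return round(max_code * relative ** (1.0 / gamma))


def eotf(luma, peak_nit=100.0, gamma=2.2, bits=10):
    """EOTF of the reference display: map a luma code back to rendered nits."""
    max_code = (1 << bits) - 1
    return peak_nit * (luma / max_code) ** gamma


# Round trip of a mid-grey-like luminance; the small difference is the
# quantization error of the 10 bit code.
code = allocate_luma(18.0)
print(code, eotf(code))

# Luminances above the reference peak are hard clipped to the top code.
print(allocate_luma(5000.0) == allocate_luma(100.0))
```

The clipping in the last line illustrates why such an allocation, defined relative to a 100 nit peak, cannot by itself carry the brighter object luminances of an HDR grading.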
But secondly, even before one can define a code allocation function distributing luminances to codes along a code range (e.g. 0-1023) one must define what one may call a master luminance range, which is a best range for encoding typical HDR. This step should not be overlooked. In LDR one just happened to have a range, by exposing relative to middle grey and white, and whatever dynamic range a sensor had (and ignoring that maybe e.g. a soft slope of celluloid film may yield a rather contrastless image, whilst a digital camera image may have clipping on the white end and/or drowning in the noise at the black end of the encoding). Early researchers working on still images thought it would make sense to just make that the linear range of typical luminances in the scene (i.e. from very small fractions of a nit, up to billions of nits), but for video coding, given all practical aspects to take into account, it doesn't pragmatically make much sense to make this master luminance range go up to the 1 billion nits of the sun.
However, even when understanding that one needs to define a new master luminance range for handling all typical HDR images (typical after suitable artistic grading to object luminances which would be suitable for display, even high quality HDR display), the preconception was that one only needs to define one single large enough HDR master luminance range, which would then suffice for all scenarios. Those applications which desire an HDR look of a scene, would then work on a received image which was encoded along this master luminance range, e.g. with luminances up to a maximum luminance being 10000 or 5000 nit.
WO2014/009844 describes an example of such a master luminance range-based HDR video encoding system, being similar to the below embodiments in that it also follows the single-image-per-time-instant encoding philosophy of applicant, whereby a sole image is encoded for each time instant of the video, which in this teaching will be a first, LDR (i.e. 100 nit) look, and in addition to that color processing functions are encoded in metadata associated with the sole images, to convert them into a second look, being a HDR look (which could be a 5000 nit master grading reconstruction). However, the teachings in this patent application follow the rationale of the single fixed master luminance range technical design philosophy. Typically only a single LDR and HDR look are encoded (from this information there might be other intermediate looks calculated at a receiving side, e.g. the LDR image may be upgraded to a look required for a 1200 nit connected display, but there is no other, intermediate, lower HDR-quality image encoding itself taught, i.e. only 100 nit LDR images are transmitted). And this HDR look is the master look created on the 5000 nit master luminance range, and the LDR image is an image for a 100 nit reference display, as happened in the LDR era, and the HDR image is actually encoded via functional transformation from the sole communicated LDR image. I.e. nothing but the master HDR e.g. 5000 nit image (HDR_ORIG, HDR_FIN) is taught in addition to the LDR look, which is typically required for backwards compatibility with legacy displays, etc.
US2014/097113 of applicant also teaches how one can communicate an HDR image, which may be the sole HDR image received, and wherefrom other gradings could be calculated, but this document is silent on that aspect. What this prior art teaches is that one could encode several dynamic range looks alternatively in the same existing LDR encoding container technology. One then has to indicate which version was used, so that the receiver cannot be confused. E.g., the image pixels could have colors defined with three 16 bit R, G and B color components according to a standard LDR encoding definition (i.e. with a Rec. 709 code allocation function). In that case the receiver will know this is a grading for a 100 nit display, and will hence display it with a maximum luminance exactly or approximately equal to 100 nit, even when the display has a peak brightness of 2500 nit and could hence render images that bright. Alternatively, the same R, G and B color components could contain colors of a e.g. 5000 nit HDR image, which means that the relative values of object pixel colors will be different (e.g. a dark object may have a red component of 0.05 in LDR, but 0.0005 in HDR). In case the received LDR-container encoded images actually contain an HDR image, that fact will be indicated to the receiver, by metadata stating what the encoding actually is. Hence the receiver can know how to optimally render on a particular display, by its own optimization processing. E.g. if a 5000 nit image is received, and a 4500 nit display is connected, that image might be directly rendered without prior colorimetric transformation. If however a 100 nit display is connected, such a received 5000 nit image will first have to be down-graded, but that is not needed if an appropriate 100 nit image was already received.
So what is taught in the prior art is that a receiver may need to do some color transformation on its side to make e.g. a 1000 nit received HDR image more optimal for e.g. a 500 nit display, but nothing is taught on how that should be done, let alone if and how that should be facilitated by communicating more information from a transmitting side. I.e., apart from teaching how to encode and specify various possible HDR or LDR gradings, this document teaches nothing about system configurations actually having at least two HDR encodings at the transmitter side, nor the recoverability thereof at a receiving side (in this prior art there will be only one HDR image, which one could equate with our master 5000 nit grading).
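The receiver-side decision described for this prior art can be sketched as below. This is a hypothetical illustration only: the metadata field, the function name, and in particular the 20% tolerance used to decide that a display is "close enough" to the grading's reference peak are assumptions that do not come from the cited document, which, as noted, leaves the down-grading itself unspecified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerMetadata:
    """Hypothetical metadata stating what the LDR container actually holds."""
    reference_peak_nit: float  # e.g. 100 for an LDR grading, 5000 for an HDR one


def receiver_action(meta, display_peak_nit, tolerance=0.2):
    """Decide whether the received grading can be rendered as-is.

    Assumption: the display is treated as suitable when its peak brightness
    reaches at least (1 - tolerance) of the grading's reference peak.
    """
    if display_peak_nit >= (1.0 - tolerance) * meta.reference_peak_nit:
        return "render_directly"
    # How to down-grade is not specified by the prior art; the receiver
    # would have to apply its own color transformation here.
    return "down_grade"


# Cases mentioned in the text.
print(receiver_action(ContainerMetadata(5000.0), 4500.0))  # 5000 nit image, 4500 nit display
print(receiver_action(ContainerMetadata(5000.0), 100.0))   # 5000 nit image, 100 nit display
print(receiver_action(ContainerMetadata(100.0), 2500.0))   # 100 nit image, 2500 nit display
print(receiver_action(ContainerMetadata(1000.0), 500.0))   # 1000 nit image, 500 nit display
```

In the third case the image is rendered without transformation, i.e. with a maximum luminance of about 100 nit even though the display could go brighter.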
When formulating a universal HDR encoding system, it may prima facie seem rather illogical to deviate from the ultimate HDR image encoding on the single master luminance range (inter alia, why would one need anything other than this best possible grading of the HDR scene, going up to what would be the highest reasonably renderable object luminances, or why would one make things complicated by having more possible ways to define HDR?). However, the inventor felt there would for a class of applications still be a need for even more versatility regarding the definition of the luma codes and what HDR luminances, or more precisely to-be-rendered luminances on one or more displays, those lumas would correspond to.