Until a couple of years ago, all video was encoded according to the so-called low dynamic range (LDR) philosophy, also called standard dynamic range (SDR). That meant, whatever the captured scene was, that the maximum of the code (typically 8 bit luma Y′=255; or 100% voltage for analog display driving) should by standardized definition correspond to, i.e. be rendered on, a display with a peak brightness PB (i.e. the brightest white color it can render) being by standard agreement 100 nit. If people bought an actual display which was a little darker or brighter, it was assumed that the viewer's visual system would adapt so that the image would still look appropriate and even the same as on the reference 100 nit display, rather than e.g. annoyingly too bright (in case one has e.g. a night scene in a horror movie which should have a dark look).
Of course, for practical program making this typically meant maintaining a tight control of the scene lighting setup, since even in perfectly uniform lighting the diffuse reflection percentage of various objects can already give a contrast ratio of 100:1. The black of such a SDR display may typically be 0.1 nit in good circumstances, yet 1 nit or even several nits in the worst circumstances, so the SDR display dynamic range (the brightest white divided by the darkest viewable black) would be 1000:1 at best, or worse, which corresponds nicely to such uniformly illuminated scenes, and to an 8 bit coding for all the pixel grey values or brightnesses required to be rendered, having a gamma of approximately 2.0, or encoding inverse gamma 0.5. Rec. 709 was the typically used SDR video coding. Typically also cameras had problems capturing simultaneously both very bright and rather dark regions, i.e. a scene as seen outside a window or car window would typically be clipped to white (giving red, green and blue additive color components R=G=B=max., corresponding to their square root coded values R′=G′=B′=255). Note that if in this application a dynamic range is specified first and foremost with a peak brightness (i.e. the brightest rendered or renderable luminance) only, we assume that the lowest luminance value is pragmatically zero (whereas in practice it may depend on viewing conditions such as display front plate or cinema screen light reflection, e.g. 0.1 nit), and that those further details are irrelevant for the particular explanation. Note also that there are several ways to define a dynamic range, and that the most natural one typically used in the below explanations is a display rendered luminance dynamic range, i.e. the luminance of the brightest color versus the darkest one.
Note also, something which has become clearer during the HDR research, and is mentioned here to make sure everybody understands it, that a code system itself does not natively have a dynamic range, unless one associates a reference display with it, which states that e.g. R′=G′=B′=Y′=255 should correspond with a PB of 100 nit, or alternatively 1000 nit, etc. In particular, contrary to what is usually pre-assumed, the number of bits used for the color components of pixels, like their lumas, is not a good indicator of dynamic range, since e.g. a 10 bit coding system may encode either a HDR video, or an SDR video, depending on the type of encoding, and in particular on the electro-optical transfer function EOTF of the reference display associated with the coding, i.e. the function defining the relationship between the luma codes [0, 1023] and the corresponding luminances of the pixels, as they need to be rendered on a display.
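The point that a code only acquires a dynamic range through its associated reference display can be illustrated with a minimal sketch. It assumes a pure power-law EOTF with exponent 2.0 (an illustrative simplification, not the EOTF of any particular standard); the function name `eotf_luminance` and its parameters are hypothetical.

```python
def eotf_luminance(code, bit_depth=10, peak_nit=100.0, gamma=2.0):
    """Map an integer luma code to a display luminance in nit.

    Assumption: a pure power-law EOTF; real coding standards use other curves.
    """
    max_code = (1 << bit_depth) - 1   # 1023 for 10 bit, 255 for 8 bit
    v = code / max_code               # normalized luma in [0, 1]
    return peak_nit * v ** gamma      # relative luminance scaled by the reference PB


# The same 10 bit code means a different luminance per reference display.
for pb in (100.0, 1000.0, 5000.0):
    print(pb, eotf_luminance(1023, peak_nit=pb), eotf_luminance(512, peak_nit=pb))
```

The bit depth stays the same in all three cases; only the reference peak brightness (and, in practice, the EOTF shape) changes what the codes mean.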
In this text it is assumed that when a HDR image or video is mentioned, it has a corresponding peak brightness or maximum luminance for the highest luma code (or equivalently highest R′, G′, B′ values in case of an RGB coding e.g. rather than an YCbCr encoding) which is higher than the SDR value of 100 nit, typically at least 4× higher, i.e. the to be rendered maximum display luminance for having the HDR image look optimal may be e.g. 1000 nit, 5000 nit, or 10000 nit. Note that this should not be confused with the prima facie complex concept, detailed below, that one can encode such a HDR image or video as a SDR image or video, in which case the image is not only renderable on a 100 nit display but, importantly, also contains all information (when accompanied by associated metadata encoding a color transformation for recovering the HDR image) for creating a HDR image with a PB of e.g. 1000 nit.
So a high dynamic range coding of a high dynamic range image is capable of encoding images with to be rendered luminances of e.g. up to 1000 nit, to be able to display-render good quality HDR, with e.g. bright explosions compared to the surrounding rendered scene, or sparkling shiny metal surfaces, etc.
In practice, there are scenes in the world which can have very high dynamic range (e.g. an indoor capture with objects as dark as 1 nit, whilst simultaneously seeing through the window outside sunlit objects with luminances above 10,000 nit, giving a 10000:1 dynamic range, which is 10× larger than a 1000:1 DR, and even 100 times larger than a 100:1 dynamic range, and e.g. TV viewing may have a DR of less than 30:1 in some typical situations, e.g. daylight viewing). Since displays are becoming better (a couple of times brighter PB than 100 nit, with 1000 nit currently appearing, and several thousands of nits PB being envisaged), a goal is to be able to render these images beautifully, and although not exactly identical to the original because of factors such as different viewing conditions, at least very natural, or at least pleasing. And this needs what was missing in the SDR video coding era: a good pragmatic HDR video coding technology.
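The dynamic range figures quoted above are plain ratios of the brightest to the darkest luminance. As a minimal sketch (the helper names `dynamic_range` and `stops` are hypothetical, and expressing a ratio in photographic stops is an added convenience not used in the text):

```python
import math


def dynamic_range(brightest_nit, darkest_nit):
    """Dynamic range as the ratio brightest / darkest luminance."""
    return brightest_nit / darkest_nit


def stops(ratio):
    """The same ratio expressed as a number of doublings (base-2 logarithm)."""
    return math.log2(ratio)


scene = dynamic_range(10000.0, 1.0)    # indoor scene with a sunlit window
sdr_best = dynamic_range(100.0, 0.1)   # 100 nit PB display, 0.1 nit black
print(scene / sdr_best, stops(scene), stops(sdr_best))
```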
The reader should also understand that because a viewer is typically watching the content in a different situation (e.g. sitting in a weakly lit living room at night, or in a dark home or cinema theatre, instead of actually standing in the captured bright African landscape), there is no identity between the luminances in the scene and those finally rendered on the TV (or other display). This can be handled inter alia by having a human color grader manually decide about the optimal colors on the available coding DR, i.e. of the associated reference display, e.g. by prescribing that the sun in the scene should be rendered in the image at 5000 nit (rather than its actual value of 1 billion nit). Alternatively, automatic algorithms may do such a conversion from e.g. a raw camera capturing to what in the text will be (generically) called a (master) HDR grading. This means one can then render this master grading on a 5000 nit PB HDR display, at those locations where it is available.
At the same time however, there will for the coming years be a large installed base of people having a legacy SDR display of 100 nit PB, or some display which cannot make 5000 nit white, e.g. because it is portable, and those people need to be able to see the HDR movie too. So there needs to be some mechanism to convert from a 5000 nit HDR to a 100 nit SDR look image of the same scene.
FIG. 1 shows a couple of illustrative examples of the many possible HDR scenes a HDR system of the future (e.g. connected to a 1000 nit PB display) may need to be able to correctly handle, i.e. by rendering the appropriate luminances for all objects/pixels in the image. E.g. ImSCN1 is a sunny outdoors image from a western movie, whereas ImSCN2 is a nighttime image. What makes HDR image rendering different from how it always was in the LDR era which ended only a couple of years ago, is that the LDR had such a limited dynamic range (about PB=100 nit, and black level approximately 0.1 to 1 nit), that mostly only the reflectivities of the objects could be shown (which would fall between 90% for good white and 1% for good black). So one had to show the objects independent of their illumination, and couldn't at the same time faithfully show all the sometimes highly contrasty illuminations of the scene that could happen. In practice that meant that the highly bright sunny scene had to be rendered with approximately the same display luminances (0-100 nit) as a dull rainy day scene. And even the night time scenes could not be rendered too dark, or the viewer would not be able to well-discriminate the darkest parts of the image, so again those night time brightnesses would be rendered spanning the range between 0 and 100 nit. So one had to conventionally color the night scenes blue, so that the viewer would understand he was not looking at a daytime scene. Now of course in real life human vision would also adapt to the available amount of light, but not that much (most people in real life recognize that it's getting dark). So one would like to render the images with all the spectacular local lighting effects that one can artistically design in it, to get much more realistic rendered images at least if one has a HDR display available.
So on the left axis of FIG. 1 are object luminances as one would like to see them in a 5000 nit PB master HDR grading for a 5000 nit PB display. If one wants to convey not just an illusion, but a real sense of the cowboy being in a bright sunlit environment, one must specify and render those pixel luminances sufficiently bright (though also not too bright), around e.g. 500 nit. For the night scene one wants mostly dark luminances, but the main character on the motorcycle should be well-recognizable i.e. not too dark (e.g. around 5 nit), and at the same time there can be pixels of quite high luminance, e.g. of the street lights, e.g. around 3000 nit on a 5000 nit display, or around the peak brightness on any HDR display (e.g. 1000 nit). The third example ImSCN3 shows what is now also possible on HDR displays: one can simultaneously render both very bright and very dark pixels. We see a dark cave, with a small opening through which we see the sunny outside. For this scene one may want to make the sunlit objects like the tree somewhat less bright than in a scene which wants to render the impression of a bright sunny landscape, e.g. around 400 nit, which should be more coordinated with the essentially dark character of the inside of the cave. A color grader may want to optimally coordinate the luminances of all objects, so that nothing looks inappropriately dark or bright and the contrasts are good, e.g. the person standing in the dark in this cave may be coded in the master HDR graded image around 0.05 nit (assuming HDR renderings will not only be able to render bright highlights, but also dark regions).
It can be understood that it may not always be a trivial task to map all the object luminances for all these very different types of HDR scene to optimal luminances available in the much smaller SDR or LDR dynamic range (DR_1) shown on the right of FIG. 1, which is why preferably a human color grader may be involved for determining the color transformation (which comprises at least a luminance transformation, or luma transformation when equivalently performed on the luma codes). However, one can always choose to use automatically determined transformations, e.g. based on analyzing the color properties of the image content such as its luminance histogram, and this may e.g. be a preferred option for simpler kinds of HDR video, or applications where human grading is less preferred e.g. as in real-time content production (in this patent it is assumed that without limitation grading could also involve the quick setting of a few color transformation function parameters, e.g. for the whole production quickly prior to the start of capturing).
Applicant has designed a coding system, which not only can handle the communication (encoding) of merely a single standardized HDR video, for a typical single kind of display in the field (with every end viewer having e.g. a 1000 nit PB display), but which can at the same time communicate and handle the videos which have an optimal look for various possible other display types with various other peak brightnesses in the field, in particular the SDR image for a 100 nit PB SDR display.
Encoding only a set of HDR images, i.e. with the correct look i.e. image object luminances for a rendering on say a 1000 nit HDR monitor, in e.g. a 10 bit legacy MPEG or similar video coding technology is not that difficult. One only needs to establish an optimal OETF (opto-electronic transfer function) for the new type of image with considerably larger dynamic range, namely one which doesn't show banding in the many compared to white relatively dark regions, and then calculate the luma codes for all pixel/object luminances.
Applicant however designed a system which communicates HDR images as LDR images, i.e. actually LDR (or SDR, i.e. referred to a 100 nit PB reference display, and often optimally color graded on such a reference display) images are communicated, which then can already immediately be used for rendering the correctly looking SDR look on legacy 100 nit PB SDR displays. Thereto, a set of appropriate reversible color transformation functions F_ct is defined, as is illustrated with FIG. 2. These functions may be defined by a human color grader, to get a reasonably looking SDR image (Im_LDR) corresponding to the HDR master image MAST_HDR, whilst at the same time ensuring that by using the inverse functions IF_ct the original master HDR (MAST_HDR) image can be reconstructed with sufficient accuracy as a reconstructed HDR image (Im_RHDR). Alternatively, automatic analysis algorithms may be used at the content creation side for determining suitable color transformation functions F_ct. Note that instead of relying on a receiving side to invert the functions F_ct into IF_ct, one can also send already the needed functions for calculating Im_RHDR from the received and decoded SDR image Im_RLDR. So what the color transformation functions actually do is change the luminances of the pixels in a HDR image (MAST_HDR) into LDR luminances, i.e. the optimal luminance compression as shown in FIG. 1 to fit all luminances in the 100 nit PB LDR dynamic range DR_1. Applicant has invented a method which can keep the chromaticities of the colors constant, effectively changing only their luminances, as will be elucidated below.
A typical coding chain as shown in FIG. 2 works as follows. Some image source 201, which may e.g. be a grading computer giving an optimally graded image, or a camera giving a HDR output image, delivers a master HDR image MAST_HDR, to be color transformed and encoded. A color transformer 202 applies a determined color transformation, e.g. a concave bending function, which for simplicity of elucidation we will assume to be a gamma function with coefficient gam=1/k and k a number larger than 2.0. Of course more complex luminance mapping functions may be employed, provided that they are sufficiently reversible, i.e. the Im_RHDR image has negligible or acceptable banding. By applying these color transformation functions F_ct comprising at least luminance transformation functions, an output image Im_LDR results. This image or set of images is encoded with a legacy LDR image encoder, which may potentially be modified somewhat, e.g. the quantization tables for the DCT-ed transformations of the prediction differences may have been optimized to be better suited for images with HDR characteristics (although the color transformations may typically already make the statistics of the Im_LDR look much more like a typical LDR image than a typical HDR image, which HDR image typically has relatively many pixels with relatively dark luminances, as the upper part of the range may often contain small lamps etc.). E.g., a MPEG-type encoder may be used like HEVC (H265), yielding an encoded SDR image Im_COD. This video encoder 203 then pretends it gets a normal SDR image, although it also gets the functions F_ct which allow the reconstruction of the master HDR image, i.e. effectively making this a dual co-encoding of both an SDR and a HDR look, and their corresponding set of images (Im_RLDR, respectively Im_RHDR). There may be several manners to communicate this metadata comprising all the information of the functions F_ct, e.g. they may be communicated as SEI messages. 
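The reversibility requirement on F_ct can be sketched for the simple gamma case mentioned above. The following is a minimal illustration, not the applicant's implementation: it assumes relative luminances normalized to 1.0, k=2.4 as an arbitrary value larger than 2.0, a square-root SDR luma code allocation, and 10 bit quantization standing in for the whole video codec. All function names are hypothetical.

```python
def f_ct(l_hdr, k=2.4):
    """Forward luminance mapping: concave power function with exponent 1/k."""
    return l_hdr ** (1.0 / k)


def if_ct(l_sdr, k=2.4):
    """Inverse luminance mapping, as applied at the receiving side."""
    return l_sdr ** k


def encode_luma(l_hdr, k=2.4, bit_depth=10):
    """HDR relative luminance -> SDR relative luminance -> integer luma code."""
    max_code = (1 << bit_depth) - 1
    l_sdr = f_ct(l_hdr, k)
    return round(max_code * l_sdr ** 0.5)   # assumed square-root luma coding


def decode_luminance(code, k=2.4, bit_depth=10):
    """Integer luma code -> SDR relative luminance -> reconstructed HDR luminance."""
    max_code = (1 << bit_depth) - 1
    l_sdr = (code / max_code) ** 2
    return if_ct(l_sdr, k)


# A dark HDR pixel (1% of peak) is brightened for the SDR image and then
# recovered with a small quantization error.
l_in = 0.01
code = encode_luma(l_in)
print(f_ct(l_in), code, decode_luminance(code))
```

The residual error after the round trip comes only from the quantization step here; a real chain would add compression errors on top, which is why the functions need to be "sufficiently reversible".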
Then a transmission formatter 204 applies all the necessary transformations to format the data to go over some transmission medium 205 according to some standard, e.g. a satellite or cable or internet transmission, e.g. according to ATSC 3.0, i.e. packetization of the data is performed, channel encoding, etc. At any consumer or professional side, a receiver 206, which may be incorporated in various physical apparatuses like e.g. a settopbox, television or computer, undoes the channel encoding by applying unformatting and channel decoding. Then a video decoder 207 applies e.g. HEVC decoding, to yield a decoded LDR image Im_RLDR. Then a color transformer 208 is arranged to transform the SDR image to an image of any non-LDR dynamic range. E.g. the 5000 nit original master image Im_RHDR may be reconstructed by applying the inverse color transformations IF_ct of the color transformations F_ct used at the encoding side to make the Im_LDR from the MAST_HDR. A display tuning unit 209 may be comprised which transforms the SDR image Im_RLDR to a different dynamic range, e.g. Im3000 nit being optimally graded in case display 210 is a 3000 nit PB display, or a 1500 nit or 1000 nit PB image, etc.
FIG. 3 shows how one can design such a chromaticity-preserving luminance re-calculation, taken from WO2014056679. One can understand this processing when seen in the gamut normalized to 1.0 maximum relative luminance for both the SDR and the HDR image (i.e. assuming that the SDR and HDR have the same e.g. Rec. 2020 primaries, they have then exactly the same tent-shaped gamut; as shown in FIG. 1 of WO2014056679). If one were to drive any display with e.g. the cowboy having in the driving image a luma code corresponding to a luminance of 10% of peak brightness of the display, then that cowboy would render brighter the higher the PB of the display is. That may be undesirable, as we may want to render the cowboy with (approximately) the same luminance on all displays, e.g. 60 nit. Then of course his relative luminance (or the corresponding 10 bit luma code) should be lower the higher the PB of the display is, to get the same ultimate rendered luminance. I.e., one could represent such a desire as a downgrading mapping e.g. from luma code 800 for the SDR image, to e.g. luma code 100 for the HDR image (depending on the exact shape of the EOTF defining the codes which is used), or, in luminances one maps the 60% SDR luminance to e.g. 1/40th of that for a 4000 nit HDR display or its corresponding optimally graded image. Downgrading in this text means changing the luma codes of the pixels (or their corresponding to be rendered luminances) from a representation of higher peak brightness (i.e. for rendering on a higher PB display, e.g. of 1000 nit PB) to the lumas of an image of the same scene in a lower PB image for rendering on a lower PB display, e.g. a 100 nit SDR display, and upgrading is the opposite color transformation for converting a lower PB image into a higher PB image, and one should not confuse this with the spatial upscaling and downscaling, which is adding new pixels respectively dropping some pixels or some color components of those pixels. 
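The arithmetic behind the cowboy example can be written out as a short sketch. It works on relative luminances only (the luma codes depend on the EOTF, which is not fixed here), and the function name `relative_luminance` is hypothetical.

```python
def relative_luminance(target_nit, peak_nit):
    """Fraction of display peak brightness needed to render target_nit."""
    return target_nit / peak_nit


sdr = relative_luminance(60.0, 100.0)    # 60 nit on a 100 nit SDR display
hdr = relative_luminance(60.0, 4000.0)   # the same 60 nit on a 4000 nit display
print(sdr, hdr, hdr / sdr)               # the HDR value is 1/40th of the SDR value
```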
One can do that for any color, in which a (RGB) triplet corresponds to some chromaticity (x,y) in the display or encoding code gamut, in a manner which will automatically scale to the maximum luminance available (renderable) for that chromaticity Lmax(x,y), by the apparatus of FIG. 3. Actually, one can demonstrate that this corresponds to applying a similar luminance mapping, which on the achromatic axis (i.e. of colors having no particular hue) takes the input luminance L of the color in the SDR image to the needed relative output luminance L* of the optimal HDR graded image. Without diving into details, what is relevant from this teaching is that the corresponding color transformation can then be realized as a multiplicative transformation on the (preferably linear) RGB components, on each component separately, by a multiplier 311, with a constant g larger or smaller than 1.0, which corresponds to whatever shape of the luminance transformation function L_out=TM(L_in) one chooses, which can also be formulated as a functional transformation of the maximum of the input red, green and blue color values of a pixel. So for each input color (R,G,B), the appropriate g-value is calculated for applying the desired color transformation which transforms Im_RLDR into Im_RHDR (or in an appropriately scaled manner into any other graded image, like Im3000nit), when luminance mapper 307 gets some SDR-luminance to HDR-luminance mapping function, e.g. a parametrically specified loggamma function or sigmoid, or a multilinear curve received as a LUT. The components of the exemplary embodiment circuit are: 305: maximum calculator, outputting the maximum one (maxRGB) of the R, G, and B values of a pixel color being processed; 301: luminance convertor, calculating the luminance of a color according to some color definition standard with which the system currently works, e.g. Rec. 2020; 302: divider, yielding Lmax(x,y) as L/max(R,G,B); 307: luminance mapper, actually working as a mapper on maxRGB, yielding m*=TM(maxRGB), with TM some function which defines the luminance transformation part of F_ct; 308: a multiplier, yielding L*=(m*)×Lmax(x,y); 310: a gain determination unit, being in this embodiment actually a divider, calculating g=L*/L, i.e. the output HDR relative luminance divided by the input SDR relative luminance L; and 311: a multiplier arranged to multiply the three color components R,G,B with the same g factor.
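The chain of units 305, 301, 302, 307, 308, 310 and 311 can be traced in a short sketch. This is an illustration under stated assumptions rather than the circuit of FIG. 3 itself: the inputs are taken to be linear RGB values normalized to [0, 1], the luminance weights are those of Rec. 2020, and the mapping `tone_map` is a hypothetical power function chosen only so that the example runs.

```python
def luminance(r, g, b):
    """Unit 301: luminance of a linear RGB color, with Rec. 2020 weights."""
    return 0.2627 * r + 0.6780 * g + 0.0593 * b


def tone_map(x, k=2.4):
    """Hypothetical TM: SDR relative luminance -> HDR relative luminance."""
    return x ** k


def regrade_pixel(r, g, b, tm=tone_map):
    """Scale (R, G, B) by a common gain g so that only luminance changes."""
    max_rgb = max(r, g, b)             # unit 305: maximum calculator
    if max_rgb == 0.0:
        return (0.0, 0.0, 0.0)         # black stays black, avoids division by zero
    lum = luminance(r, g, b)           # unit 301: luminance convertor
    l_max = lum / max_rgb              # unit 302: divider, Lmax(x,y)
    m_star = tm(max_rgb)               # unit 307: luminance mapper on maxRGB
    l_star = m_star * l_max            # unit 308: multiplier, L*
    gain = l_star / lum                # unit 310: gain determination, g = L*/L
    return (gain * r, gain * g, gain * b)   # unit 311: common multiplier


out = regrade_pixel(0.8, 0.4, 0.2)
print(out, out[0] / out[1], out[1] / out[2])   # component ratios are unchanged
```

Because all three components are multiplied by the same g, their ratios and hence the chromaticity are preserved. Note also that g reduces algebraically to TM(maxRGB)/maxRGB, which is why an error in the RGB values directly becomes an error in the gain.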
This circuit may be appropriate for some color encodings. However, one would ideally like to work in typical SDR encodings as they are typically used. Im_LDR as it would come out of HEVC decoder 207, would typically be in a non-linear Y′CbCr encoding (wherein we can assume the non-linearity to be a square root approximately, i.e. ignoring the non-constant luminance issues then: Y′=sqrt(L) approximately). More of an issue for such novel SDR-communicated HDR video codings (or as explained above actually a set of gradings containing at least a SDR and a master HDR look) was the fact that in some situations the Cb and Cr components may be sub-sampled spatially, e.g. in a 4:2:0 encoding. That was not really an issue in SDR, not just because any color errors wouldn't show up large on a 100 nit rendering, but also because the above explained technology has a color transformation for reconstructing the Im_RHDR (or the display tuned Im3000nit), which just doesn't exist in the SDR video coding paradigm. If one has the wrong RGB values (because one no longer has the full resolution correct RGB values per pixel at the decoding side), this leads to reading at the wrong place of the TM function, or in other words, getting the wrong g-factor in the decoder of FIG. 3. I.e., one might risk sometimes seriously incorrectly boosting or dimming the SDR color, e.g. at a boundary of an object, giving potentially visible artefacts, and no sufficiently accurate Im_RHDR reconstruction.
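The effect of chroma sub-sampling on the g-factor can be made concrete with a small sketch. It is a deliberately crude model, with the following assumptions: non-constant-luminance Y′CbCr with Rec. 2020 coefficients, the square-root approximation for the non-linearity, sub-sampling modelled as averaging Cb and Cr over two horizontally adjacent pixels, and a hypothetical power-function TM with exponent 2.4. All names are illustrative.

```python
KR, KB = 0.2627, 0.0593          # Rec. 2020 luma coefficients
KG = 1.0 - KR - KB


def rgb_to_ycbcr(r, g, b):
    """Non-linear R'G'B' -> Y'CbCr (non-constant luminance)."""
    y = KR * r + KG * g + KB * b
    return y, (b - y) / (2.0 * (1.0 - KB)), (r - y) / (2.0 * (1.0 - KR))


def ycbcr_to_rgb(y, cb, cr):
    """Y'CbCr -> non-linear R'G'B'."""
    r = y + 2.0 * (1.0 - KR) * cr
    b = y + 2.0 * (1.0 - KB) * cb
    g = (y - KR * r - KB * b) / KG
    return r, g, b


def gain(rgb_nonlinear, k=2.4):
    """g = TM(maxRGB) / maxRGB on linearized components, with TM(x) = x**k."""
    linear = [max(c, 0.0) ** 2 for c in rgb_nonlinear]   # undo the square root
    m = max(linear)
    return (m ** k) / m if m > 0.0 else 1.0


# A saturated red pixel next to a dark grey pixel, i.e. an object boundary.
red, grey = (0.9, 0.1, 0.1), (0.2, 0.2, 0.2)
y_r, cb_r, cr_r = rgb_to_ycbcr(*red)
_, cb_g, cr_g = rgb_to_ycbcr(*grey)

# Sub-sampled chroma: both pixels share the average Cb and Cr.
cb_avg, cr_avg = (cb_r + cb_g) / 2.0, (cr_r + cr_g) / 2.0
red_decoded = ycbcr_to_rgb(y_r, cb_avg, cr_avg)

g_true = gain(red)
g_wrong = gain(red_decoded)
print(g_true, g_wrong)   # the decoder reads TM at the wrong maxRGB
```

In this toy case the luma of the red pixel is intact, yet the averaged chroma lowers its reconstructed maxRGB, so the gain applied at the boundary differs markedly from the intended one.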