Until a couple of years ago, all video was encoded according to the so-called low dynamic range (LDR) philosophy, also called standard dynamic range (SDR). That meant, whatever the captured scene was, that the maximum of the code (typically 8 bit luma Y′=255; or 100% voltage for analog display driving) should by standardized definition correspond to, i.e. be rendered on, a display with a peak brightness PB (i.e. the brightest white color it can render) being by standard agreement 100 nit. If people bought an actual display which was a little darker or brighter, it was assumed that the viewer's visual system would adapt so that the image would still look appropriate and even the same as on the reference 100 nit display, rather than e.g. annoyingly too bright (in case one has e.g. a night scene in a horror movie which should have a dark look).
Of course, for practical program making this typically meant maintaining a tight control of the scene lighting setup, since even in perfectly uniform lighting the diffuse reflection percentage of various objects can already give a contrast ratio of 100:1. The black of such a SDR display may typically be 0.1 nit in good circumstances, yet 1 nit or even several nits in worst circumstances, so the SDR display dynamic range (the brightest white divided by the darkest viewable black) would be 1000:1 at best, or worse, which corresponds nicely to such uniformly illuminated scenes, and an 8 bit coding for all the pixel grey values or brightnesses required to be rendered, having a gamma of approximately 2.0, or encoding inverse gamma 0.5. Rec. 709 was the typically used SDR video coding. Typically also cameras had problems capturing simultaneously both very bright and rather dark regions, i.e. a scene as seen outside a window or car window would typically be clipped to white (giving red, green and blue additive color components R=G=B=max., corresponding to their square root coded values R′=G′=B′=255). Note that if in this application a dynamic range is specified first and foremost with a peak brightness (i.e. the brightest rendered or renderable luminance) only, we assume that the lowest luminance value is pragmatically zero (whereas in practice it may depend on viewing conditions such as display front plate or cinema screen light reflection, e.g. 0.1 nit), and that those further details are irrelevant for the particular explanation. Note also that there are several ways to define a dynamic range, and that the most natural one typically used in the below explanations is a display rendered luminance dynamic range, i.e. the luminance of the brightest color versus the darkest one.
Note also, something which has become clearer during the HDR research, and is mentioned here to make sure everybody understands it, that a code system itself does not natively have a dynamic range, unless one associates a reference display with it, which states that e.g. R′=G′=B′=Y′=255 should correspond with a PB of 100 nit, or alternatively 1000 nit, etc. In particular, contrary to what is usually pre-assumed, the number of bits used for the color components of pixels, like their lumas, is not a good indicator of dynamic range, since e.g. a 10 bit coding system may encode either a HDR video, or an SDR video, determined by the type of encoding, and in particular the electro-optical transfer function EOTF of the reference display associated with the coding, i.e. defining the relationship between the luma codes [0, 1023] and the corresponding luminances of the pixels, as they need to be rendered on a display.
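The point that a code only acquires a dynamic range through its associated reference display can be illustrated with a minimal sketch. It assumes a plain power-law EOTF with exponent 2.0 (in line with the approximate gamma mentioned above); the function name power_law_eotf and its parameters are hypothetical and do not reproduce the exact curve of any standard.

```python
def power_law_eotf(code, bit_depth, peak_nit, gamma=2.0):
    """Map an integer luma code to a luminance in nit.

    Assumption: a plain power law anchored at the peak brightness of the
    reference display; this is an illustration, not a standardized EOTF.
    """
    max_code = (1 << bit_depth) - 1
    return peak_nit * (code / max_code) ** gamma


# The same 10 bit codes, interpreted against two different reference displays.
for code in (1023, 512):
    sdr_nit = power_law_eotf(code, 10, peak_nit=100.0)
    hdr_nit = power_law_eotf(code, 10, peak_nit=1000.0)
    print(code, round(sdr_nit, 2), round(hdr_nit, 2))
```

The bit depth is identical in both cases; only the reference display peak brightness (and in real systems the EOTF shape) decides which luminances the codes stand for.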
In this text it is assumed that when a HDR image or video is mentioned, it has a corresponding peak brightness or maximum luminance for the highest luma code (or equivalently highest R′, G′, B′ values in case of an RGB coding e.g. rather than an Y′CbCr encoding) which is higher than the SDR value of 100 nit, typically at least 4× higher, i.e. the to be rendered maximum display luminance for having the HDR image look optimal may be e.g. 1000 nit, 5000 nit, or 10000 nit (note that this should not be confused with the prima facie complex concept which will be detailed below that one can encode such a HDR image or video as a SDR image or video, in which case the image is both renderable on a 100 nit display, but importantly, also contains all information—when having corresponding associated metadata encoding a color transformation for recovering the HDR image—for creating a HDR image with a PB of e.g. 1000 nit!).
So a high dynamic range coding of a high dynamic range image is capable of encoding images with to be rendered luminances of e.g. up to 1000 nit, to be able to display-render good quality HDR, with e.g. bright explosions compared to the surrounding rendered scene, or sparkling shiny metal surfaces, etc. And simultaneously relatively dark pixel colors or their luminances can be encoded (even if not rendered on display). So for avoidance of doubt, when the present text talks about high dynamic range (created original) images, or codings of those images, we mean that the coding can at least handle a luminance range greater than what the standard rec. 709-based SDR coding could handle, i.e. whatever the brightest code is actually mapped to on a display as rendered luminance, the coding would be capable of encoding a luminance range of at least larger than 1000:1, and preferably much larger to enable coding of scenes with even higher illumination contrasts.
In practice, there are scenes in the world which can have very high dynamic range (e.g. an indoors capturing with objects as dark as 1 nit, whilst simultaneously seeing through the window outside sunlit objects with luminances above 10,000 nit, giving a 10000:1 dynamic range, which is 10× larger than a 1000:1 DR, and even 100 times larger than a 100:1 dynamic range, and e.g. legacy TV viewing may have a DR of less than 30:1 in some typical situations, e.g. daylight viewing). When one would like to be able to render at least in theory the most realistic images to humans, one can debate about what a human would like to see simultaneously as contrasting pixel luminances on a display, or the simpler question of what he is able to see well. On both aspects there has been debate, perhaps somewhat wanting to prove a particular point, and sometimes it is said that 10,000:1 luminance contrast ratio should be sufficient, but if a person walks in a dark street he sees both dark pixels well below one nit, and bright lights which can be several 1000s or 10,000s of nits, and this is not necessarily unwatchable. So although there may be pragmatic choices as to what luminances should or can easily be rendered, in the present text for the elucidation of the concepts, we don't want to limit ourselves too much on what an upper limit of dynamic range for any HDR scene would necessarily always need to be.
Since displays are becoming better (a couple of times brighter PB than 100 nit, with 1000 nit currently appearing, and several thousands of nits PB being envisaged), a goal is to be able to render these images beautifully, and although not exactly identical to the original because of factors such as different viewing conditions, at least very natural, or at least pleasing. And this needs what was missing in the SDR video coding era: a good pragmatic HDR video coding technology, and the good use of such video, e.g. when rendering it optimally.
The reader should also understand that because a viewer is typically watching the content in a different situation (e.g. sitting in a weakly lit living room at night, or in a dark home or cinema theatre, instead of actually standing in the captured bright African landscape), there is no identity between the luminances in the scene and those finally rendered on the TV (or other display). This can be handled inter alia by having a human color grader manually decide about the optimal colors on the available coding DR, i.e. of the associated reference display, e.g. by prescribing that the sun in the scene should be rendered in the image at 5000 nit (rather than its actual value of 1 billion nit). Alternatively, automatic algorithms may do such a conversion from e.g. a raw camera capturing to what in the text will be (generically) called a (master) HDR grading. This means one can then render this master grading on a 5000 nit PB HDR display, at those locations where it is available. Even if we say that a pragmatic good version for the peak brightness of a coding (PB_C) may be typically e.g. 5000 nit, it doesn't mean that one cannot encode any higher dynamic range scenes, in any chosen coding specification, and then render them optimally on whatever display one has available with whatever display peak brightness (PB_D), the latter being a question of optimal display tuning of the HDR image(s). I.e. there is no particular need to encode any image in a fully display-referred manner, let alone to ultimately fix the creation of any content to any particular display (e.g. a 5000 nit, or even worse a 1000 nit PB_D display).
This being indicative of how one could encode any master HDR image (per se), at the same time however, there will for the coming years be a large installed base of people having a legacy SDR display of 100 nit PB, or some display which cannot make 5000 nit white, e.g. because it is portable, and those people need to be able to see the HDR movie too. So there needs to be some mechanism to convert from a 5000 nit HDR to a 100 nit SDR look image of the same scene. There exists however a problem, which is the mirror problem of the fact that we needed HDR video coding because one cannot just keep rendering SDR video which is intended for SDR displays of PB_D of around 100 nit on ever higher peak brightness displays. Because that image when rendered will look far too bright, the mood of e.g. a night scene in a thriller may be totally lost, as if one switches on a battery of lights like on the ceiling of a supermarket. If one was then to conclude one could just make a single kind of HDR graded images, those may look in many circumstances way too dark for direct SDR rendering. That is because, if we look at the pixel luminances in a relative [0.0-1.0] representation, the darkest pixel luminances will be a very small fraction of the brightest ones. E.g. consider a night scene with a criminal moving through the shadows (may be barely visible), when there is also some bright light somewhere in the image. E.g., in a 1000 nit graded coding for rendering on a 1000 nit display, we may consider that the criminal is well rendered with pixel luminances up to 10 nit, whilst the light should be nicely bright if rendered at 1000 nit. That means there is a contrast ratio of 100:1 between those two pixel regions in the image. 
If we now use the classical paradigm of the relative rendering of the SDR-era, namely map the brightest white (or PB_C) of the coding to the display brightest white (PB_D), then for a 100 nit SDR display this means that the relevant action of the criminal will fall at or below 1 nit. This could be below the front glass reflections on the display, so instead of nicely watching the movie, the SDR viewer may be straining his eyes to try to see what's happening. One can imagine that “some brightening” of those darkest pixels would be advantageous to get at least some better SDR image, but one can also imagine that preferably this is not done arbitrarily, but rather in a content-dependent manner. So one always needs a second image grading, and a way to communicate it somehow to any receiver.
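The arithmetic of this relative rendering can be sketched in a few lines, using the 1000 nit grading of the night scene from the example above. The function name relative_rendering is a hypothetical helper introduced only for this illustration.

```python
def relative_rendering(lum_nit, pb_c, pb_d):
    """Map coding peak brightness PB_C onto display peak brightness PB_D.

    Every luminance is scaled by the same factor PB_D / PB_C, so all
    contrast ratios inside the image stay unchanged.
    """
    return lum_nit * pb_d / pb_c


# Night scene graded for 1000 nit, shown on a 100 nit SDR display.
shadow_on_sdr = relative_rendering(10.0, pb_c=1000.0, pb_d=100.0)
lamp_on_sdr = relative_rendering(1000.0, pb_c=1000.0, pb_d=100.0)
print(shadow_on_sdr, lamp_on_sdr)  # the 100:1 ratio is kept, the shadows land at 1 nit
```

The contrast between lamp and criminal is preserved, but the absolute level of the shadows drops by a factor of ten, which is why a content-dependent brightening is preferred over this plain scaling.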
FIG. 1 shows a couple of illustrative examples of the many possible HDR scenes a HDR system of the future (e.g. connected to a 1000 nit PB display) may need to be able to correctly handle, i.e. by rendering the appropriate luminances for all objects/pixels in the image. E.g. ImSCN1 is a sunny outdoors image from a western movie, whereas ImSCN2 is a nighttime image. What makes HDR image rendering different from how it always was in the LDR era which ended only a couple of years ago, is that the LDR had such a limited dynamic range (about PB=100 nit, and a black level of approximately 0.1 to 1 nit), that mostly only the reflectivities of the objects could be shown (which would fall between 90% for good white and 1% for good black). So one had to show the objects independent of their illumination, and couldn't at the same time faithfully show all the sometimes highly contrasty illuminations of the scene that could happen. In practice that meant that the highly bright sunny scene had to be rendered with approximately the same display luminances (0-100 nit) as a dull rainy day scene. And even the night time scenes could not be rendered too dark, or the viewer would not be able to well-discriminate the darkest parts of the image, so again those night time brightnesses would be rendered spanning the range between 0 and 100 nit. So one had to conventionally color the night scenes blue, so that the viewer would understand he was not looking at a daytime scene. Now of course in real life human vision would also adapt to the available amount of light, but not that much (most people in real life recognize that it's getting dark). So one would like to render the images with all the spectacular local lighting effects that one can artistically design in it, to get much more realistic rendered images at least if one has a HDR display available.
So on the left axis of FIG. 1 are object luminances as one would like to see them in a 5000 nit PB master HDR grading for a 5000 nit PB display. If one wants to convey not just an illusion, but a real sense of the cowboy being in a bright sunlit environment, one must specify and render those pixel luminances sufficiently bright (though also not too bright), around e.g. 500 nit. For the night scene one wants mostly dark luminances, but the main character on the motorcycle should be well-recognizable i.e. not too dark (e.g. around 5 nit), and at the same time there can be pixels of quite high luminance, e.g. of the street lights, e.g. around 3000 nit on a 5000 nit display, or around the peak brightness on any HDR display (e.g. 1000 nit). The third example ImSCN3 shows what is now also possible on HDR displays: one can simultaneously render both very bright and very dark pixels. We see a dark cave, with a small opening through which we see the sunny outside. For this scene one may want to make the sunlit objects like the tree somewhat less bright than in a scene which wants to render the impression of a bright sunny landscape, e.g. around 400 nit, which should be more coordinated with the essentially dark character of the inside of the cave. A color grader may want to optimally coordinate the luminances of all objects, so that nothing looks inappropriately dark or bright and the contrasts are good, e.g. the person standing in the dark in this cave may be coded in the master HDR graded image around 0.05 nit (assuming HDR renderings will not only be able to render bright highlights, but also dark regions).
It can be understood that it may not always be a trivial task to map all the object luminances for all these very different types of HDR scene to optimal luminances available in the much smaller SDR or LDR dynamic range (DR_1) shown on the right of FIG. 1, which is why preferably a human color grader may be involved for determining the color transformation (which comprises at least a luminance transformation, or luma transformation when equivalently performed on the luma codes). However, one can always choose to use automatically determined transformations, e.g. based on analyzing the color properties of the image content such as its luminance histogram, and this may e.g. be a preferred option for simpler kinds of HDR video, or applications where human grading is less preferred e.g. as in real-time content production (in this patent it is assumed that without limitation grading could also involve the quick setting of a few color transformation function parameters, e.g. for the whole production quickly prior to the start of capturing).
Applicant has designed a coding system, which not only can handle the communication (encoding) of merely a single standardized HDR video, for a typical single kind of display in the field (with every end viewer having e.g. a 1000 nit PB display), but which can at the same time communicate and handle the videos which have an optimal look for various possible other display types with various other peak brightnesses in the field, in particular the SDR image for a 100 nit PB SDR display.
Encoding only a set of HDR images, i.e. with the correct look i.e. image object luminances for a rendering on say a 1000 nit HDR monitor, in e.g. a 10 bit legacy MPEG or similar video coding technology is not that difficult. One only needs to establish an optimal OETF (opto-electronic transfer function) for the new type of image with considerably larger dynamic range, namely one which doesn't show banding in the many regions that are relatively dark compared to white, and then calculate the luma codes for all pixel/object luminances.
Applicant however designed a system which can encode images of a first dynamic range actually as images of a second dynamic range, e.g. communicating HDR images actually as LDR images, i.e. then actually LDR (or SDR, i.e. referred to a 100 nit PB reference display, and often optimally color graded on such a reference display) images are communicated to a receiver, which then can already immediately be used for rendering the correctly looking SDR look on legacy 100 nit PB SDR displays (without wanting to lose generality, in the description below we assume to have such an embodiment, in which HDR images with a content peak brightness of say PB_C=1000 nit are actually communicated as 100 nit PB_C i.e. SDR images, with in addition the necessary color transformation functions to reconstruct the PB_C=1000 nit look images from the received 100 nit SDR images being received as metadata). So one should understand that these SDR images are also an important component of actually HDR correct artistic look images being communicated.
Thereto, a set of appropriate reversible color transformation functions F_ct is defined, as is illustrated with FIG. 2. These functions may be defined by a human color grader, to get a reasonably looking SDR image (Im_LDR) corresponding to the HDR master image MAST_HDR, whilst at the same time ensuring that by using the inverse functions IF_ct the original master HDR (MAST_HDR) image can be reconstructed with sufficient accuracy as a reconstructed HDR image (Im_RHDR), or, automatic analysis algorithms may be used at the content creation side for determining suitable such color transformation functions F_ct. Note that instead of relying on a receiving side to invert the functions F_ct into IF_ct, one can also send already the needed functions for calculating Im_RHDR from the received and decoded intermediate SDR image Im_RLDR. So what the color transformation functions actually do is change the luminances of the pixel in a HDR image (MAST_HDR) into LDR luminances, i.e. the optimal luminance compression as shown in FIG. 1 to fit all luminances in the 100 nit PB LDR dynamic range DR_1. Applicant has invented a method which can keep the chromaticities of the colors constant, effectively changing only their luminances, as will be elucidated below.
A typical coding chain as shown in FIG. 2 works as follows. Some image source 201, which may e.g. be a grading computer giving an optimally graded image, or a camera giving a HDR output image, delivers a master HDR image MAST_HDR, to be color transformed and encoded. A color transformer 202 applies a determined color transformation, e.g. a concave bending function, which for simplicity of elucidation we will assume to be a gamma function with coefficient gam=1/k and k a number larger than 2.0. Of course more complex luminance mapping functions may be employed, provided that they are sufficiently reversible, i.e. the Im_RHDR image has negligible or acceptable banding. By applying these color transformation functions F_ct comprising at least luminance transformation functions, an output image Im_LDR results. This image or set of images is encoded with a legacy LDR image encoder, which may potentially be modified somewhat, e.g. the quantization tables for the DCT-ed transformations of the prediction differences may have been optimized to be better suited for images with HDR characteristics (although the color transformations may typically already make the statistics of the Im_LDR look much more like a typical LDR image than a typical HDR image, which HDR image typically has relatively many pixels with relatively dark luminances, as the upper part of the range may often contain small lamps etc.). E.g., a MPEG-type encoder may be used like HEVC (H265), yielding an encoded SDR image Im_COD. This video encoder 203 then pretends it gets a normal SDR image, although it also gets the functions F_ct which allow the reconstruction of the master HDR image, i.e. effectively making this a dual co-encoding of both an SDR and a HDR look, and their corresponding set of images (Im_RLDR, respectively Im_RHDR). There may be several manners to communicate this metadata comprising all the information of the functions F_ct, e.g. they may be communicated as SEI messages. 
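The simplified gamma example of the color transformer can be sketched as follows, on luminances normalized to 1.0. This is only an illustration under stated assumptions: k=2.4 is an arbitrary value satisfying the "larger than 2.0" condition, the 10 bit quantization stands in very roughly for the intermediate SDR coding (no actual HEVC compression is modeled), and the names f_ct, if_ct and quantize are hypothetical.

```python
def f_ct(l_hdr, k=2.4):
    """Forward mapping: normalized HDR luminance -> normalized SDR luminance.

    Assumption: a pure gamma with coefficient gam = 1/k, as in the
    simplified elucidation; k = 2.4 is an arbitrary illustrative choice.
    """
    return l_hdr ** (1.0 / k)


def if_ct(l_sdr, k=2.4):
    """Inverse mapping used at the receiving side to reconstruct the HDR value."""
    return l_sdr ** k


def quantize(value, bits=10):
    """Round a normalized value to the nearest code of a `bits`-bit coding."""
    levels = (1 << bits) - 1
    return round(value * levels) / levels


l_hdr = 0.01                      # e.g. 10 nit in a 1000 nit grading
l_sdr = quantize(f_ct(l_hdr))     # brightened and coded as an SDR value
l_rec = if_ct(l_sdr)              # reconstructed HDR value
print(l_sdr, l_rec, abs(l_rec - l_hdr))
```

The dark HDR value is lifted considerably in the SDR representation, and the reconstruction error after quantization stays small, which is the sense in which such a function is "sufficiently reversible".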
Then a transmission formatter 204 applies all the necessary transformations to format the data to go over some transmission medium 205 according to some standard, e.g. a satellite or cable or internet transmission, e.g. according to ATSC 3.0, i.e. packetization of the data is performed, channel encoding, etc. At any consumer or professional side, a receiver 206, which may be incorporated in various physical apparatuses like e.g. a settopbox, television or computer, undoes the channel encoding by applying unformatting and channel decoding. Then a video decoder 207 applies e.g. HEVC decoding, to yield a decoded LDR image Im_RLDR (this is the intermediate image which can be used for directly driving a legacy SDR display if available, but which must still be color transformed to obtain from it a HDR or MDR image as required for displays with higher display peak brightness PB_D). Then a color transformer 208 is arranged to transform the SDR image to an image of any non-LDR dynamic range. E.g. the 5000 nit original master image Im_RHDR may be reconstructed by applying the inverse color transformations IF_ct of the color transformations F_ct used at the encoding side to make the Im_LDR from the MAST_HDR. A display tuning unit 209 may be comprised which transforms the SDR image Im_RLDR to a different dynamic range, e.g. Im3000 nit being optimally graded in case display 210 is a 3000 nit PB display, or a 1500 nit or 1000 nit PB image, etc.
FIG. 3 shows how one can design such a chromaticity-preserving luminance re-calculation, taken from WO2014056679, which applicant believes would be the closest prior art for understanding the present invention. One can understand this processing when seen in the gamut normalized to 1.0 maximum relative luminance for both the SDR and the HDR image (i.e. assuming that the SDR and HDR have the same e.g. Rec. 2020 primaries, they have then exactly the same tent-shaped gamut; as shown in FIG. 1 of WO2014056679). If one were to drive any display with e.g. the cowboy having in the driving image a luma code corresponding to a luminance of 10% of peak brightness of the display, then that cowboy would render ever brighter the higher the PB of the display is. That may be undesirable, as we may want to render the cowboy with (approximately) the same luminance on all displays, e.g. 60 nit. Then of course his relative luminance (or the corresponding 10 bit luma code) should be lower the higher the PB of the display is, to get the same ultimate rendered luminance. I.e., one could represent such a desire as a downgrading mapping e.g. from luma code 800 for the SDR image, to e.g. luma code 100 for the HDR image (depending on the exact shape of the EOTF defining the codes which is used), or, in luminances one maps the 60% SDR luminance to e.g. 1/40th of that for a 4000 nit HDR display or its corresponding optimally graded image. Downgrading in this text means changing the luma codes of the pixels (or their corresponding to be rendered luminances) from a representation of higher peak brightness (i.e. for rendering on a higher PB display, e.g. of 1000 nit PB) to the lumas of an image of the same scene in a lower PB image for rendering on a lower PB display, e.g. a 100 nit SDR display, and upgrading is the opposite color transformation for converting a lower PB image into a higher PB image, and one should not confuse this with the spatial upscaling and downscaling, which is adding new pixels respectively dropping some pixels or some color components of those pixels.
One can do that for any color, in which a (RGB) triplet corresponds to some chromaticity (x,y) in the display or encoding code gamut, in a manner which will automatically scale to the maximum luminance available (renderable) for that chromaticity Lmax(x,y), by the apparatus of FIG. 3.
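The cowboy example reduces to simple arithmetic: keeping the rendered luminance at 60 nit means the relative luminance must shrink in proportion to the peak brightness. A minimal sketch, with relative_luminance as a hypothetical helper name:

```python
def relative_luminance(target_nit, pb_nit):
    """Relative luminance (0.0-1.0) needed to render target_nit on a display
    or grading with peak brightness pb_nit."""
    return target_nit / pb_nit


cowboy_nit = 60.0
for pb in (100.0, 1000.0, 4000.0):
    print(pb, relative_luminance(cowboy_nit, pb))

# Going from the 100 nit SDR image to the 4000 nit HDR image,
# the relative luminance drops by the ratio of the peak brightnesses.
ratio = relative_luminance(cowboy_nit, 100.0) / relative_luminance(cowboy_nit, 4000.0)
```

Here ratio evaluates to 40 (up to floating-point rounding), i.e. the 60% SDR luminance becomes 1.5% in the 4000 nit representation.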
We see that FIG. 3 consists of two parts, which should be well understood. The upper track consists of processing to determine a multiplicative factor g, which can be determined in various manners. In particular, one could do so in a linear or non-linear representation of RGB (the non-linear R′ being e.g. the square root of the linear R), but we will assume for now that the RGB components are linear. In practice, one could just use some lookup table to get L*=LUT(L). But it is important to understand that with the geometrical shape of the function which is represented as the LUT, the creation side (e.g. a human color grader, or an automatic image analysis system that proposes e.g. a function composed of three linear parts) determines how exactly particular HDR luminances (of their possible values between 0.0X nit and e.g. 5000 nit which are codeable in the HDR coding) are to correspond with equivalent SDR luminances, i.e. to be calculated if a SDR image is needed for an input HDR image. If one knows that a luma of a pixel color (i.e. the coding of the luminance) is related in a precise functional manner to the pixel color luminance, then wherever one can specify a functional relationship between two luminances (e.g. the SDR luminance of the SDR grading corresponding to the HDR luminance in the M_HDR original master HDR grading), one can equivalently specify the desired luminance transformation as a luma transformation, with a somewhat different function shape. Sometimes there may be an advantage to specify a luma transformation, e.g. if the luma domain is more perceptually uniform the grader may get quicker to his desired look, but for the present description and its teachings we will assume a decoder receives a luminance transformation specification in case no specific codification of the transformation function is required (some apparatuses may easily transform the specification to whatever color space they use internally).
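As an illustration of a luminance mapping specified as "a function composed of three linear parts" and then stored as a LUT, a minimal sketch follows. The knot positions are hypothetical values chosen only to give a brightening HDR-to-SDR shape; in practice they would come from the grader or an analysis algorithm.

```python
# Hypothetical knots (input, output) on normalized luminances; three segments.
KNOTS = ((0.0, 0.0), (0.1, 0.4), (0.5, 0.8), (1.0, 1.0))


def three_segment_map(l_in, knots=KNOTS):
    """Piecewise-linear luminance mapping through the given knots."""
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if l_in <= x1:
            return y0 + (l_in - x0) * (y1 - y0) / (x1 - x0)
    return knots[-1][1]  # inputs above the last knot are held at its output


# Sample the function into a 256-entry lookup table, L* = LUT(L).
LUT = [three_segment_map(i / 255) for i in range(256)]


def lut_lookup(l_in, lut=LUT):
    """Nearest-entry lookup for a normalized input luminance in [0, 1]."""
    index = min(max(round(l_in * (len(lut) - 1)), 0), len(lut) - 1)
    return lut[index]
```

The shape of the curve carries the grading decision: with these knots the darkest 10% of the HDR range is stretched over 40% of the SDR range, while the top half is compressed.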
So the upper track consists of establishing which kind of luminance changing behavior is needed for calculating a second image with a different dynamic range than the input image, and more precisely, this transformation is summarized as a multiplication value for each pixel, depending at least on that pixel color (and in more advanced versions the multiplication factor could also depend e.g. on the pixel position).
The lower part shows how the apparatus can actually implement the luminance transformation of each pixel color in e.g. the HDR image to its SDR equivalent (or for a decoder we assume in the elucidations that the transformation typically transforms a received SDR image into some HDR image, or some medium dynamic range (MDR) image for serving a display with a particular display peak brightness PB_D which lies between the content peak brightness of the master HDR image M_HDR, and the 100 nit PB_C of the SDR corresponding grading). The luminance (or “brightness”) of a color is given by the length of the vector, so if again we have e.g. linear components RGB, one can scale the vector by multiplying with the appropriate value g, representing the luminance transformation from HDR-to-LDR, or alternatively LDR-to-HDR for that color. But one can technically find that also this lower branch can be equivalently realized on some other color representations, e.g. Y′ CbCr, with Y′ a typical luma as e.g. defined in Rec. 709, and Cb and Cr corresponding chrominances.
Actually, one can demonstrate that this 3-component color transformation corresponds to applying a similar luminance mapping, which on the achromatic axis (i.e. of colors having no particular hue) maps the input luminance L of the color in the SDR image, to the needed relative output luminance L* of the optimal HDR graded image. Without diving into details, what is relevant from this teaching, is that the corresponding color transformation can then be realized as a multiplicative transformation on the (in the prior art preferably linear) RGB components, on each component separately, by a multiplier 311, with three times the same constant g larger or smaller than 1.0, which corresponds to whatever shape of the luminance transformation function L_out=TM(L_in) one chooses (e.g. a human color grader on the creation side, or some artificially intelligent automatic re-grading algorithm), which can also be formulated as a functional transformation of the maximum of the input red, green and blue color values of a pixel. So for each input color (R,G,B), the appropriate g-value is calculated for applying the desired color transformation which transforms Im_RLDR into Im_RHDR (or in an appropriately scaled manner into any other graded image, like Im3000 nit), when luminance mapper 307 gets some SDR-luminance to HDR-luminance mapping function, e.g. a parametrically specified loggamma function or sigmoid, or a multilinear curve received as a LUT.
The components of the exemplary embodiment circuit are: 305: maximum calculator, outputting the maximum one (maxRGB) of the R, G, and B values of a pixel color being processed; 301: luminance converter, calculating the luminance of a color according to some color definition standard with which the system currently works, e.g. Rec. 2020; 302: divider, yielding Lmax(x,y) as L/max(R,G,B); 307: luminance mapper, actually working as a mapper on maxRGB, yielding m*=TM(maxRGB), with TM some function which defines the luminance transformation part of F_ct; 308: a multiplier, yielding L*=(m*)×Lmax(x,y); 310: a gain determination unit, being in this embodiment actually a divider, calculating g=L*/L, i.e. the output HDR relative luminance divided by the input SDR relative luminance L; and 311: a multiplier arranged to multiply the three color components R, G, B with the same g factor.
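The signal flow through units 305, 301, 302, 307, 308, 310 and 311 can be traced for a single pixel with the following sketch. It assumes linear RGB normalized to 1.0 and uses the Rec. 2020 luminance weights; the function names and the example curve passed as tm are hypothetical and chosen only to make the sketch runnable.

```python
def luminance_rec2020(r, g, b):
    """Unit 301: luminance of a linear RGB color using Rec. 2020 weights."""
    return 0.2627 * r + 0.6780 * g + 0.0593 * b


def regrade_pixel(r, g, b, tm):
    """Apply the FIG. 3 style processing to one linear RGB pixel.

    tm is the luminance mapping function TM, applied to maxRGB.
    """
    max_rgb = max(r, g, b)                 # 305: maximum calculator
    if max_rgb == 0.0:
        return (0.0, 0.0, 0.0)             # black stays black, avoids 0/0
    lum = luminance_rec2020(r, g, b)       # 301: luminance converter
    l_max = lum / max_rgb                  # 302: divider, Lmax(x,y)
    m_star = tm(max_rgb)                   # 307: luminance mapper on maxRGB
    l_star = m_star * l_max                # 308: multiplier, L* = m* x Lmax
    gain = l_star / lum                    # 310: gain determination, g = L*/L
    return (gain * r, gain * g, gain * b)  # 311: same g on all three components


# Example with a hypothetical SDR-to-HDR curve (a square law on maxRGB).
out = regrade_pixel(0.2, 0.4, 0.8, tm=lambda m: m ** 2.0)
print(out)
```

Since the same g multiplies all three components, the ratios R:G:B, and hence the chromaticity, are unchanged. Note also that g simplifies to TM(maxRGB)/maxRGB, so the largest component of the output equals TM(maxRGB).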
This circuit may be appropriate for some color encodings. However, one would ideally like to work in typical SDR encodings as they are typically used. Im_LDR as it would come out of HEVC decoder 207 in any typical receiving-side apparatus, would typically be in a non-linear Y′CbCr encoding (wherein we can assume the non-linearity to be a square root approximately). In particular, if one wants the HEVC decoded Y′CbCr images to be directly usable for legacy SDR displays, they would be Rec. 709 interpretable.
One can also design equivalent color mappings, which even if not exactly the same mathematically, i.e. not mapping the various SDR colors to exactly the same HDR colors under the various alternative HDR video decoder embodiments, at least provide a reasonably similarly looking image, e.g. with at least the same colors for the darkest parts of the image. An example where one could deviate is when clipping or soft-clipping some bright values (the second being possible if the image is not needed for further change of those clipped values, e.g. in case the processing circuit is used to derive a SDR secondary grading when receiving a communicated HDR image) instead of keeping them sufficiently below the upper color gamut boundary of the RGB-encoding, but that would typically be a choice of the creation side, e.g. the color grader being responsible for the final look.
An example of what is possible compared to the max(R,G,B)-circuit of FIG. 3 is elucidated with FIG. 4.
The nice property of using a max(R,G,B)-based luminance mapping (or the MAXRGB being the index which looks up in the luminance mapping function shape which corresponding output luminance Luminance_Im_LDR should be used), is that the color transformation will never run out of gamut. E.g., if we have a blue pixel which is near its maximum brightness (near the top of the gamut of possible RGB colors), then the MAXRGB measurement of this pixel's brightness will be close to 1.0, as shown in FIG. 4A. Suppose we have a typical HDR-to-SDR re-grading luminance mapping function of the convex shape as shown. The multiplication factor to use will then be (if the image creation side specified the curve so that an output luminance Lum_LX corresponds to the MAXRGB input value): g=Lum_LX/MAXRGB, which will be slightly above 1.0. B will be the biggest color component, so the other two will be smaller, and no out-of-range mapping can occur for them. B will be mapped to B*g=MAXRGB*Lum_LX/MAXRGB, i.e. this happens to be the same value numerically on a relative 0.0-1.0 scale as the desired luminance Lum_LX, and, within the range of possible B values, i.e. <=1.0.
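The in-gamut property can be checked numerically with a small sketch. It assumes a hypothetical brightening curve TM(m)=m^(1/2.4) whose output never exceeds 1.0; the names tm and regrade_maxrgb are illustrative only.

```python
def tm(m, k=2.4):
    """Hypothetical HDR-to-SDR brightening curve on normalized values."""
    return m ** (1.0 / k)


def regrade_maxrgb(rgb, curve=tm):
    """Scale a linear RGB color with g = TM(maxRGB) / maxRGB."""
    max_rgb = max(rgb)
    if max_rgb == 0.0:
        return rgb
    gain = curve(max_rgb) / max_rgb
    return tuple(gain * c for c in rgb)


# A saturated blue near the top of the gamut.
blue = (0.05, 0.05, 0.95)
out = regrade_maxrgb(blue)
print(out)  # the blue component becomes TM(0.95), which is still below 1.0
```

Because the largest component is mapped exactly onto TM(maxRGB), and the other two are scaled by the same factor, no component can exceed 1.0 as long as the curve itself stays within the 0.0-1.0 range.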
If one now however uses another luminance-characterizing value, namely the luminance itself, one can get for this highly saturated color the following. Since L is much smaller (e.g. 0.5), L being the luminance of e.g. that blue color as can be seen in the 2D gamut section shown in FIG. 4B, one will have a relatively larger value for the function output, namely Lum_LL. The factor g=Lum_LL/L will then be approximately 2.0. If one uses the same multiplicative factor scaling of the RGB components with a strategy which is so luminance-specified, one will get for B approximately 1.0 multiplied by g=2.0 an out of range value which is clipped to 1.0. This will typically reduce the saturation of those colors. That may of course be a desired behavior for those colors determined by the creation side (e.g. a human grader), but in this case it is no longer so that all colors are within gamut. However, the grader can specify a curve to be equivalently used for such a luminance-defined color transformation, or in particular luminance transformation. The behavior may yet be different again if one doesn't use the exact luminance L of the pixel color, but the luma Y′, because this luma doesn't exactly contain the correct luminance information (some luminance information for saturated colors has leaked into the chrominance components).
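The out-of-gamut behavior of a luminance-indexed gain can be reproduced with a similar sketch. It assumes Rec. 2020 luminance weights and the same hypothetical curve TM(L)=L^(1/2.4); with real weights the luminance of a saturated blue is lower than the illustrative 0.5 used in the text, so the gain comes out larger than 2.0, but the effect is the same.

```python
def luminance_rec2020(rgb):
    """Luminance of a linear RGB color using Rec. 2020 weights."""
    r, g, b = rgb
    return 0.2627 * r + 0.6780 * g + 0.0593 * b


def tm(lum, k=2.4):
    """Hypothetical brightening curve on normalized luminance."""
    return lum ** (1.0 / k)


def regrade_luminance(rgb, curve=tm):
    """Scale RGB with g = TM(L) / L and return (unclipped, clipped) results."""
    lum = luminance_rec2020(rgb)
    if lum == 0.0:
        return rgb, rgb
    gain = curve(lum) / lum
    scaled = tuple(gain * c for c in rgb)
    clipped = tuple(min(c, 1.0) for c in scaled)
    return scaled, clipped


blue = (0.1, 0.1, 0.95)
scaled, clipped = regrade_luminance(blue)
print(scaled)   # the blue component exceeds 1.0
print(clipped)  # after clipping, R and G are relatively larger: less saturated
```

Clipping only the blue component raises the proportion of red and green relative to blue, which is the desaturation mentioned above.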
However, in that philosophy the creation side, and in particular a human color grader may not have sufficient control of the behavior of what he desires, i.e. how the colors should behave in the SDR look corresponding to the master MAST_HDR image (which he may have artistically created previously, or this image may be straight from camera in other embodiments or applications, etc.).
The inventor aimed at producing a good pragmatic encoding or decoding core circuit incorporable in such a practical Y′ CbCr signal path, and also versatile enough given the creator's needs.