Recently a number of very different displays have appeared on the market, in particular television signal receiving displays (televisions) with very different peak brightness. Whereas in the past the peak brightness (PB) of so-called legacy low dynamic range (LDR) displays differed by at most something like a factor of 2 (somewhere between 80 and 150 nit), the recent trend towards ever higher peak brightness has resulted in so-called high dynamic range (HDR) televisions of 1000 nit and above, and displays of 5000 nit PB, and it is assumed that soon various displays of such higher PBs will be on the market. Even in movie theaters one is recently looking at ways to increase the ultimate brightness dynamic range perceived by the viewer. Compared to a 100 nit LDR standard legacy TV, an e.g. 2000 nit display has a factor of 20 more peak brightness, which amounts to more than 4 additional stops available, i.e. more ways to render brighter objects in various images. On the one hand, provided one also uses a new generation HDR image generation or capturing system, this allows for much better rendering of HDR scenes or effects. E.g., instead of (soft) clipping the sunny world outside a building or vehicle (as would happen in a legacy LDR grading), one can use the additional available brightnesses on the luminance axis of the HDR TV gamut to display bright and colorful outside areas. This means that the content creator, whom we will non-limitingly call the color grader (though he may be embodied in various manners, e.g. in a live television production somebody perhaps only adjusting, at some times, a single dial affecting some color properties, in particular of the encoding), has room to make very beautiful dedicated HDR image or video content (typically brighter, maybe more contrasty, and more colorful).
On the other hand, however, this creates a problem: LDR image coding was designed relative to white, and well-illuminated according to a middle grey of 18% reflection, which means that display-rendered luminances below 5% of a relatively low PB of say 100 nit will typically be seen by the viewer as difficult-to-discriminate dark greys, or even, depending on the surround illumination, undiscriminable blacks. On a 5000 nit display there will be no problem with this optimally graded HDR image: 5% of 5000 nit is still 250 nit, so this will e.g. look like a normal interior, and the highest 95% of the luminance range could be used purely for HDR effects, like e.g. lamps, or brightly lit regions close to such lamps. But on an LDR display the rendering of this HDR grading will go totally wrong (as it was also not created for such a display), and the viewer may e.g. only see hot spots, corresponding to the brightest regions, on a near-black background.
In general, re-gradings are needed for creating optimal images for displays which are sufficiently different (at least a factor of 2 difference in PB). That holds both when re-grading an image graded for a lower dynamic range display to make it suitable for rendering on a higher dynamic range display (which would be upgrading, e.g. 1000 nit reference display input image(s), i.e. which would look optimal on a 1000 nit PB actual display, being color processed for rendering on an actual display of 5000 nit PB), as well as the other way around, i.e. downgrading an image so that it would be suitable for display on an actual display of lower PB than the reference display associated with the grading which is coded as video images (which images are typically transmitted in some manner to a receiving side). For conciseness we will only describe the scenario where an HDR image or images is to be downgraded to LDR.
HDR technology (by which we mean a technology which should be able to handle at least some HDR images, which may be of considerable complexity, i.e. high peak brightness, e.g. 10000 nit, but which may work with LDR images, or medium dynamic range images, etc. as well) will percolate into various areas of both consumer and professional use (e.g. cameras, data handling devices like blu-ray players, televisions, computer software, projection systems, security or video conferencing systems, etc.), and these areas will need technology capable of handling the various aspects in different ways.
In WO2013/144809 (and WO2014/056679) applicant formulated generically a technique to perform color processing for yielding an image (Im_res) which is suitable for another display dynamic range than the reference display dynamic range associated with the input image (Im_in) (typically the PB suffices to characterize the different display dynamic ranges and hence the optimally graded images, since for several scenarios one may neglect the black point and assume it is pragmatically 0), i.e. which basically formulates the PB of the display on which the image was created to look optimal; this forms good prior art for the below elucidated invention to improve thereupon. We reformulate the principles concisely again in FIG. 1. However, the reader should understand that some of the properties of the prior art example are relevant in the context of the present embodiments, while others are not present in a general HDR encoding and imply no limitations of our present embodiments and teachings, as these can work with various such HDR video (or image) codec technologies.
In particular what is relevant is that one has two different dynamic range looks on a scene, which can be related to each other via a color transformation (e.g., as FIG. 4 elucidates, one can choose to considerably lower the luminance, or equivalently the luma (the codes encoding the corresponding luminances in an e.g. typically 10 or 12 bit representation), of a street light, and squeeze all such high luminance image objects into a small sub-range of the LDR range of luminances). Although our embodiments can also work in systems which transmit some codification of the master HDR image to any receiving side, we will assume in the below elucidations that we use the embodiment of communicating the LDR grading instead of the HDR images, together with metadata which encode the color transformation functions (some of which can work in a chromaticity plane, but we focus primarily on luminance transformations), allowing a receiver to recalculate a close reconstruction of the master HDR graded image (Im_in_HDR) of the HDR scene. This allows receivers with HDR capability to render HDR images on a connected HDR display, but also the rendering of legacy LDR images for people who still have an LDR tv, computer monitor, projector, portable display, etc.
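The above principle of communicating the LDR grading plus function metadata, and reconstructing the HDR grading at a receiver, can be sketched minimally as follows; the square-root/square pair F_dn/F_up is purely an illustrative stand-in (an assumption of this sketch), since in a real codec the function shape is grader-determined and its parameters travel in the metadata:

```python
import numpy as np

# Minimal sketch of the communicate-LDR, reconstruct-HDR principle.
# F_dn / F_up are hypothetical placeholders for the actual invertible,
# grader-determined luminance mapping functions conveyed in metadata.
def F_dn(l_hdr):
    """Downgrade normalized HDR luminances [0,1] to LDR luminances [0,1]."""
    return np.sqrt(l_hdr)

def F_up(l_ldr):
    """Receiver-side inverse mapping, reconstructing the HDR luminances."""
    return l_ldr ** 2

l_master = np.array([0.01, 0.25, 1.0])  # master HDR grading (normalized)
l_ldr = F_dn(l_master)                  # the image actually communicated
l_reco = F_up(l_ldr)                    # close reconstruction of Im_in_HDR
```

Note that the downgrading function brightens the darker luminances on the LDR luminance axis, while its inverse exactly undoes this at the receiving side.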
This principle is generically applicable (buildable), i.e. no particular limitations should be assumed regarding the color format of the input image, nor the output image, nor the color space in which the color processing happens; in particular, where the prior art mentions some specific linear RGB processing, for this text we explicitly state that we invented and describe some non-linear color space processings, and the coding strategies based thereupon.
The various pixels of an input image Im_in are consecutively color processed by a color transformer 100 (which we assume here resides in a video encoder, getting HDR video to be encoded as input, and outputting LDR images, which however still optimally contain the HDR information also, albeit in a re-graded LDR look), by multiplying their linear RGB values by a multiplication factor (a) in a multiplier 104, to get output colors RsGsBs of pixels in an output image Im_res. The multiplication factor is established from some tone mapping specification, which may typically be created by a human color grader, but could also come from an auto-conversion algorithm which analyzes the characteristics of the image(s) (e.g. the histogram, or the color properties of special objects like faces, etc.). The mapping function may coarsely be e.g. gamma-like, so that the darker colors are boosted (which is needed to make them brighter and more contrasty for rendering on the LDR display), at the cost of a contrast reduction for the bright areas, which will become pastelized on LDR displays. The grader may further have identified some special object like a face, for whose luminances he has created an increased-contrast part in the curve. Specifically this curve is applied to the maximum of the R, G, and B color components of each pixel, named M (determined by maximum evaluation unit 101), by curve application unit 102 (which may cheaply be e.g. a LUT, which may be calculated e.g. per shot of images at a receiving side which does the color processing, after typically having received parameters encoding the functional shape of the mapping, e.g. a gamma factor), but the same principles can also work if M is a luminance, or some non-linear representation of a luminance or brightness, like e.g. a luma, or a power 1/N of a luminance, with N e.g. some integer, etc.
Then a multiplication factor calculation unit 103 calculates a suitable multiplication factor (a) for each currently processed pixel. This may e.g. be the output of the tone mapping function F applied to M, i.e. F(M), divided by M, if the image is to be rendered on a first target display, say e.g. a 100 nit LDR display. If an image is needed for e.g. an intermediate display, e.g. of 800 nit PB (or another value, maybe higher than the reference display PB of the HDR input image Im_in), then a further function G may be applied to F(M)/M, rescaling the amount of multiplicative mapping of the input color to the value appropriate for the display dynamic range for which the image is suited (whether it is directly rendered on the display, or communicated, or stored in some memory for later use). This is a manner to represent some brightness transformation, which may be quite complex, as a multiplication. Although the prior art we mentioned for elucidating the background knowledge for this invention may typically multiply linear RGB components, we emphasize that the present invention embodiments may also work on non-linear RGB color representations, e.g. Rec. 709 OETF transformed R′G′B′ components, or powers of R, G, and B with typically a power value smaller than 1, e.g. ½.
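The per-pixel processing of units 101, 102/103 and 104 can be sketched as follows; this is a minimal illustrative sketch, in which the gamma-like curve F is an assumption standing in for the grader-created or auto-derived mapping:

```python
import numpy as np

def tone_map_pixel(rgb, F):
    """Max-RGB-driven multiplicative tone mapping of one linear RGB pixel.

    rgb: linear R, G, B values in [0, 1]; F: the tone mapping curve.
    """
    rgb = np.asarray(rgb, dtype=float)
    M = rgb.max()           # maximum evaluation (unit 101)
    if M <= 0.0:
        return rgb          # black stays black
    a = F(M) / M            # multiplication factor (units 102/103)
    return a * rgb          # multiplier (unit 104): output colors RsGsBs

# Hypothetical gamma-like curve boosting the darks (an assumption of this
# sketch, not a curve prescribed by the text):
F = lambda m: m ** 0.45

out = tone_map_pixel([0.04, 0.02, 0.01], F)
```

Because all three components are scaled by the same factor a, the ratios between R, G and B (and hence the chromaticity of the pixel) are preserved, while the brightness is re-graded.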
The part we described so far constitutes a global color processing. This means that the processing can be done based solely on the particular values of the colors (and we will only focus on the luminances of those colors) of a consecutive set of pixels. So, if one just gets pixels from e.g. a set of pixels within a circular sub-selection of an image, the color processing can be done according to the above formulated principle. However, since human vision is very relative, also spatially relative, whereby the colors and brightnesses of objects are judged in relation to colorimetric properties of other objects in the image (and also in view of various technical limitations), more advanced HDR coding systems have an option to do local processing. In some image(s) one would like to isolate one or more object(s), like a lamp or a face, and do a dedicated processing on that object. However, again emphasizing the point, in the here presented technology this forms part of an encoding of at least one further grading derivable from an image of pixels of a master grading (here LDR derived from HDR), not merely some isolated color processing. Since simpler variants in the market will not use local processing (which, although conceptually similar, leads to i.a. more complex integrated circuits), and the below principles can be explained without those specifics, we will not further detail that aspect.
Either the master grading or the derived grading may actually be communicated to a receiving side, as the images encoding the spatial structure, i.e. the objects of the imaged scene, and if the color transformation functions encoding the relationship between the two looks are also communicated in metadata, then other gradings can be re-calculated at a receiving side. I.e., the color processing is e.g. needed to construct by decoding an LDR image if needed, in case HDR images have been received, or vice versa a reconstruction of HDR images in case, of the pair of looks, the LDR images have been communicated or stored. The fact that the local processing principle is used in an encoding technology has technical implications, inter alia that one needs a simple set of basic mathematical processing methods, since all decoding ICs or software out in the field need to implement this, at an affordable price, to be able to understand the encoding and create the decoded LDR image(s).
When designing pragmatically useful coding technologies for the various image or video using markets, a technical limitation is that from an IC point of view (since also cheap apparatuses may need simple ICs or area parts of an IC, or software), the coding function tools should be few, and smartly chosen, to do what is most needed for the creation and encoding of various dynamic range look images on a scene (so that any "grader" or content creator in any content creation variant gets the desired result: an HDR/LDR image look pair sufficiently close to his desires, and the corresponding encoding for storage or communication thereof). On the other hand, another problem is that with the above explained philosophy, where e.g. a human color grader specifies the re-grading, as encoded by e.g. an LDR image and functions to re-grade to a suitable HDR image at any receiver, in a set of optimal parameters for the specific look of a given scene, the grader must also have the right grading/coding tools, in the right order, so that he can conveniently work with them (not only does he need to obtain the good precision of the desired color look, but he needs to do that with as few operations as possible, to quickly and efficiently get the look he wants, since time is also of the essence). This dual, opposing set of constraints needs to be provided for in an elegant manner. Furthermore, in case LDR images are transmitted to any receiver there is even a third criterion one must look at, and which technological solutions like the below must at least roughly satisfy, namely that when having designed some LDR look image(s), the reconstruction of the HDR images by a receiver's HDR decoder must still be of sufficient precision, which also has an impact on the resultant optimal technical apparatus units for generic HDR encoders and decoders as they are invented.
Hattori et al: "HLS: SEI message for Knee Function Information", 16th JCT-VC MEETING, Sep. 1, 2014, San Jose, describes a new SEI message to specify a relationship between input HDR luminances, on an input dynamic range, up to e.g. 1200% of a scene white level (i.e. codes up to 1200 nit), and LDR lumas, based on one or more knee points. The knee point was a trick to solve the problem that digital sensors, when illuminated according to an average grey world assumption, had a problematic tendency to hard clip scene objects that were only a little brighter than scene white (which would be about 5× brighter than scene average grey). The idea would be that if one had a better sensor, with less noise for the darker scene luminances, then one could under-expose the scene a little, allowing a discrimination of various brighter-than-scene-white scene luminances (e.g. a white dress of a bride under the optimal scene illumination), e.g. up to 4× scene white (rather than bluntly clipping everything above e.g. 1.2× scene white to code white, luma Y′=255 in 8 bit). Of course capturing such brighter scene luminances accurately in the camera sensor was only part of the solution, as one also still needed a trick to allocate actual 8 bit luma codes to the analog sensor-determined scene luminances (relative to the maximum still recordable scene luminance, or 1.0), when calculating an SDR image for consumption, e.g. rendering with a good image quality on an SDR 100 nit PB display. It would not be an elegant solution to just compress all colors on the SDR output luma axis to be able to fit the 4×, or even 12×, upper range, because then the darker objects, which should also be well exposed to be well visible, might be too dark for good SDR image quality. So one came up with a technique which kept the classical (Rec. 709) luma allocation for the darker lumas, up to a knee point, and above that knee point used a more compressed, typically logarithmic luma code allocation strategy, so that a far greater upper range of input luminances (e.g. the range of 1× scene white to 4× scene white) could be mapped to an upper range of the luma codes, e.g. the upper 10%, depending on the position of the knee point (or, in case one wants to squeeze a significant amount of brighter-than-scene-white luminances into the SDR image, one could choose a knee point at 50% of the luma range, i.e. 128 in 8 bit, or 512 in 10 bit, but then the color look of the image, though still watchable, may start to deteriorate significantly). Hattori introduces a technique, and a practical manner to quickly convey all needed information to decoders, which need that information to apply the inverse function to reconstruct the HDR image when receiving the SDR image, based on one or more such knee points. A kneeing mechanism is not a good manner to accurately control the look of an SDR image. It is an easy manner, though, to bend a higher dynamic range (input_d_range) with a simple quick function, continuously bending higher brightness subranges into smaller subranges of the SDR luma (assuming that this will not be problematic, which is not necessarily true if one has important image content in e.g. the brightest regions, like e.g. clouds which may have beautiful bright grey values, which may get destroyed by a wrong simple logarithmic part of a knee function), especially when the Kx factor, specifying up to how many times above scene white luminances should still be codable, is not too high (i.e. medium high dynamic range scenes).
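The kneed luma allocation discussed above can be sketched as follows; the knee position (90%, 90%), the K× factor of 4, and the linear lower segment (standing in for the actual Rec. 709 allocation) are merely illustrative assumptions of this sketch:

```python
import math

def knee_luma(L, knee_x=0.9, knee_y=0.9, k_max=4.0):
    """Map relative scene luminance L (1.0 = scene white, up to k_max x white)
    to a normalized [0,1] luma.

    Below the knee point (knee_x, knee_y): a plain allocation (linear here,
    standing in for Rec. 709); above it: logarithmic compression of the wide
    range [knee_x, k_max] into the small top luma range [knee_y, 1].
    All parameter values are illustrative assumptions, not from any standard.
    """
    if L <= knee_x:
        return (L / knee_x) * knee_y   # classical segment up to the knee
    # logarithmic segment: squeezes the brighter-than-white luminances
    t = math.log(L / knee_x) / math.log(k_max / knee_x)
    return knee_y + t * (1.0 - knee_y)
```

E.g. with these parameters, scene luminances from 0.9× to 4× scene white all land in the top 10% of the luma codes, which is exactly the squeezing behavior a decoder must invert to reconstruct the HDR image.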
It is clear that this document doesn't teach a simple, highly usable coarse grading function, which is especially usable when a human grader wants to precisely optimize the look of the image. In contrast with Hattori, which is just the mathematical specification of some reasonably working luminance-to-luma mapping, which can blindly be used by any automatic apparatus, because its sole purpose is to code merely an HDR look image, i.e. one reconstructable at a receiving side, and not necessarily an artistically best looking SDR image, applicant wanted to design a system which, although in some embodiments also working (semi)automatically, should with the same coding principles also cater for markets that have artistically precise desiderata, like accurate color grading by a human color grader on a Hollywood movie. More specifically, not only is the precise control of the darks and brights sub-regions of the HDR scene image not taught; there is clearly no teaching of the parabolic middle segment, nor does Hattori inspire the HDR research one needs to come to such a realization.
US 2015/010059 also contains this same knee-point curve (model 3: number of pivot points) communicated as an SEI message teaching, and also contains a teaching of an S-curve, which is merely another possible HDR-to-SDR mapping curve, unrelated to our present application's teachings.
Zicong Mai et al.: "Optimizing a Tone Curve for Backward-Compatible High Dynamic Range Image and Video Compression", IEEE Transactions on Image Processing, vol. 20, no. 6, June 2011, is also a manner to communicate reconstructable HDR images actually as SDR images, but in a very different manner, namely by calculating an image-optimal mapping function shape, which is determined based on the luminance histogram of the input image (so as not to allocate too few codes to big regions, which could introduce banding, see FIG. 3).
WO2014/178286 is also again a knee-type encoder (FIG. 3), allowing inclusion in the SDR code of somewhat more brighter-than-scene-white scene luminances (N×). This can then be used to render HDR images (with nicely bright brightest objects) on HDR displays which have an N× brighter peak brightness than SDR displays, e.g. when N is 8 or 10 (FIG. 7).
WO 2014/128586 also contains various technical teachings to communicate HDR images of an HDR scene actually as SDR images, usable for direct rendering on legacy SDR displays already deployed in great numbers at viewers' premises. It teaches that sometimes an image-specific, highly customized luminance mapping curve shape may be useful (FIG. 8), but teaches nothing suggesting that the present coarse function may be a particularly useful function in a practical technology in which an HDR look is co-communicated with a corresponding graded SDR image.
None of the prior art points even in the direction of the elegant, simple HDR encoding system of the present application, which allows even critical color graders to efficiently come to a good quality SDR image, for all practical purposes.