Recently, a number of HDR encoding technologies have been proposed, e.g. the dual layer method of Dolby (WO2005/1040035). However, the industry is still looking for a pragmatic HDR video (/image) encoding technology which fits (a balance of) all requirements, such as the very important factors of amount of data, but also computational complexity (price of ICs), ease of introduction, versatility for the artists to create whatever they like, etc. In particular, a dual layer approach is seen as needlessly complex. Ideally, one would like to design a coding technology which fits with legacy encoding, such as e.g. DCT-based MPEG HEVC encoding. A problem is that this is somewhat counter-intuitive: how can one encode an HDR image, which by definition should be something different from an LDR image (typically having a larger amount of interesting brightness/luminance ranges), in a technology optimized for containing LDR images? These legacy LDR image handling/coding systems were designed and optimized to work with typical LDR imaging scenarios, which are normally well lit, with e.g. a 4:1 in-studio illumination ratio (or e.g. 10:1). Since most objects in the view vary in reflectance between say 85% for white and 5% for black (a reflectance ratio of 17:1), this gives a total contrast ratio of about 68:1 (resp. 170:1). If one looks at relative rendering (i.e. mapping the image white to the available display white) of the object luminances starting from a peak white, a typical early LCD monitor without local dimming would have had something like 100 nit white and 1 nit black, which would match the image contrast ratio, and it was typically assumed that CRT systems, which might also be watched during the day, would on average have something like a 40:1 capability. Having a standard legacy luminance code allocation gamma function of 2.2 in these systems seemed satisfactory for most scenarios, even those of higher scene contrast.
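The contrast ratios quoted above follow from a simple model in which the total scene contrast is the illumination ratio multiplied by the reflectance ratio of the whitest and blackest objects (0.85/0.05 = 17). A minimal Python sketch of that arithmetic, assuming this idealized model (it ignores flare, glare and camera effects):

```python
def scene_contrast(illumination_ratio: float,
                   white_reflectance: float = 0.85,
                   black_reflectance: float = 0.05) -> float:
    """Idealized total scene contrast: illumination ratio times reflectance ratio."""
    return illumination_ratio * (white_reflectance / black_reflectance)


if __name__ == "__main__":
    for lighting in (4.0, 10.0):
        # The two studio lighting ratios from the text give 68:1 and 170:1.
        print(f"{lighting:.0f}:1 lighting -> {scene_contrast(lighting):.0f}:1 contrast")
```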
Although errors were made which in those days were regarded as acceptable, such as rendering errors for badly encoded high-luminance scene regions (e.g. hard clipping of bright exteriors behind a window), these were acceptable because LDR displays could not render those object luminances physically accurately anyway.
However, there are scenarios for which there is now a desire to improve the rendering, e.g. an indoor scene in which one can simultaneously see the sunny outdoors, in which case there may be an illumination ratio of 100:1 or even more. With linear relative rendering (i.e. focusing first on the brightest encoded regions, or equivalently on the middle grey of the brightest scene regions, and mapping image white to display white), the indoor white would map to psychovisual black for the viewer! So in LDR those sunny regions will typically show up as (soft-)clipped (typically already in the encoded image, which has hard-to-discriminate codes around the maximum of 255 for those pixels). On an HDR display, however, we would like to show them both bright and colorful. That would give a much more naturalistic and spectacular rendering of such scenes (as if you were really on holiday in Italy), but even scenes in which the higher-brightness content consists only of some specular reflections already show a major visual quality improvement. Even apart from artefacts like clipping or quantization errors, which may look annoying on e.g. a 5000 or 10000 nit display, we at least want to be able to drive such HDR displays with the right kind of image, so that the rendered images will be as beautiful as the HDR display allows.
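The dilemma of relative rendering can be made concrete with the 100:1 indoor/outdoor illumination ratio and a legacy 8-bit gamma 2.2 code allocation. The sketch below assumes hard clipping and a pure power-law code allocation (an illustration, not a model of any particular camera) and encodes the indoor white relative to the outdoor white, and vice versa:

```python
def encode_8bit(relative_luminance: float, gamma: float = 2.2) -> int:
    """Legacy LDR code allocation: hard-clip to [0, 1], apply 1/gamma, quantize to 8 bits."""
    clipped = min(max(relative_luminance, 0.0), 1.0)
    return round(255 * clipped ** (1.0 / gamma))


INDOOR_WHITE = 1.0     # relative scene luminance of the indoor white
OUTDOOR_WHITE = 100.0  # sunny outdoors, 100:1 illumination ratio as in the text

# Exposing for the outdoors: the indoor white lands near the bottom of the code range.
indoor_code = encode_8bit(INDOOR_WHITE / OUTDOOR_WHITE)
# Exposing for the indoors: every outdoor level above the indoor white clips to 255.
outdoor_codes = [encode_8bit(level / INDOOR_WHITE) for level in (2.0, 18.0, 100.0)]
print(indoor_code, outdoor_codes)
```

Exposing for the outdoors puts the indoor white at code 31 of 255; rendered linearly relative on a 100 nit display it would show at 1 nit, the black level of the early LCD mentioned above. Exposing for the indoors instead collapses all sunny detail onto the single code 255.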
Classical thinking, however, was that to encode additional over-brightness ranges one would need (many) more bits, namely higher bits which encode the object luminances above an LDR range. That could happen either by natively encoding in single larger code words (such as OpenEXR with 16 bits, of which a sign bit, 5 exponent bits, and 10 mantissa bits, or Ward's LogLuv encoding, which mathematically rigorously tries to capture the entire world of possible object luminances with high precision), or by using a first layer with standard LDR range codes (e.g. a classical JPEG approximation of the HDR image) and a second layer to improve such pixel luminances to higher brightness (e.g. a boost image to boost each pixel, if needed, to a higher luminance, i.e. the multiplication of two such 8 bit images being equivalent to a single linear 16 bit code).
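The OpenEXR 16-bit layout mentioned above is the IEEE 754 half-precision (binary16) format. The sketch below decodes a bit pattern by hand following that standard layout; it can be checked against the `struct` format character `'e'` of the Python standard library, which implements the same format. It shows that normal half values span from 2**-14 to 65504, roughly 30 stops, with about three significant decimal digits of precision:

```python
import struct


def decode_half(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half pattern: 1 sign, 5 exponent and 10 mantissa bits."""
    sign = -1.0 if (bits >> 15) & 0x1 else 1.0
    exponent = (bits >> 10) & 0x1F
    mantissa = bits & 0x3FF
    if exponent == 0:
        # Subnormal numbers: no implicit leading 1, fixed exponent of -14.
        return sign * (mantissa / 1024.0) * 2.0 ** -14
    if exponent == 0x1F:
        # All-ones exponent encodes infinity (mantissa 0) or NaN.
        return (sign * float("inf")) if mantissa == 0 else float("nan")
    # Normal numbers: implicit leading 1 and an exponent bias of 15.
    return sign * (1.0 + mantissa / 1024.0) * 2.0 ** (exponent - 15)


smallest_normal = decode_half(0x0400)  # 2**-14
largest = decode_half(0x7BFF)          # largest finite half value
# Cross-check one value against the standard library's half-precision codec.
reference = struct.unpack("<e", (0x7BFF).to_bytes(2, "little"))[0]
print(largest, reference, smallest_normal, largest / smallest_normal)
```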
A major practical problem to be solved when designing a practical HDR coding technology, in addition to the fact that it must of course be able to handle a huge range of different HDR images, is that hardware manufacturers desire fewer bits per code word (per channel, i.e. the luma and the two chromatic channels). Although our technology proposed below can also work with larger code words, we provide a solution that works well under a limitation of 10 bits for at least a luminance (or more precisely a luma) channel (note that although we elucidate the embodiments with a luminance channel, our concepts may mutatis mutandis be embodied as working on (linear or non-linear) RGB color representations, etc.). Furthermore, we developed a framework which, in a dual philosophy, can do both the color pixel encoding (of the HDR look via an LDR image) and the color appearance conversion for several rendering scenarios (i.e. the optimal looks needed for rendering a scene on several displays with different peak brightness, e.g. PB=800 nit) in a functional manner. This means that when encoding the look of at least one further grading, and specifically an HDR look in addition to an LDR look, only functions need to be co-encoded, instead of at least a second picture for each picture.
We currently have two categories of HDR encoding systems, since the market would like such versatility in an encoding system, given the various players and their different needs. In the mode-i system (the HDR look encoded as the sole defining image, e.g. on a BD disk, or a stream of AVC or HEVC images over a network connection), we use an HDR-look image as the sole pixel image, which is used to encode the object color textures and shapes (see WO2015007505 of the applicant for how such a sole HDR image can be sent to a receiver to define the pixel colors of at least the HDR look, and how with appropriate re-grading functions the receiver can calculate other look images by processing the colors in that image). By this we mean that we take the original HDR master grading image, i.e. an image optimally color graded to look best on a reference HDR display, typically e.g. a 5000 nit peak brightness display, and only minimally transform it: basically we only apply a code allocation function or opto-electronic transfer function (OETF), which optimally allocates the available e.g. 10 bits of codes for the luma Y′ channel over all brightness values one needs to be able to make on a reference [0-5000] nit display. (Note that although this OETF defines how scene luminances, as captured e.g. by a camera, are transferred to luma codes, television engineers instead like to specify the inverse function, the electro-optical transfer function EOTF, which goes from luma codes to reference-display rendered luminances.) Other desired gradings for displays of different peak brightness can then be made by transforming this HDR-look image.
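The role of the OETF can be illustrated with a hypothetical log-gamma code allocation curve for a [0-5000] nit reference display and 10-bit luma. The curve shape and the parameters `RHO` and `GAMMA` below are assumptions chosen for illustration, not the curve of any particular standard or of the applicant; the point is only that a non-linear allocation reserves many more codes for the dark range than a linear one, and that the EOTF is its exact inverse:

```python
import math

PEAK_NIT = 5000.0  # reference HDR display peak brightness from the text
RHO = 25.0         # hypothetical curve-shape parameter (assumption)
GAMMA = 2.4        # hypothetical inner power (assumption)
BITS = 10
MAX_CODE = 2 ** BITS - 1


def oetf(lum_nit: float) -> float:
    """Hypothetical log-gamma OETF: luminance in [0, PEAK_NIT] -> normalized luma in [0, 1]."""
    x = min(max(lum_nit / PEAK_NIT, 0.0), 1.0)
    return math.log(1.0 + (RHO - 1.0) * x ** (1.0 / GAMMA)) / math.log(RHO)


def eotf(luma: float) -> float:
    """Exact inverse of oetf: normalized luma -> reference-display luminance in nit."""
    x = ((RHO ** luma - 1.0) / (RHO - 1.0)) ** GAMMA
    return PEAK_NIT * x


def to_code(lum_nit: float) -> int:
    return round(oetf(lum_nit) * MAX_CODE)


def from_code(code: int) -> float:
    return eotf(code / MAX_CODE)


# How many of the 1024 codes land below 1 nit: linear allocation vs. this OETF.
linear_codes_below_1nit = sum(1 for c in range(MAX_CODE + 1) if PEAK_NIT * c / MAX_CODE < 1.0)
oetf_codes_below_1nit = sum(1 for c in range(MAX_CODE + 1) if from_code(c) < 1.0)
print(linear_codes_below_1nit, oetf_codes_below_1nit)
```

With a linear allocation the step is about 4.9 nit, so only code 0 lies below 1 nit; with these assumed parameters the log-gamma curve spends well over a hundred codes on that range while still reaching 5000 nit at code 1023.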
In our framework we allow for this display look tunability by typically making only one second grading, at the other extreme end of the range of possible displays to be served, namely a look which is optimal or reasonable according to the content creator/color grader for a 100 nit peak brightness display (which is typically the reference display for the category of LDR displays). Note that this is a co-encoding of a further look rather than a mere creation-side recoloring step. The required color transformation is determined by applying mapping functions such as gamma functions realizing a global brightness readjustment (e.g. brightening the darker colors in the image), arbitrary S-shaped or inverse S-shaped curves to adjust local contrast, color saturation processing functions to adjust e.g. the saturation to the corresponding brightness of some objects or regions in the image, etc. We can liberally co-encode those functions (whichever functions we need, as long as they belong to a limited set of basis functions which the receiver can understand in a standardized manner) as metadata associated with the pixellized HDR-look image, in which case we parametrically DEFINE the second, LDR-look grading image from the HDR-look image (i.e. we no longer need to encode that LDR-look image as a pixel image). Note carefully the difference with two layer encoding systems: in our system the color transformation functions are all that is encoded about the second look, and they suffice for the receiver to re-grade to it. So rather than the rough approximate functions of 2-image technologies, our functions contain the full smart knowledge, according to the content creator, of how the illuminations of the various objects should behave in the various rendering looks!
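The receiver-side derivation of the LDR look from the pixellized HDR look can be sketched as a chain of the function types named above, driven only by co-encoded parameters. Everything here is a hypothetical stand-in: the `RegradeParams` fields, the smoothstep S-curve, the Rec. 709 luminance weights and the chromaticity-preserving luminance scaling are illustrative choices, not a standardized set of basis functions:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegradeParams:
    """Hypothetical co-encoded metadata for deriving the LDR look."""
    gamma: float = 1.0       # global brightness readjustment: < 1 brightens the darker colors
    contrast: float = 0.0    # S-curve strength in [0, 1]: 0 leaves contrast unchanged
    saturation: float = 1.0  # saturation gain applied around the new luminance


def s_curve(y: float, strength: float) -> float:
    """Blend identity with a smoothstep S-shape; both map 0 -> 0 and 1 -> 1."""
    smooth = y * y * (3.0 - 2.0 * y)
    return (1.0 - strength) * y + strength * smooth


def luminance(rgb) -> float:
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def derive_ldr_pixel(rgb, p: RegradeParams):
    """Map a normalized HDR-look pixel to an LDR-look pixel using only the metadata."""
    y = luminance(rgb)
    if y <= 0.0:
        return (0.0, 0.0, 0.0)
    y_new = s_curve(y ** p.gamma, p.contrast)
    scale = y_new / y
    scaled = [c * scale for c in rgb]  # change luminance, keep the color ratios
    out = [y_new + p.saturation * (c - y_new) for c in scaled]  # saturation around y_new
    return tuple(min(max(c, 0.0), 1.0) for c in out)


# A dark HDR pixel brightened, contrast-shaped and slightly desaturated for the LDR look.
print(derive_ldr_pixel((0.02, 0.01, 0.005), RegradeParams(gamma=0.5, contrast=0.3, saturation=0.9)))
```

Because the luminance weights sum to one, the saturation step leaves the new luminance unchanged, so the three hypothetical functions act on separable aspects of the look.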
Given this knowledge of how the creating artist wants the look to transform from the first look, for displays with a first level of color rendering capabilities, to a second look, for displays with a second level of color rendering capabilities (in particular the display peak brightness), a display with intermediate capabilities (e.g. 1200 nit peak brightness) can then automatically arrive at a more optimal driving image for its rendering situation by using the knowledge in the two gradings and interpolating. E.g. the display may do an asymmetric mixing of the two pixellized images, i.e. the HDR-look image and the LDR-look image derived from it with the functional transformations, in which the multiplicative mixing percentages are determined by how close the actual display is to the HDR or LDR display on a psychovisual non-linear scale. This will be better than driving the display with either the original HDR-look image or the LDR-look image.
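The interpolation for an intermediate display can be sketched as follows. Two assumptions are made: the psychovisual non-linear scale is taken to be the logarithm of peak brightness, and both looks are given per pixel in the same normalized representation. A plain weighted blend is used, a simplification of the asymmetric mixing described above:

```python
import math


def mixing_weight(display_pb: float, ldr_pb: float = 100.0, hdr_pb: float = 5000.0) -> float:
    """Weight of the HDR look for a display of peak brightness display_pb (in nit).

    Assumes log(peak brightness) as the psychovisual scale.
    """
    pb = min(max(display_pb, ldr_pb), hdr_pb)
    return math.log(pb / ldr_pb) / math.log(hdr_pb / ldr_pb)


def intermediate_look(hdr_pixel, ldr_pixel, display_pb: float):
    """Blend the HDR-look and LDR-look values of one pixel for the given display."""
    w = mixing_weight(display_pb)
    return tuple(w * h + (1.0 - w) * l for h, l in zip(hdr_pixel, ldr_pixel))


# The 1200 nit display from the text sits about 64% of the way towards the HDR look.
print(round(mixing_weight(1200.0), 3))
```

On a linear scale the 1200 nit display would sit only about 22% of the way from 100 to 5000 nit; the logarithmic scale moves it closer to the HDR look, in line with how brightness differences are perceived.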
This is a powerful yet simple definition of not merely a single (HDR) image look on a scene (e.g. a 5000 nit rendering), but a full framework for deriving reasonable renderings of the scene for various possible displays in the field, such as at a consumer's home (and even potentially adaptation to the viewing environment, e.g. by applying a post-gamma modeling the changed contrast sensitivity of human vision under various surround illuminances). It is mainly useful e.g. for applications/scenarios in which a creator has made a nice HDR version of their content and wants to have first and foremost this HDR look in the actual encoding sent to receivers (e.g. on an HDR BD disk, by ordering an HDR movie online over the internet, or in an HDR television broadcast, etc.). A customer who purchases this content version need not actually have an HDR display, since he can buy it in anticipation of later owning one and meanwhile use the HDR-2-LDR conversion, but it would be the preferred option when the customer wants content for his HDR display.
The above HDR-look manner of encoding HDR scenes (mode i, as explained, being that at least the HDR-look images are encoded as pixel images, while further looks on that same scene are also encoded, but parametrically with color transformation functions, such as e.g. a clipping embodiment, in which the LDR look isolates a subrange of the HDR image and clips the rest) already poses significant technical challenges for arriving at a pragmatic new technical system for future image, and in particular video, encoding. Such a system must take into account factors such as simplicity of IC design for the hardware manufacturers, yet allow content makers to create whatever beautiful HDR content they want to make, like sci-fi movies, spectacular television shows, or nature documentaries, with many creative HDR effects such as lamps which seem really lit. Yet the market desired another layer of complexity, which we will teach in this patent description.
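In mode i the receiver already has the HDR pixels, so for the clipping embodiment the only co-encoded information needed is which subrange to isolate. The sketch below assumes, hypothetically, that the subrange is signalled as two luminance bounds `low_nit` and `high_nit`, and that the LDR look stretches that subrange to the full LDR range with a 2.2 display gamma:

```python
def clip_ldr_from_hdr(lum_nit: float, low_nit: float, high_nit: float,
                      gamma: float = 2.2) -> float:
    """Hypothetical clipping embodiment: isolate [low_nit, high_nit], clip everything else.

    Returns a normalized LDR luma in [0, 1].
    """
    x = (lum_nit - low_nit) / (high_nit - low_nit)  # stretch the isolated subrange to [0, 1]
    x = min(max(x, 0.0), 1.0)                        # clip what lies outside it
    return x ** (1.0 / gamma)                        # LDR code allocation


# A 5000 nit master whose LDR look isolates the [0, 500] nit range:
for lum in (0.0, 250.0, 500.0, 3000.0):
    print(lum, round(clip_ldr_from_hdr(lum, 0.0, 500.0), 4))
```

Everything above `high_nit` (e.g. sunny outdoors) becomes LDR white, but this clipping exists only in the derived LDR look; the transmitted HDR pixels keep the full range.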
Namely, for some applications (which we will call mode-ii) one may want to have an LDR-look image as the sole pixellized image encoding the scene objects, which is e.g. written as the sole image on a Blu-ray disk. Although the content creator also cares a lot about the quality of the HDR look, he focuses strongly on the LDR look being similar to what it would be with legacy technologies. There will then typically be function parameters co-encoded in associatable metadata to derive an HDR-look image by upgrading the LDR-look image which was communicated in the image signal S_im. There may be various reasons for choosing this mode-ii (or LDR-look) variant. It may e.g. be for legacy systems which are unable to do any processing (e.g. if one prefers to encode the sole image in a particular embodiment which encodes the colors as Y′uv colors rather than as a YCrCb encoding, one could still encode this in a legacy HEVC framework by pretending the Y′uv image is a strangely colored YCrCb image and further using legacy DCT-based encoding schemes, as standardized in one of the members of the MPEG codec family), but also for applications which need an LDR look (e.g. viewing a movie on a low-brightness portable display) and may not want to do too much processing. Or perhaps the creator does not want to invest too much time in creating a perfect HDR look (but e.g. only quickly makes one by doing minor fine-tuning of an LDR-2-HDR autoconversion which e.g. isolates bright regions and non-linearly boosts them, e.g. for an old Laurel and Hardy movie remastering), and considers his LDR look the most important master grading of the LDR and HDR looks, which should be encoded directly, without needing any color transformation that could introduce color errors. E.g. a television broadcaster may choose this option, especially for live broadcasts (e.g. the news may not need to be in the most spectacular HDR).
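A Y′uv representation separates an achromatic luma from two chromaticity coordinates. Assuming that "uv" here refers to the CIE 1976 u′v′ coordinates (an assumption; the luma OETF is not shown), the sketch below computes them from linear Rec. 709 RGB via the standard Rec. 709/sRGB-to-XYZ matrix. Unlike CbCr, which grow and shrink with the brightness of a color, u′v′ stay the same when a color is only made brighter or darker:

```python
def rgb709_to_xyz(r: float, g: float, b: float):
    """Linear Rec. 709 / sRGB primaries with D65 white to CIE XYZ."""
    x = 0.4124 * r + 0.3576 * g + 0.1805 * b
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    return x, y, z


def xyz_to_uv_prime(x: float, y: float, z: float):
    """CIE 1976 UCS chromaticity coordinates u', v'."""
    d = x + 15.0 * y + 3.0 * z
    if d == 0.0:
        return 0.0, 0.0  # black has no defined chromaticity; arbitrary placeholder
    return 4.0 * x / d, 9.0 * y / d


# A color and a ten times brighter version of it share the same chromaticity.
print(xyz_to_uv_prime(*rgb709_to_xyz(0.2, 0.4, 0.1)))
print(xyz_to_uv_prime(*rgb709_to_xyz(2.0, 4.0, 1.0)))
```

Packing such Y′, u′ and v′ planes into the three channels of a legacy codec is what the text calls pretending the image is a strangely colored YCrCb image.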
This LDR-look (mode ii) encoding, however, has additional complexity due to the mathematical nature of the problem and the coding mathematics on the one hand versus liberal artistic grading desires on the other, which makes it a daunting task to come up with a good technical framework. To be more precise, on the one hand we need functions which first grade down from a desired master HDR image, and with these received functions (or rather the inverses of the downgrading functions) the receiver can upgrade to at least a close approximation of the original HDR image again. I.e. the metadata function parameter data will contain parameters for functions (derived by the encoder from the functions which the grader used in the downgrading from the master HDR) which can map the sole LDR image to a sufficiently close HDR prediction Rec_HDR. But on the other hand, the LDR image, when directly rendered on an approximately 100 nit display, i.e. without further color transformation, should also look sufficiently good according to the color grader. So there will be a balance in the selection of the functions and how they influence the LDR and Rec_HDR looks, also taking into account other issues, like that IC or apparatus manufacturers would like to see a limited set of standard functions which are useful for the re-grading of looks, and content creators like those functions to quickly specify whatever looks they desire, since grading time is expensive and the timing of movie releases may be critical. In the description below we describe a practical system for handling this mode ii variant of HDR scene encoding.
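The balance between the LDR look and Rec_HDR can be illustrated with a round trip. The sketch assumes a hypothetical invertible downgrading curve y = (1+K)x/(1+Kx) on normalized luminance (the parameter `K` stands in for the grader's choice of how much to brighten the darks), 8-bit LDR codes with a 2.2 gamma, and a receiver that applies the exact inverse curve; none of these are the document's actual functions:

```python
HDR_PEAK = 5000.0  # master HDR grading peak brightness in nit
K = 50.0           # hypothetical strength of the dark-brightening curve (assumption)
LDR_GAMMA = 2.2    # legacy display gamma assumed for the LDR codes
LDR_MAX = 255      # 8-bit LDR codes


def downgrade(hdr_nit: float) -> int:
    """Encoder: master HDR luminance -> 8-bit LDR code via the hypothetical tone curve."""
    x = min(max(hdr_nit / HDR_PEAK, 0.0), 1.0)
    y = (1.0 + K) * x / (1.0 + K * x)  # brightens darks, compresses highlights
    return round(LDR_MAX * y ** (1.0 / LDR_GAMMA))


def upgrade(code: int) -> float:
    """Receiver: LDR code -> Rec_HDR luminance, using the inverse of the co-encoded curve."""
    y = (code / LDR_MAX) ** LDR_GAMMA
    x = y / (1.0 + K - K * y)
    return HDR_PEAK * x


def relative_error(hdr_nit: float) -> float:
    return abs(upgrade(downgrade(hdr_nit)) - hdr_nit) / hdr_nit


for nit in (5.0, 100.0, 1000.0, 4000.0):
    print(f"{nit:7.1f} nit -> code {downgrade(nit):3d}, Rec_HDR error {relative_error(nit):.1%}")
```

With these assumed parameters dark and mid tones come back within about two percent, whereas highlights near 4000 nit are squeezed into the last few 8-bit codes and come back with an error of more than ten percent. A smaller `K` would improve Rec_HDR but give a darker LDR look, which is exactly the trade-off the function selection has to balance.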