Recently, new developments have occurred regarding the encoding of images/video (whether of captured scenes or computer graphics): namely, it is desirable to better capture the entire range of object luminances and colors occurring in nature, up to large luminance values such as e.g. 25000 nit, which can occur in outside sunny environments or near strong artificial lights, and often also down to low values such as 0.01 nit; this is called HDR (high dynamic range) encoding. There is a push both on the content creation side, e.g. cameras (and even mobile appliance cameras would benefit from better capturing of actual scenes, especially when being used liberally and simplistically in all kinds of environments, as a mobile phone camera is, irrespective of and decoupled from the rendering system on which a captured image will later be rendered) or the artificial computer color spaces of computer games or special effects, and on the rendering side, since displays of ever higher peak brightness now emerge, which by themselves do not define what is required for an HDR rendering chain, but facilitate introducing such a chain. At the moment the typical HDR display is an LED-backlit LCD, but if one relaxes e.g. the condition of color saturation, one may also put a monochrome backlight behind an OLED (the light leaking through creates an RGBW rendering). For several reasons, at least for a number of years into the future, one may desire some form of backwards compatibility, which means that data of a so-called low dynamic range (LDR) encoding must be available or at least easily determinable, so that e.g. an upgraded video processing box can deliver an LDR signal to a lower dynamic range display. Moreover, as will be shown in this text, having a good LDR representation available may prove useful even in the long term.
The inventor realized that one rationale for having an LDR encoding is that, although displays of ever increasing dynamic range are emerging (high end), there is also a considerable segment of low dynamic range displays (e.g. mobile in an outside environment, projection, etc.). In fact, there may be a need to automatically redetermine, for several possible imaging or rendering scenarios, the grey values as captured in an image signal, just as one would geometrically scale a picture to show it on displays of different resolutions.
A HDR capturing chain is more than just pointing a camera at a scene with a large luminance contrast ratio between the darkest and the brightest object and linearly recording what there is (capturing ICs such as e.g. a CCD typically being partly (near-)linear). HDR image technology has to do with what exactly the intermediate grey values for all the objects are, since these convey e.g. the mood of a movie (already darkening some of the objects in the scene may convey a dark mood). And this is a complex psychological process. One can e.g. imagine that psychologically it isn't that important whether a bright light is rendered on a display in exact proportion to the rest of the rendered grey values, as the scene luminance was to the rest of the scene object luminances. Rather, one will have a faithful impression of a real lamp if the pixels are rendered with “some” high display output luminance, as long as that is sufficiently higher than the rest of the picture. And there may be a couple of “lamp-light” white levels, but as long as they are well apart, their exact code levels or ultimately display-rendered output luminances may oftentimes be less critical. The grey value allocation between self-luminous and reflecting objects (in the various illumination regions of the scene) is also a critical task, depending on the display gamut and typical viewing conditions. Also, one may imagine that the encoding of the darker regions is preferably done so that they can be easily used in different rendering scenarios, such as different average surround lighting levels (i.e. they may be locally brightened). In general, because this is a difficult psychological task, artists will be involved in creating optimal images, which is called color grading. In particular, it is very handy when the artists make a separate LDR grading, even if that is done within a “pure HDR encoding strategy”.
In other words, in such a scenario, when encoding a sole HDR camera RAW signal, we will also generate an LDR image, not necessarily because it is to be used for a large LDR fraction of the video consumption market, but because it conveys important information about the scene. Namely, there will always be more important regions and objects in the scene, and putting these in an LDR substructure (which can conceptually be seen as an artistic counterpart of an automatic exposure algorithm) makes it easier to do all kinds of conversions to intermediate range representations (MDR), suitable for driving displays with particular rendering and viewing characteristics. In particular, one may tune this LDR part according to several criteria, e.g. that it renders with good quality on a standard reference LDR display, or conveys a certain percentage of the total captured information, etc.
There are not so many ways to encode a HDR signal. Usually in prior art one just natively codes the HDR signal, i.e. one (linearly) maps the pixels to e.g. 16 bit words, and then the maximum captured luminance value is the HDR white, in a philosophy similar to LDR encoding (although psychovisually this usually is not a reflective white in the scene, but rather the bright color of a lamp). One could also map a full range HDR signal to the 8 bit LDR range via some “optimal” luma transformation function, which would typically be a gamma function or similar. This may involve losing color precision, with corresponding rendering quality issues, especially if image processing such as local brightening is expectable at the receiving side; however, the dominant grey value grading of the image objects is roughly preserved (i.e. their relative/percentual luma relationships).
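Such a gamma-based mapping of a full HDR luminance range into 8-bit lumas can be sketched as follows; this is a minimal illustration, not a mapping prescribed by the text, and the peak luminance of 5000 nit and the exponent 1/2.4 are assumed example values only:

```python
def hdr_to_ldr_gamma(lum_nit, peak_nit=5000.0, gamma=1.0 / 2.4):
    """Map an HDR scene luminance (in nit) to an 8-bit LDR luma code.

    peak_nit (the luminance mapped to code 255) and gamma are
    illustrative choices; a real codec would standardize these.
    """
    # Normalize to [0, 1] relative to the chosen HDR white point.
    v = min(max(lum_nit / peak_nit, 0.0), 1.0)
    # Gamma-compress (allocating more codes to darker values),
    # then quantize to an 8-bit code: here the precision loss occurs.
    return round(255 * v ** gamma)
```

Because the function is monotonic, the relative grey value relationships of the objects survive, but many distinct HDR luminances collapse onto the same 8-bit code.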
Prior art has also taught some HDR encoding techniques using two picture data sets for the HDR, typically based on a kind of scalable coding concept, in which, by some prediction, the precision of an “LDR” encoded local texture is refined, or stated more accurately, projected to a HDR version of that texture, typically by scaling the LDR luminances (the LDR in those technologies is normally not a good looking LDR grade, but typically the result of a simple processing on the HDR input). Then the difference of the original HDR image with the prediction is co-encoded as an enhancement picture, to the degree desired. E.g., one may represent a HDR grey value of 1168, after a division by 8, as a value 146. This HDR value could be recreated by multiplying by 8 again, but since a value 1169 would quantize to the same base layer value 146, one would need an enhancement value equal to 1 to be able to recreate a high quality HDR signal. An example of such a technology is described in patent EP2009921 [Liu Shan et al., Mitsubishi Electric: Method for inverse tone mapping (by scaling and offset)]. In theory, for these codecs, the inverse tone mapping prediction model (which is the smarter equivalent of a standard multiplier) should be sufficiently accurate to already give a reasonably precise HDR look, onto which minor corrections are applied (indeed, if one projects a range of possible values to another range by using a non-linear function, then apart from precision issues, the original range values should be recoverable).
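The base-plus-enhancement arithmetic of the 1168/1169 example above can be made explicit with a small sketch; the fixed scale factor of 8 stands in for the (generally smarter, non-linear) inverse tone mapping prediction:

```python
SCALE = 8  # illustrative scale between HDR values and base-layer values


def encode(hdr_value):
    """Split an HDR grey value into a base-layer value and a residual."""
    base = hdr_value // SCALE               # 1168 // 8 == 146, and 1169 // 8 == 146 too
    enhancement = hdr_value - base * SCALE  # the part lost by quantization
    return base, enhancement


def decode(base, enhancement):
    """Predict the HDR value from the base layer, then correct it."""
    return base * SCALE + enhancement
```

Both 1168 and 1169 map to base value 146, so only the enhancement value (0 versus 1) distinguishes them on reconstruction.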
Another two-picture encoding is described in the currently not yet published application U.S. 61/557,461 of which all teachings are hereby incorporated by reference.
That system also works with an LDR and HDR image, and has some similar recognitions which are useful for the present invention too, namely, e.g., the recognition that in an HDR signal one may always find an LDR subregion of major importance, and that it may be interesting to make that LDR an actually usable signal for LDR rendering (e.g. a dedicated LDR grade). And the HDR information is typically not only non-linearly separate on the luminance axis (i.e. e.g. a lamp having much higher luminance than the white in the scene), but it also has a different meaning. Oftentimes one may e.g. speak of HDR effects, i.e. they do not necessarily need to precisely code the object textures like the main content of the scene, i.e. its LDR part; rather, depending on which HDR region/effect it is, one may encode it with different criteria, like reduced precision, or leave it away altogether. That has as a result that oftentimes a lot of bit budget can be saved for the HDR parts of the scene. Furthermore, encoding in such a two-picture format of an LDR part plus HDR effects has the advantage that both can be very easily separated. Legacy or lower capability systems needing only the LDR can directly extract it, ignoring the rest. But also having the HDR as a separately coded picture makes it very easy to apply the HDR effects in a tuned way depending on the actual gamut capabilities of an actual rendering display, e.g. by adding a scaled HDR effect onto the luminance transformed LDR part.
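The tuned combination mentioned last, i.e. adding a scaled HDR effect onto a luminance-transformed LDR part, can be sketched as below; the display gamma of 2.4, the 500 nit display peak, and the function names are all illustrative assumptions, not definitions from the text:

```python
def tone_map_ldr(luma, display_gamma=2.4, peak_nit=500.0):
    """Luminance-transform an 8-bit LDR luma to display luminance (nit).

    A simple power-law EOTF is assumed here purely for illustration.
    """
    return peak_nit * (luma / 255.0) ** display_gamma


def render_mdr(ldr_luma, hdr_effect_nit, effect_scale):
    """Combine the LDR part with a separately coded HDR effect.

    effect_scale would be tuned to the actual display's gamut
    capabilities (e.g. 0.0 for a legacy LDR display, larger values
    for brighter displays).
    """
    base = tone_map_ldr(ldr_luma)            # luminance-transformed LDR part
    return base + effect_scale * hdr_effect_nit  # add the scaled HDR effect
```

A legacy system simply uses `effect_scale = 0.0` and renders the LDR part alone, while a brighter display mixes in more of the effect layer.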
However, whereas that format works perfectly with systems which are already configured for dual picture encoding, e.g. by re-using the structure normally available for 3D coding, we would desire similar capabilities in case we have only a single picture coding placeholder available. With e.g. the growing field of video on demand, one may imagine that at least some of those systems would prefer to have everything encoded in a single picture signal.
Yet it is an object of at least some of the present embodiments to still have the benefits of encoding such an optimal LDR-within-HDR framework in a single picture, despite the fact that it seems strange to code two pictures into one. Note that the other classes of methods described above, although enforcing some of the data into an LDR picture format mathematically/technically (as a placeholder), do not have real LDR images (co)encoded, i.e. images that would look good on an LDR viewing system because they have been carefully graded (at least selected, oftentimes further transformed by color grading) for their LDR look (rather, one may have an “LDR” picture with the right object geometry, but which, if directly rendered, shows severely modified object texture grey values, e.g. the wrong contrast or average brightness).